1 Introduction
1.1 Thematic Context
1.2 Functional Principles of Markov Models
1.3 Goal and Structure of the Book
2 Application Areas
2.1 Speech
2.2 Writing
2.3 Biological Sequences
2.4 Outlook
Part I Theory
3 Foundations of Mathematical Statistics
3.1 Random Experiment, Event, and Probability
3.2 Random Variables and Probability Distributions
3.3 Parameters of Probability Distributions
3.4 Normal Distributions and Mixture Models
3.5 Stochastic Processes and Markov Chains
3.6 Principles of Parameter Estimation
3.6.1 Maximum Likelihood Estimation
3.6.2 Maximum a Posteriori Estimation
3.7 Bibliographical Remarks
4 Vector Quantization and Mixture Estimation
4.1 Definition
4.2 Optimality
4.2.1 Nearest-Neighbor Condition
4.2.2 Centroid Condition
4.3 Algorithms for Vector Quantizer Design
4.3.1 Lloyd's Algorithm
4.3.2 LBG Algorithm
4.3.3 k-Means Algorithm
4.4 Estimation of Mixture Density Models
4.4.1 EM Algorithm
4.4.2 EM Algorithm for Gaussian Mixtures
4.5 Bibliographical Remarks
5 Hidden Markov Models
5.1 Definition
5.2 Modeling Outputs
5.3 Use Cases
5.4 Notation
5.5 Evaluation
5.5.1 The Total Output Probability
5.5.2 Forward Algorithm
5.5.3 The Optimal Output Probability
5.6 Decoding
5.6.1 Viterbi Algorithm
5.7 Parameter Estimation
5.7.1 Foundations
5.7.2 Forward-Backward Algorithm
5.7.3 Training Methods
5.7.4 Baum-Welch Algorithm
5.7.5 Viterbi Training
5.7.6 Segmental k-Means Algorithm
5.7.7 Multiple Observation Sequences
5.8 Model Variants
5.8.1 Alternative Algorithms
5.8.2 Alternative Model Architectures
5.9 Bibliographical Remarks
6 n-Gram Models
6.1 Definition
6.2 Use Cases
6.3 Notation
6.4 Evaluation
6.5 Parameter Estimation
6.5.1 Redistribution of Probability Mass
6.5.2 Discounting
6.5.3 Incorporation of More General Distributions
6.5.4 Interpolation
6.5.5 Backing Off
6.5.6 Optimization of Generalized Distributions
6.6 Model Variants
6.6.1 Category-Based Models
6.6.2 Longer Temporal Dependencies
6.7 Bibliographical Remarks
Part II Practice
7 Computations with Probabilities
7.1 Logarithmic Probability Representation
7.2 Lower Bounds for Probabilities
7.3 Codebook Evaluation for Semi-Continuous HMMs
7.4 Probability Ratios
8 Configuration of Hidden Markov Models
8.1 Model Topologies
8.2 Modularization
8.2.1 Context-Independent Sub-word Units
8.2.2 Context-Dependent Sub-word Units
8.3 Compound Models
8.4 Profile HMMs
8.5 Modeling Outputs
9 Robust Parameter Estimation
9.1 Feature Optimization
9.1.1 Decorrelation
9.1.2 Principal Component Analysis I
9.1.3 Whitening
9.1.4 Dimensionality Reduction
9.1.5 Principal Component Analysis II
9.1.6 Linear Discriminant Analysis
9.2 Tying
9.2.1 Sub-model Units
9.2.2 State Tying
9.2.3 Tying in Mixture Models
9.3 Initialization of Parameters
10 Efficient Model Evaluation
10.1 Efficient Evaluation of Mixture Densities
10.2 Efficient Decoding of Hidden Markov Models
10.2.1 Beam Search Algorithm
10.3 Efficient Generation of Recognition Results
10.3.1 First-Best Decoding of Segmentation Units
10.3.2 Algorithms for N-Best Search
10.4 Efficient Parameter Estimation
10.4.1 Forward-Backward Pruning
10.4.2 Segmental Baum-Welch Algorithm
10.4.3 Training of Model Hierarchies
10.5 Tree-Like Model Organization
10.5.1 HMM Prefix Trees
10.5.2 Tree-Like Representation for n-Gram Models
11 Model Adaptation
11.1 Basic Principles
11.2 Adaptation of Hidden Markov Models
11.2.1 Maximum Likelihood Linear Regression
11.3 Adaptation of n-Gram Models
11.3.1 Cache Models
11.3.2 Dialog-Step Dependent Models
11.3.3 Topic-Based Language Models
12 Integrated Search Methods
12.1 HMM Networks
12.2 Multi-pass Search
12.3 Search Space Copies
12.3.1 Context-Based Search Space Copies
12.3.2 Time-Based Search Space Copies
12.3.3 Language-Model Look-Ahead
12.4 Time-Synchronous Parallel Model Decoding
12.4.1 Generation of Segment Hypotheses
12.4.2 Language-Model-Based Search
Part III Systems
13 Speech Recognition
13.1 Recognition System of RWTH Aachen University
13.1.1 Feature Extraction
13.1.2 Acoustic Modeling
13.1.3 Language Modeling
13.1.4 Search
13.2 BBN Speech Recognizer BYBLOS
13.2.1 Feature Extraction
13.2.2 Acoustic Modeling
13.2.3 Language Modeling
13.2.4 Search
13.3 ESMERALDA
13.3.1 Feature Extraction
13.3.2 Acoustic Modeling
13.3.3 Statistical and Declarative Language Modeling
13.3.4 Incremental Search
14 Handwriting Recognition
14.1 Recognition System by BBN
14.1.1 Preprocessing
14.1.2 Feature Extraction
14.1.3 Script Modeling
14.1.4 Language Modeling and Search
14.2 Recognition System of RWTH Aachen University
14.2.1 Preprocessing
14.2.2 Feature Extraction
14.2.3 Script Modeling
14.2.4 Language Modeling and Search
14.3 ESMERALDA Offline Recognition System
14.3.1 Preprocessing
14.3.2 Feature Extraction
14.3.3 Handwriting Model
14.3.4 Language Modeling and Search
14.4 Bag-of-Features Hidden Markov Models
15 Analysis of Biological Sequences
15.1 HMMER
15.1.1 Model Structure
15.1.2 Parameter Estimation
15.1.3 Interoperability
15.2 SAM
15.3 ESMERALDA
15.3.1 Feature Extraction
15.3.2 Statistical Models of Proteins
References
Index