Preface
Acknowledgments
Symbols
1 Introduction
  1.1 Decision Functions
    1.1.1 Decision Functions for Two-Class Problems
    1.1.2 Decision Functions for Multiclass Problems
  1.2 Determination of Decision Functions
  1.3 Data Sets Used in the Book
  1.4 Classifier Evaluation
  References
2 Two-Class Support Vector Machines
  2.1 Hard-Margin Support Vector Machines
  2.2 L1 Soft-Margin Support Vector Machines
  2.3 Mapping to a High-Dimensional Space
    2.3.1 Kernel Tricks
    2.3.2 Kernels
    2.3.3 Normalizing Kernels
    2.3.4 Properties of Mapping Functions Associated with Kernels
    2.3.5 Implicit Bias Terms
    2.3.6 Empirical Feature Space
  2.4 L2 Soft-Margin Support Vector Machines
  2.5 Advantages and Disadvantages
    2.5.1 Advantages
    2.5.2 Disadvantages
  2.6 Characteristics of Solutions
    2.6.1 Hessian Matrix
    2.6.2 Dependence of Solutions on C
    2.6.3 Equivalence of L1 and L2 Support Vector Machines
    2.6.4 Nonunique Solutions
    2.6.5 Reducing the Number of Support Vectors
    2.6.6 Degenerate Solutions
    2.6.7 Duplicate Copies of Data
    2.6.8 Imbalanced Data
    2.6.9 Classification for the Blood Cell Data
  2.7 Class Boundaries for Different Kernels
  2.8 Developing Classifiers
    2.8.1 Model Selection
    2.8.2 Estimating Generalization Errors
    2.8.3 Sophistication of Model Selection
    2.8.4 Effect of Model Selection by Cross-Validation
  2.9 Invariance for Linear Transformation
  References
3 Multiclass Support Vector Machines
  3.1 One-Against-All Support Vector Machines
    3.1.1 Conventional Support Vector Machines
    3.1.2 Fuzzy Support Vector Machines
    3.1.3 Equivalence of Fuzzy Support Vector Machines and Support Vector Machines with Continuous Decision Functions
    3.1.4 Decision-Tree-Based Support Vector Machines
  3.2 Pairwise Support Vector Machines
    3.2.1 Conventional Support Vector Machines
    3.2.2 Fuzzy Support Vector Machines
    3.2.3 Performance Comparison of Fuzzy Support Vector Machines
    3.2.4 Cluster-Based Support Vector Machines
    3.2.5 Decision-Tree-Based Support Vector Machines
    3.2.6 Pairwise Classification with Correcting Classifiers
  3.3 Error-Correcting Output Codes
    3.3.1 Output Coding by Error-Correcting Codes
    3.3.2 Unified Scheme for Output Coding
    3.3.3 Equivalence of ECOC with Membership Functions
    3.3.4 Performance Evaluation
  3.4 All-at-Once Support Vector Machines
  3.5 Comparisons of Architectures
    3.5.1 One-Against-All Support Vector Machines
    3.5.2 Pairwise Support Vector Machines
    3.5.3 ECOC Support Vector Machines
    3.5.4 All-at-Once Support Vector Machines
    3.5.5 Training Difficulty
    3.5.6 Training Time Comparison
  References
4 Variants of Support Vector Machines
  4.1 Least-Squares Support Vector Machines
    4.1.1 Two-Class Least-Squares Support Vector Machines
    4.1.2 One-Against-All Least-Squares Support Vector Machines
    4.1.3 Pairwise Least-Squares Support Vector Machines
    4.1.4 All-at-Once Least-Squares Support Vector Machines
    4.1.5 Performance Comparison
  4.2 Linear Programming Support Vector Machines
    4.2.1 Architecture
    4.2.2 Performance Evaluation
  4.3 Sparse Support Vector Machines
    4.3.1 Several Approaches for Sparse Support Vector Machines
    4.3.2 Idea
    4.3.3 Support Vector Machines Trained in the Empirical Feature Space
    4.3.4 Selection of Linearly Independent Data
    4.3.5 Performance Evaluation
  4.4 Performance Comparison of Different Classifiers
  4.5 Robust Support Vector Machines
  4.6 Bayesian Support Vector Machines
    4.6.1 One-Dimensional Bayesian Decision Functions
    4.6.2 Parallel Displacement of a Hyperplane
    4.6.3 Normal Test
  4.7 Incremental Training
    4.7.1 Overview
    4.7.2 Incremental Training Using Hyperspheres
  4.8 Learning Using Privileged Information
  4.9 Semi-Supervised Learning
  4.10 Multiple Classifier Systems
  4.11 Multiple Kernel Learning
  4.12 Confidence Level
  4.13 Visualization
  References
5 Training Methods
  5.1 Preselecting Support Vector Candidates
    5.1.1 Approximation of Boundary Data
    5.1.2 Performance Evaluation
  5.2 Decomposition Techniques
  5.3 KKT Conditions Revisited
  5.4 Overview of Training Methods
  5.5 Primal-Dual Interior-Point Methods
    5.5.1 Primal-Dual Interior-Point Methods for Linear Programming
    5.5.2 Primal-Dual Interior-Point Methods for Quadratic Programming
    5.5.3 Performance Evaluation
  5.6 Steepest Ascent Methods and Newton's Methods
    5.6.1 Solving Quadratic Programming Problems Without Constraints
    5.6.2 Training of L1 Soft-Margin Support Vector Machines
    5.6.3 Sequential Minimal Optimization
    5.6.4 Training of L2 Soft-Margin Support Vector Machines
    5.6.5 Performance Evaluation
  5.7 Batch Training by Exact Incremental Training
    5.7.1 KKT Conditions
    5.7.2 Training by Solving a Set of Linear Equations
    5.7.3 Performance Evaluation
  5.8 Active Set Training in Primal and Dual
    5.8.1 Training Support Vector Machines in the Primal
    5.8.2 Comparison of Training Support Vector Machines in the Primal and the Dual
    5.8.3 Performance Evaluation
  5.9 Training of Linear Programming Support Vector Machines
    5.9.1 Decomposition Techniques
    5.9.2 Decomposition Techniques for Linear Programming Support Vector Machines
    5.9.3 Computer Experiments
  References
6 Kernel-Based Methods
  6.1 Kernel Least Squares
    6.1.1 Algorithm
    6.1.2 Performance Evaluation
  6.2 Kernel Principal Component Analysis
  6.3 Kernel Mahalanobis Distance
    6.3.1 SVD-Based Kernel Mahalanobis Distance
    6.3.2 KPCA-Based Mahalanobis Distance
  6.4 Principal Component Analysis in the Empirical Feature Space
  6.5 Kernel Discriminant Analysis
    6.5.1 Kernel Discriminant Analysis for Two-Class Problems
    6.5.2 Linear Discriminant Analysis for Two-Class Problems in the Empirical Feature Space
    6.5.3 Kernel Discriminant Analysis for Multiclass Problems
  References
7 Feature Selection and Extraction
  7.1 Selecting an Initial Set of Features
  7.2 Procedure for Feature Selection
  7.3 Feature Selection Using Support Vector Machines
    7.3.1 Backward or Forward Feature Selection
    7.3.2 Support Vector Machine-Based Feature Selection
    7.3.3 Feature Selection by Cross-Validation
  7.4 Feature Extraction
  References
8 Clustering
  8.1 Domain Description
  8.2 Extension to Clustering
  References
9 Maximum-Margin Multilayer Neural Networks
  9.1 Approach
  9.2 Three-Layer Neural Networks
  9.3 CARVE Algorithm
  9.4 Determination of Hidden-Layer Hyperplanes
    9.4.1 Rotation of Hyperplanes
    9.4.2 Training Algorithm
  9.5 Determination of Output-Layer Hyperplanes
  9.6 Determination of Parameter Values
  9.7 Performance Evaluation
  References
10 Maximum-Margin Fuzzy Classifiers
  10.1 Kernel Fuzzy Classifiers with Ellipsoidal Regions
    10.1.1 Conventional Fuzzy Classifiers with Ellipsoidal Regions
    10.1.2 Extension to a Feature Space
    10.1.3 Transductive Training
    10.1.4 Maximizing Margins
    10.1.5 Performance Evaluation
  10.2 Fuzzy Classifiers with Polyhedral Regions
    10.2.1 Training Methods
    10.2.2 Performance Evaluation
  References
11 Function Approximation
  11.1 Optimal Hyperplanes
  11.2 L1 Soft-Margin Support Vector Regressors
  11.3 L2 Soft-Margin Support Vector Regressors
  11.4 Model Selection
  11.5 Training Methods
    11.5.1 Overview
    11.5.2 Newton's Methods
    11.5.3 Active Set Training
  11.6 Variants of Support Vector Regressors
    11.6.1 Linear Programming Support Vector Regressors
    11.6.2 ν-Support Vector Regressors
    11.6.3 Least-Squares Support Vector Regressors
  11.7 Variable Selection
    11.7.1 Overview
    11.7.2 Variable Selection by Block Deletion
    11.7.3 Performance Evaluation
  References
A Conventional Classifiers
  A.1 Bayesian Classifiers
  A.2 Nearest-Neighbor Classifiers
  References
B Matrices
  B.1 Matrix Properties
  B.2 Least-Squares Methods and Singular Value Decomposition
  B.3 Covariance Matrices
  References
C Quadratic Programming
  C.1 Optimality Conditions
  C.2 Properties of Solutions
D Positive Semidefinite Kernels and Reproducing Kernel Hilbert Space
  D.1 Positive Semidefinite Kernels
  D.2 Reproducing Kernel Hilbert Space
  References
Index