Table of Contents
1 Statistical Learning as a Regression Problem
1.1 Getting Started
1.2 Setting the Regression Context
1.3 Revisiting the Ubiquitous Linear Regression Model
1.3.1 Problems in Practice
1.4 Working with Statistical Models that are Wrong
1.4.1 An Alternative Approach to Regression
1.4.2 More on Statistical Inference with Wrong Models
1.4.3 Introduction to Sandwich Standard Errors
1.4.4 Introduction to Conformal Inference
1.4.5 Introduction to the Nonparametric Bootstrap
1.4.6 Wrong Regression Models with Binary Response Variables
1.5 The Transition to Statistical Learning
1.5.1 Models Versus Algorithms
1.6 Some Initial Concepts
1.6.1 Overall Goals of Statistical Learning
1.6.2 Forecasting with Supervised Statistical Learning
1.6.3 Overfitting
1.6.4 Data Snooping
1.6.5 Some Constructive Responses to Overfitting and Data Snooping
1.6.6 Loss Functions and Related Concepts
1.6.7 The Bias-Variance Tradeoff
1.6.8 Linear Estimators
1.6.9 Degrees of Freedom
1.6.10 Basis Functions
1.6.11 The Curse of Dimensionality
1.7 Statistical Learning in Context
Endnotes
References
2 Splines, Smoothers, and Kernels
2.1 Introduction
2.2 Regression Splines
2.2.1 Piecewise Linear Population Approximations
2.2.2 Polynomial Regression Splines
2.2.3 Natural Cubic Splines
2.2.4 B-Splines
2.3 Penalized Smoothing
2.3.1 Shrinkage and Regularization
2.4 Penalized Regression Splines
2.4.1 An Application
2.5 Smoothing Splines
2.5.1 A Smoothing Splines Illustration
2.6 Locally Weighted Regression as a Smoother
2.6.1 Nearest Neighbor Methods
2.6.2 Locally Weighted Regression
2.7 Smoothers for Multiple Predictors
2.7.1 Smoothing in Two Dimensions
2.7.2 The Generalized Additive Model
2.8 Smoothers with Categorical Variables
2.8.1 An Illustration Using the Generalized Additive Model with a Binary Outcome
2.9 An Illustration of Statistical Inference After Model Selection
2.9.1 Level I Versus Level II Summary
2.10 Kernelized Regression
2.10.1 Radial Basis Kernel
2.10.2 ANOVA Radial Basis Kernel
2.10.3 A Kernel Regression Application
2.11 Summary and Conclusions
Endnotes
References
3 Classification and Regression Trees (CART)
3.1 Introduction
3.2 An Introduction to Recursive Partitioning in CART
3.3 The Basic Ideas in More Depth
3.3.1 Tree Diagrams for Showing What the Greedy Algorithm Determined
3.3.2 An Initial Application
3.3.3 Classification and Forecasting with CART
3.3.4 Confusion Tables
3.3.5 CART as an Adaptive Nearest Neighbor Method
3.4 The Formalities of Splitting a Node
3.5 An Illustrative Prison Inmate Risk Assessment Using CART ...
3.6 Classification Errors and Costs
3.6.1 Default Costs in CART
3.6.2 Prior Probabilities and Relative Misclassification Costs
3.7 Varying the Prior and the Complexity Parameter
3.8 An Example with Three Response Categories
3.9 Regression Trees
3.9.1 A CART Application for the Correlates of a Student's GPA in High School
3.10 Pruning
3.11 Missing Data
3.11.1 Missing Data with CART
3.12 More on CART Instability
3.13 Summary of Statistical Inference with CART
3.13.1 Summary of Statistical Inference for CART Forecasts
3.14 Overall Summary and Conclusions
Exercises
Endnotes
References
4 Bagging
4.1 Introduction
4.2 The Bagging Algorithm
4.3 Some Bagging Details
4.3.1 Revisiting the CART Instability Problem
4.3.2 Resampling Methods for Bagging
4.3.3 Votes Over Trees and Probabilities
4.3.4 Forecasting and Imputation
4.3.5 Bagging Estimation and Statistical Inference
4.3.6 Margins for Classification
4.3.7 Using Out-of-Bag Observations as Test Data
4.3.8 Bagging and Bias
4.4 Some Limitations of Bagging
4.4.1 Sometimes Bagging Cannot Help
4.4.2 Sometimes Bagging Can Make the Estimation Bias Worse
4.4.3 Sometimes Bagging Can Make the Estimation Variance Worse
4.5 A Bagging Illustration
4.6 Summary and Conclusions
Exercises
Endnotes
References
5 Random Forests
5.1 Introduction and Overview
5.1.1 Unpacking How Random Forests Works
5.2 An Initial Random Forests Illustration
5.3 A Few Technical Formalities
5.3.1 What Is a Random Forest?
5.3.2 Margins and Generalization Error for Classifiers in General
5.3.3 Generalization Error for Random Forests
5.3.4 The Strength of a Random Forest
5.3.5 Dependence
5.3.6 Putting It Together
5.4 Random Forests and Adaptive Nearest Neighbor Methods
5.5 Introducing Misclassification Costs
5.5.1 A Brief Illustration Using Asymmetric Costs
5.6 Determining the Importance of the Predictors
5.6.1 Contributions to the Fit
5.6.2 Contributions to Prediction
5.7 Input Response Functions
5.7.1 Partial Dependence Plot Example
5.7.2 More than Two Response Classes
5.8 Classification and the Proximity Matrix
5.8.1 Clustering by Proximity Values
5.9 Empirical Margins
5.10 Quantitative Response Variables
5.11 A Random Forest Illustration Using a Quantitative Response Variable
5.12 Statistical Inference with Random Forests
5.13 Software and Tuning Parameters
5.14 Bayesian Additive Regression Trees (BART)
5.15 Summary and Conclusions
Exercises
Endnotes
References
6 Boosting
6.1 Introduction
6.2 AdaBoost
6.2.1 A Toy Numerical Example of AdaBoost.M1
6.2.2 Why Does Boosting Work so Well for Classification? ...
6.3 Stochastic Gradient Boosting
6.3.1 Gradient Boosting More Formally
6.3.2 Stochastic Gradient Boosting in Practice
6.3.3 Tuning Parameters
6.3.4 Output
6.4 Asymmetric Costs
6.5 Boosting, Estimation, and Consistency
6.6 A Binomial Example
6.7 Boosting for Statistical Inference and Forecasting
6.7.1 An Imputation Example
6.8 A Quantile Regression Example
6.9 Boosting in Service of Causal Inference in Observational Studies
6.10 Summary and Conclusions
Exercises
Endnotes
References
7 Support Vector Machines
7.1 Introduction
7.2 Support Vector Machines in Pictures
7.2.1 The Support Vector Classifier
7.2.2 Support Vector Machines
7.3 Support Vector Machines More Formally
7.3.1 The Support Vector Classifier Again: The Separable Case
7.3.2 The Nonseparable Case
7.3.3 Support Vector Machines
7.3.4 SVM for Regression
7.3.5 Statistical Inference for Support Vector Machines
7.4 A Classification Example
7.5 Summary and Conclusions
Exercises
Endnotes
References
8 Neural Networks
8.1 Introduction
8.2 Conventional (Vanilla) Neural Networks
8.2.1 Implementation of Gradient Descent
8.2.2 Statistical Inference with Neural Networks
8.2.3 An Application
8.2.4 Some Recent Developments
8.2.5 Implications of Conventional Neural Nets for Practice...
8.3 Deep Learning with Neural Networks
8.3.1 Convolutional Neural Networks
8.3.2 Recurrent Neural Networks
8.3.3 Adversarial Neural Networks
8.4 Conclusions
Endnotes
References
9 Reinforcement Learning and Genetic Algorithms
9.1 Introduction to Reinforcement Learning
9.2 Genetic Algorithms
9.3 An Application
9.4 Conclusions
Endnotes
References
10 Integrating Themes and a Bit of Craft Lore
10.1 Some Integrating Technical Themes
10.2 Integrating Themes Addressing Ethics and Politics
10.3 Some Suggestions for Day-to-Day Practice
10.3.1 Choose the Right Data Analysis Procedure
10.3.2 Get to Know Your Software
10.3.3 Do Not Forget the Basics
10.3.4 Getting Good Data
10.3.5 Match Your Goals to What You Can Credibly Do
10.4 Some Concluding Observations
Endnotes
References
Bibliography
Index