Preface

Part I. Introduction

1. Software for Modeling
    Fundamentals for Modeling Software
    Types of Models
    Descriptive Models
    Inferential Models
    Predictive Models
    Connections Between Types of Models
    Some Terminology
    How Does Modeling Fit into the Data Analysis Process?
    Chapter Summary

2. A Tidyverse Primer
    Tidyverse Principles
    Design for Humans
    Reuse Existing Data Structures
    Design for the Pipe and Functional Programming
    Examples of Tidyverse Syntax
    Chapter Summary

3. A Review of R Modeling Fundamentals
    An Example
    What Does the R Formula Do?
    Why Tidiness Is Important for Modeling
    Combining Base R Models and the Tidyverse
    The tidymodels Metapackage
    Chapter Summary

Part II. Modeling Basics

4. The Ames Housing Data
    Exploring Features of Homes in Ames
    Chapter Summary

5. Spending Our Data
    Common Methods for Splitting Data
    What About a Validation Set?
    Multilevel Data
    Other Considerations for a Data Budget
    Chapter Summary

6. Fitting Models with parsnip
    Create a Model
    Use the Model Results
    Make Predictions
    parsnip-Extension Packages
    Creating Model Specifications
    Chapter Summary

7. A Model Workflow
    Where Does the Model Begin and End?
    Workflow Basics
    Adding Raw Variables to the workflow()
    How Does a workflow() Use the Formula?
    Tree-Based Models
    Special Formulas and Inline Functions
    Creating Multiple Workflows at Once
    Evaluating the Test Set
    Chapter Summary

8. Feature Engineering with Recipes
    A Simple recipe() for the Ames Housing Data
    Using Recipes
    How Data Are Used by the recipe()
    Examples of Steps
    Encoding Qualitative Data in a Numeric Format
    Interaction Terms
    Spline Functions
    Feature Extraction
    Row Sampling Steps
    General Transformations
    Natural Language Processing
    Skipping Steps for New Data
    Tidy a recipe()
    Column Roles
    Chapter Summary

9. Judging Model Effectiveness
    Performance Metrics and Inference
    Regression Metrics
    Binary Classification Metrics
    Multiclass Classification Metrics
    Chapter Summary

Part III. Tools for Creating Effective Models

10. Resampling for Evaluating Performance
    The Resubstitution Approach
    Resampling Methods
    Cross-Validation
    Repeated Cross-Validation
    Leave-One-Out Cross-Validation
    Monte Carlo Cross-Validation
    Validation Sets
    Bootstrapping
    Rolling Forecasting Origin Resampling
    Estimating Performance
    Parallel Processing
    Saving the Resampled Objects
    Chapter Summary

11. Comparing Models with Resampling
    Creating Multiple Models with Workflow Sets
    Comparing Resampled Performance Statistics
    Simple Hypothesis Testing Methods
    Bayesian Methods
    A Random Intercept Model
    The Effect of the Amount of Resampling
    Chapter Summary

12. Model Tuning and the Dangers of Overfitting
    Model Parameters
    Tuning Parameters for Different Types of Models
    What Do We Optimize?
    The Consequences of Poor Parameter Estimates
    Two General Strategies for Optimization
    Tuning Parameters in tidymodels
    Chapter Summary

13. Grid Search
    Regular and Nonregular Grids
    Regular Grids
    Nonregular Grids
    Evaluating the Grid
    Finalizing the Model
    Tools for Creating Tuning Specifications
    Tools for Efficient Grid Search
    Submodel Optimization
    Parallel Processing
    Benchmarking Boosted Trees
    Access to Global Variables
    Racing Methods
    Chapter Summary

14. Iterative Search
    A Support Vector Machine Model
    Bayesian Optimization
    A Gaussian Process Model
    Acquisition Functions
    The tune_bayes() Function
    Simulated Annealing
    Simulated Annealing Search Process
    The tune_sim_anneal() Function
    Chapter Summary

15. Screening Many Models
    Modeling Concrete Mixture Strength
    Creating the Workflow Set
    Tuning and Evaluating the Models
    Efficiently Screening Models
    Finalizing a Model
    Chapter Summary

Part IV. Beyond the Basics

16. Dimensionality Reduction
    What Problems Can Dimensionality Reduction Solve?
    A Picture Is Worth a Thousand...Beans
    A Starter Recipe
    Recipes in the Wild
    Preparing a Recipe
    Baking the Recipe
    Feature Extraction Techniques
    Principal Component Analysis
    Partial Least Squares
    Independent Component Analysis
    Uniform Manifold Approximation and Projection
    Modeling
    Chapter Summary

17. Encoding Categorical Data
    Is an Encoding Necessary?
    Encoding Ordinal Predictors
    Using the Outcome for Encoding Predictors
    Effect Encodings in tidymodels
    Effect Encodings with Partial Pooling
    Feature Hashing
    More Encoding Options
    Chapter Summary

18. Explaining Models and Predictions
    Software for Model Explanations
    Local Explanations
    Global Explanations
    Building Global Explanations from Local Explanations
    Back to Beans!
    Chapter Summary

19. When Should You Trust Your Predictions?
    Equivocal Results
    Determining Model Applicability
    Chapter Summary

20. Ensembles of Models
    Creating the Training Set for Stacking
    Blend the Predictions
    Fit the Member Models
    Test Set Results
    Chapter Summary

21. Inferential Analysis
    Inference for Count Data
    Comparisons with Two-Sample Tests
    Log-Linear Models
    A More Complex Model
    More Inferential Analysis
    Chapter Summary

Appendix. Recommended Preprocessing

References

Index