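Overview
The potential of machine learning today is astonishing, yet its complexity deters many aspiring developers and professionals. Whether you want to sharpen your skills and apply machine learning to real-world projects, or are simply curious about how AI systems work, this book is an ideal starting point.
Author Aurélien Géron offers an authoritative introduction to machine learning and deep learning in a style that is accessible without sacrificing depth. With clear explanations and realistic examples, the book takes you through cutting-edge tools including Scikit-Learn, PyTorch, and Hugging Face, from basic regression methods to advanced neural network architectures. Whether you are a student, a professional, or a hobbyist, you will gain the skills to build intelligent systems.
Master machine learning fundamentals, including concepts such as overfitting and hyperparameter tuning
Complete an end-to-end machine learning project with Scikit-Learn, covering everything from data exploration to model evaluation
Learn unsupervised learning techniques such as clustering and anomaly detection
Build advanced architectures with PyTorch, such as Transformer-based chatbots and diffusion models
Harness pretrained models (including LLMs), and learn how to fine-tune and accelerate them
Train autonomous agents using reinforcement learning
Table of Contents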
Preface
Part I. The Fundamentals of Machine Learning
1. The Machine Learning Landscape
What Is Machine Learning?
Why Use Machine Learning?
Examples of Applications
Types of Machine Learning Systems
Training Supervision
Batch Versus Online Learning
Instance-Based Versus Model-Based Learning
Main Challenges of Machine Learning
Insufficient Quantity of Training Data
Nonrepresentative Training Data
Poor-Quality Data
Irrelevant Features
Overfitting the Training Data
Underfitting the Training Data
Deployment Issues
Stepping Back
Testing and Validating
Hyperparameter Tuning and Model Selection
Data Mismatch
Exercises
2. End-to-End Machine Learning Project
Working with Real Data
Look at the Big Picture
Frame the Problem
Select a Performance Measure
Check the Assumptions
Get the Data
Running the Code Examples Using Google Colab
Saving Your Code Changes and Your Data
The Power and Danger of Interactivity
Book Code Versus Notebook Code
Download the Data
Take a Quick Look at the Data Structure
Create a Test Set
Explore and Visualize the Data to Gain Insights
Visualizing Geographical Data
Look for Correlations
Experiment with Attribute Combinations
Prepare the Data for Machine Learning Algorithms
Clean the Data
Handling Text and Categorical Attributes
Feature Scaling and Transformation
Custom Transformers
Transformation Pipelines
Select and Train a Model
Train and Evaluate on the Training Set
Better Evaluation Using Cross-Validation
Fine-Tune Your Model
Grid Search
Randomized Search
Ensemble Methods
Analyzing the Best Models and Their Errors
Evaluate Your System on the Test Set
Launch, Monitor, and Maintain Your System
Try It Out!
Exercises
3. Classification
MNIST
Training a Binary Classifier
Performance Measures
Measuring Accuracy Using Cross-Validation
Confusion Matrices
Precision and Recall
The Precision/Recall Trade-Off
The ROC Curve
Multiclass Classification
Error Analysis
Multilabel Classification
Multioutput Classification
Exercises
4. Training Models
Linear Regression
The Normal Equation
Computational Complexity
Gradient Descent
Batch Gradient Descent
Stochastic Gradient Descent
Mini-Batch Gradient Descent
Polynomial Regression
Learning Curves
Regularized Linear Models
Ridge Regression
Lasso Regression
Elastic Net Regression
Early Stopping
Logistic Regression
Estimating Probabilities
Training and Cost Function
Decision Boundaries
Softmax Regression
Exercises
5. Decision Trees
Training and Visualizing a Decision Tree
Making Predictions
Estimating Class Probabilities
The CART Training Algorithm
Computational Complexity
Gini Impurity or Entropy?
Regularization Hyperparameters
Regression
Sensitivity to Axis Orientation
Decision Trees Have a High Variance
Exercises
6. Ensemble Learning and Random Forests
Voting Classifiers
Bagging and Pasting
Bagging and Pasting in Scikit-Learn
Out-of-Bag Evaluation
Random Patches and Random Subspaces
Random Forests
Extra-Trees
Feature Importance
Boosting
AdaBoost
Gradient Boosting
Histogram-Based Gradient Boosting
Stacking
Exercises
7. Dimensionality Reduction
The Curse of Dimensionality
Main Approaches for Dimensionality Reduction
Projection
Manifold Learning
PCA
Preserving the Variance
Principal Components
Projecting Down to d Dimensions
Using Scikit-Learn
Explained Variance Ratio
Choosing the Right Number of Dimensions
PCA for Compression
Randomized PCA
Incremental PCA
Random Projection
LLE
Other Dimensionality Reduction Techniques
Exercises
8. Unsupervised Learning Techniques
Clustering Algorithms: k-means and DBSCAN
k-Means Clustering
Limits of k-Means
Using Clustering for Image Segmentation
Using Clustering for Semi-Supervised Learning
DBSCAN
Other Clustering Algorithms
Gaussian Mixtures
Using Gaussian Mixtures for Anomaly Detection
Selecting the Number of Clusters
Bayesian Gaussian Mixture Models
Other Algorithms for Anomaly and Novelty Detection
Exercises
Part II. Neural Networks and Deep Learning
9. Introduction to Artificial Neural Networks
From Biological to Artificial Neurons
Biological Neurons
Logical Computations with Neurons
The Perceptron
The Multilayer Perceptron and Backpropagation
Building and Training MLPs with Scikit-Learn
Regression MLPs
Classification MLPs
Hyperparameter Tuning Guidelines
Number of Hidden Layers
Number of Neurons per Hidden Layer
Learning Rate
Batch Size
Other Hyperparameters
Exercises
10. Building Neural Networks with PyTorch
PyTorch Fundamentals
PyTorch Tensors
Hardware Acceleration
Autograd
Implementing Linear Regression
Linear Regression Using Tensors and Autograd
Linear Regression Using PyTorch's High-Level API
Implementing a Regression MLP
Implementing Mini-Batch Gradient Descent Using DataLoaders
Model Evaluation
Building Nonsequential Models Using Custom Modules
Building Models with Multiple Inputs
Building Models with Multiple Outputs
Building an Image Classifier with PyTorch
Using TorchVision to Load the Dataset
Building the Classifier
Fine-Tuning Neural Network Hyperparameters with Optuna
Saving and Loading PyTorch Models
Compiling and Optimizing a PyTorch Model
Exercises
11. Training Deep Neural Networks
The Vanishing/Exploding Gradients Problems
Glorot Initialization and He Initialization
Better Activation Functions
Batch Normalization
Layer Normalization
Gradient Clipping
Reusing Pretrained Layers
Transfer Learning with PyTorch
Unsupervised Pretraining
Pretraining on an Auxiliary Task
Faster Optimizers
Momentum
Nesterov Accelerated Gradient
AdaGrad
RMSProp
Adam
AdaMax
NAdam
AdamW
Learning Rate Scheduling
Exponential Scheduling
Cosine Annealing
Performance Scheduling
Warming Up the Learning Rate
Cosine Annealing with Warm Restarts
1cycle Scheduling
Avoiding Overfitting Through Regularization
ℓ1 and ℓ2 Regularization
Dropout
Monte Carlo Dropout
Max-Norm Regularization
Practical Guidelines
Exercises
12. Deep Computer Vision Using Convolutional Neural Networks
The Architecture of the Visual Cortex
Convolutional Layers
Filters
Stacking Multiple Feature Maps
Implementing Convolutional Layers with PyTorch
Pooling Layers
Implementing Pooling Layers with PyTorch
CNN Architectures
LeNet-5
AlexNet
GoogLeNet
ResNet
Xception
SENet
Other Noteworthy Architectures
Choosing the Right CNN Architecture
GPU RAM Requirements: Inference Versus Training
Reversible Residual Networks (RevNets)
Implementing a ResNet-34 CNN Using PyTorch
Using TorchVision's Pretrained Models
Pretrained Models for Transfer Learning
Classification and Localization
Object Detection
Fully Convolutional Networks
You Only Look Once
Object Tracking
Semantic Segmentation
Exercises
13. Processing Sequences Using RNNs and CNNs
Recurrent Neurons and Layers
Memory Cells
Input and Output Sequences
Training RNNs
Forecasting a Time Series
The ARMA Model Family
Preparing the Data for Machine Learning Models
Forecasting Using a Linear Model
Forecasting Using a Simple RNN
Forecasting Using a Deep RNN
Forecasting Multivariate Time Series
Forecasting Several Time Steps Ahead
Forecasting Using a Sequence-to-Sequence Model
Handling Long Sequences
Fighting the Unstable Gradients Problem
Tackling the Short-Term Memory Problem
Exercises
14. Natural Language Processing with RNNs and Attention
Generating Shakespearean Text Using a Character RNN
Creating the Training Dataset
Embeddings
Building and Training the Char-RNN Model
Generating Fake Shakespeare Text
Sentiment Analysis Using Hugging Face Libraries
Tokenization Using the Hugging Face Tokenizers Library
Reusing Pretrained Tokenizers
Building and Training a Sentiment Analysis Model
Bidirectional RNNs
Reusing Pretrained Embeddings and Language Models
Task-Specific Classes
The Trainer API
Hugging Face Pipelines
An Encoder-Decoder Network for Neural Machine Translation
Beam Search
Attention Mechanisms
Exercises
15. Transformers for Natural Language Processing and Chatbots
Attention Is All You Need: The Original Transformer Architecture
Positional Encodings
Multi-Head Attention
Building the Rest of the Transformer
Building an English-to-Spanish Transformer
Encoder-Only Models for Natural Language Understanding
BERT's Architecture
BERT Pretraining
BERT Fine-Tuning
Other Encoder-Only Models
Decoder-Only Transformers
GPT-1 Architecture and Generative Pretraining
GPT-2 and Zero-Shot Learning
GPT-3, In-Context Learning, One-Shot Learning, and Few-Shot Learning
Using GPT-2 to Generate Text
Using GPT-2 for Question Answering
Downloading and Running an Even Larger Model: Mistral-7B
Turning a Large Language Model into a Chatbot
Fine-Tuning a Model for Chatting and Following Instructions Using SFT and RLHF
Direct Preference Optimization (DPO)
Fine-Tuning a Model Using the TRL Library
From a Chatbot Model to a Full Chatbot System
Model Context Protocol
Libraries and Tools
Encoder-Decoder Models
Exercises
16. Vision and Multimodal Transformers
Vision Transformers
RNNs with Visual Attention
DETR: A CNN-Transformer Hybrid for Object Detection
The Original ViT
Data-Efficient Image Transformer
Pyramid Vision Transformer for Dense Prediction Tasks
The Swin Transformer: A Fast and Versatile ViT
DINO: Self-Supervised Visual Representation Learning
Other Major Vision Models and Techniques
Multimodal Transformers
VideoBERT: A BERT Variant for Text plus Video
ViLBERT: A Dual-Stream Model for Text plus Image
CLIP: A Dual-Encoder Text plus Image Model Trained with Contrastive Pretraining
DALL-E: Generating Images from Text Prompts
Perceiver: Bridging High-Resolution Modalities with Latent Spaces
Perceiver IO: A Flexible Output Mechanism for the Perceiver
Flamingo: Open-Ended Visual Dialogue
BLIP and BLIP-2
Other Multimodal Models
Exercises
17. Speeding Up Transformers
18. Autoencoders, GANs, and Diffusion Models
Efficient Data Representations
Performing PCA with an Undercomplete Linear Autoencoder
Stacked Autoencoders
Implementing a Stacked Autoencoder Using PyTorch
Visualizing the Reconstructions
Anomaly Detection Using Autoencoders
Visualizing the Fashion MNIST Dataset
Unsupervised Pretraining Using Stacked Autoencoders
Tying Weights
Training One Autoencoder at a Time
Convolutional Autoencoders
Denoising Autoencoders
Sparse Autoencoders
Variational Autoencoders
Generating Fashion MNIST Images
Discrete Variational Autoencoders
Generative Adversarial Networks
The Difficulties of Training GANs
Diffusion Models
Exercises
19. Reinforcement Learning
What Is Reinforcement Learning?
Policy Gradients
Introduction to the Gymnasium Library
Neural Network Policies
Evaluating Actions: The Credit Assignment Problem
Solving the CartPole Using Policy Gradients
Value-Based Methods
Markov Decision Processes
Temporal Difference Learning
Q-Learning
Exploration Policies
Approximate Q-Learning and Deep Q-Learning
Implementing Deep Q-Learning
DQN Improvements
Actor-Critic Algorithms
Mastering Atari Breakout Using the Stable-Baselines3 PPO Implementation
Overview of Some Popular RL Algorithms
Exercises
Thank You!
A. Autodiff
B. Mixed Precision and Quantization
Index