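Synopsis
The fields of machine learning (ML) and artificial intelligence (AI) are booming, with new research, models, and techniques appearing almost daily. Faced with so many options, data scientists, machine learning engineers, and software developers can easily get lost among the many steps involved in moving AI/ML models from experimentation to production.
This practical book focuses on production machine learning, guiding you in turning ML models into viable products and applications. Production machine learning covers all areas of ML, not just model training. The book places special emphasis on ML pipelines, helping you lay the foundation for ML production systems.
You are about to begin a journey of exploration, learning the broad range of techniques needed to put ML applications into production, along with the issues and approaches to consider. Key ML engineering topics include:
· Data collection, validation, storage, and feature engineering
· Model analysis, serving, monitoring, and logging
· Orchestrating machine learning pipelines with TensorFlow Extended (TFX) and other tools
The book provides in-depth examples, including end-to-end machine learning pipelines for natural language processing (NLP) and computer vision models.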
Table of Contents
Foreword
Preface
1. Introduction to Machine Learning Production Systems
What Is Production Machine Learning?
Benefits of Machine Learning Pipelines
Focus on Developing New Models, Not on Maintaining Existing Models
Prevention of Bugs
Creation of Records for Debugging and Reproducing Results
Standardization
The Business Case for ML Pipelines
When to Use Machine Learning Pipelines
Steps in a Machine Learning Pipeline
Data Ingestion and Data Versioning
Data Validation
Feature Engineering
Model Training and Model Tuning
Model Analysis
Model Deployment
Looking Ahead
2. Collecting, Labeling, and Validating Data
Important Considerations in Data Collection
Responsible Data Collection
Labeling Data: Data Changes and Drift in Production ML
Labeling Data: Direct Labeling and Human Labeling
Validating Data: Detecting Data Issues
Validating Data: TensorFlow Data Validation
Skew Detection with TFDV
Types of Skew
Example: Spotting Imbalanced Datasets with TensorFlow Data Validation
Conclusion
3. Feature Engineering and Feature Selection
Introduction to Feature Engineering
Preprocessing Operations
Feature Engineering Techniques
Normalizing and Standardizing
Bucketing
Feature Crosses
Dimensionality and Embeddings
Visualization
Feature Transformation at Scale
Choose a Framework That Scales Well
Avoid Training–Serving Skew
Consider Instance-Level Versus Full-Pass Transformations
Using TensorFlow Transform
Analyzers
Code Example
Feature Selection
Feature Spaces
Feature Selection Overview
Filter Methods
Wrapper Methods
Embedded Methods
Feature and Example Selection for LLMs and GenAI
Example: Using TF Transform to Tokenize Text
Benefits of Using TF Transform
Alternatives to TF Transform
Conclusion
4. Data Journey and Data Storage
Data Journey
ML Metadata
Using a Schema
Schema Development
Schema Environments
Changes Across Datasets
Enterprise Data Storage
Feature Stores
Data Warehouses
Data Lakes
Conclusion
5. Advanced Labeling, Augmentation, and Data Preprocessing
Advanced Labeling
Semi-Supervised Labeling
Active Learning
Weak Supervision
Advanced Labeling Review
Data Augmentation
Example: CIFAR-10
Other Augmentation Techniques
Data Augmentation Review
Preprocessing Time Series Data: An Example
Windowing
Sampling
Conclusion
6. Model Resource Management Techniques
Dimensionality Reduction: Dimensionality Effect on Performance
Example: Word Embedding Using Keras
Curse of Dimensionality
Adding Dimensions Increases Feature Space Volume
Dimensionality Reduction
Quantization and Pruning
Mobile, IoT, Edge, and Similar Use Cases
Quantization
Optimizing Your TensorFlow Model with TF Lite
Optimization Options
Pruning
Knowledge Distillation
Teacher and Student Networks
Knowledge Distillation Techniques
TMKD: Distilling Knowledge for a Q&A Task
Increasing Robustness by Distilling EfficientNets
Conclusion
7. High-Performance Modeling
8. Model Analysis
9. Interpretability
10. Neural Architecture Search
11. Introduction to Model Serving
12. Model Serving Patterns
13. Model Serving Infrastructure
14. Model Serving Examples
15. Model Management and Delivery
16. Model Monitoring and Logging
17. Privacy and Legal Requirements
18. Orchestrating Machine Learning Pipelines
19. Advanced TFX
20. ML Pipelines for Computer Vision Problems
21. ML Pipelines for Natural Language Processing
22. Generative AI
23. The Future of Machine Learning Production Systems and Next Steps
Index