Since their introduction in 2017, transformers have quickly become the dominant architecture for achieving state-of-the-art results on a wide range of natural language processing tasks. If you're a data scientist or programmer, this hands-on book will show you how to train and scale these large models using Hugging Face Transformers, a Python-based deep learning library.

Transformers have been used to write realistic news stories, improve Google Search queries, and even create chatbots that tell corny jokes. In this guide, authors Lewis Tunstall, Leandro von Werra, and Thomas Wolf, creators of Hugging Face Transformers, take a hands-on approach to teach you how transformers work and how to integrate them into your applications. You'll quickly learn the variety of tasks they can help you solve:

Build, debug, and optimize transformer models for core NLP tasks such as text classification, named entity recognition, and question answering
Learn how transformers can be used for cross-lingual transfer learning
Apply transformers in real-world scenarios where labeled data is scarce
Make transformer models efficient for deployment using techniques such as distillation, pruning, and quantization
Train transformers from scratch and learn how to scale to multiple GPUs and distributed environments
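To give a sense of the book's hands-on style, here is a minimal sketch (illustrative, not taken from the book) of the Hugging Face Transformers pipeline API applied to two of the tasks mentioned above; the default models, example text, and printed output shown in the comments are assumptions.

    # Minimal sketch: the Hugging Face Transformers pipeline API for
    # text classification and extractive question answering.
    from transformers import pipeline

    # Text classification with a default pretrained model (downloaded on first use)
    classifier = pipeline("text-classification")
    print(classifier("Transformers have quickly become the dominant NLP architecture."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99}]  (exact output depends on the default model)

    # Extractive question answering over a short context passage
    reader = pipeline("question-answering")
    print(reader(question="What library does the book use?",
                 context="The book trains and scales large models with Hugging Face Transformers."))

The same one-line pattern extends to the other tasks covered in the book, such as summarization, translation, and text generation, by passing the corresponding task name to pipeline().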
About the Authors
Lewis Tunstall (Australia) / Leandro von Werra (Switzerland) / Thomas Wolf (France) | Managing editor: Zhang Ye
Table of Contents
Foreword
Preface
1. Hello Transformers
    The Encoder-Decoder Framework
    Attention Mechanisms
    Transfer Learning in NLP
    Hugging Face Transformers: Bridging the Gap
    A Tour of Transformer Applications
    Text Classification
    Named Entity Recognition
    Question Answering
    Summarization
    Translation
    Text Generation
    The Hugging Face Ecosystem
    The Hugging Face Hub
    Hugging Face Tokenizers
    Hugging Face Datasets
    Hugging Face Accelerate
    Main Challenges with Transformers
    Conclusion
2. Text Classification
    The Dataset
    A First Look at Hugging Face Datasets
    From Datasets to DataFrames
    Looking at the Class Distribution
    How Long Are Our Tweets?
    From Text to Tokens
    Character Tokenization
    Word Tokenization
    Subword Tokenization
    Tokenizing the Whole Dataset
    Training a Text Classifier
    Transformers as Feature Extractors
    Fine-Tuning Transformers
    Conclusion
3. Transformer Anatomy
    The Transformer Architecture
    The Encoder
    Self-Attention
    The Feed-Forward Layer
    Adding Layer Normalization
    Positional Embeddings
    Adding a Classification Head
    The Decoder
    Meet the Transformers
    The Transformer Tree of Life
    The Encoder Branch
    The Decoder Branch
    The Encoder-Decoder Branch
    Conclusion
4. Multilingual Named Entity Recognition
    The Dataset
    Multilingual Transformers
    A Closer Look at Tokenization
    The Tokenizer Pipeline
    The SentencePiece Tokenizer
    Transformers for Named Entity Recognition
    The Anatomy of the Transformers Model Class
    Bodies and Heads
    Creating a Custom Model for Token Classification
    Loading a Custom Model
    Tokenizing Texts for NER
    Performance Measures
    Fine-Tuning XLM-RoBERTa
    Error Analysis
    Cross-Lingual Transfer
    When Does Zero-Shot Transfer Make Sense?
    Fine-Tuning on Multiple Languages at Once
    Interacting with Model Widgets
    Conclusion
5. Text Generation
    The Challenge with Generating Coherent Text
    Greedy Search Decoding
    Beam Search Decoding
    Sampling Methods
    Top-k and Nucleus Sampling
    Which Decoding Method Is Best?
    Conclusion
6. Summarization
    The CNN/DailyMail Dataset
    Text Summarization Pipelines
    Summarization Baseline
    GPT-2
    T5
    BART
    PEGASUS
    Comparing Different Summaries
    Measuring the Quality of Generated Text
    BLEU
    ROUGE
    Evaluating PEGASUS on the CNN/DailyMail Dataset
    Training a Summarization Model
    Evaluating PEGASUS on SAMSum
    Fine-Tuning PEGASUS
    Generating Dialogue Summaries
    Conclusion
7. Question Answering
    Building a Review-Based QA System
    The Dataset
    Extracting Answers from Text
    Using Haystack to Build a QA Pipeline
    Improving Our QA Pipeline
    Evaluating the Retriever
    Evaluating the Reader
    Domain Adaptation
    Evaluating the Whole QA Pipeline
    Going Beyond Extractive QA
    Conclusion
8. Making Transformers Efficient in Production
    Intent Detection as a Case Study
    Creating a Performance Benchmark
    Making Models Smaller via Knowledge Distillation
    Knowledge Distillation for Fine-Tuning
    Knowledge Distillation for Pretraining
    Creating a Knowledge Distillation Trainer
    Choosing a Good Student Initialization
    Finding Good Hyperparameters with Optuna
    Benchmarking Our Distilled Model
    Making Models Faster with Quantization
    Benchmarking Our Quantized Model
    Optimizing Inference with ONNX and the ONNX Runtime
    Making Models Sparser with Weight Pruning
    Sparsity in Deep Neural Networks
    Weight Pruning Methods
    Conclusion
9. Dealing with Few to No Labels
    Building a GitHub Issues Tagger
    Getting the Data
    Preparing the Data
    Creating Training Sets
    Creating Training Slices
    Implementing a Naive Bayesline
    Working with No Labeled Data
    Working with a Few Labels
    Data Augmentation
    Using Embeddings as a Lookup Table
    Fine-Tuning a Vanilla Transformer
    In-Context and Few-Shot Learning with Prompts
    Leveraging Unlabeled Data
    Fine-Tuning a Language Model
    Fine-Tuning a Classifier
    Advanced Methods
    Conclusion
10. Training Transformers from Scratch
    Large Datasets and Where to Find Them
    Challenges of Building a Large-Scale Corpus
    Building a Custom Code Dataset
    Working with Large Datasets
    Adding Datasets to the Hugging Face Hub
    Building a Tokenizer
    The Tokenizer Model
    Measuring Tokenizer Performance
    A Tokenizer for Python
    Training a Tokenizer
    Saving a Custom Tokenizer on the Hub
    Training a Model from Scratch
    A Tale of Pretraining Objectives
    Initializing the Model
    Implementing the Dataloader
    Defining the Training Loop
    The Training Run
    Results and Analysis
    Conclusion
11. Future Directions
    Scaling Transformers
    Scaling Laws
    Challenges with Scaling
    Attention Please!
    Sparse Attention
    Linearized Attention
    Going Beyond Text
    Vision
    Tables
    Multimodal Transformers
    Speech-to-Text
    Vision and Text
    Where to from Here?
Index