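Overview
If you want to build an enterprise-quality application that uses natural language text but aren't sure where to begin or what tools to use, this practical guide will help get you started. Alex Thomas, principal data scientist at Wisecube, shows software engineers and data scientists how to build scalable natural language processing (NLP) applications using deep learning and the Apache Spark NLP library.
Through concrete examples, practical and theoretical explanations, and hands-on exercises for using NLP on the Spark processing framework, this book teaches you everything from basic linguistics and writing systems to sentiment analysis and search engines. You'll also explore issues that need special attention when developing text-based applications, such as performance.
In the book's four parts, you'll learn NLP basics and building blocks before diving into application and system building:
Basics: Understand the fundamentals of natural language processing, NLP on Apache Spark, and deep learning.
Building blocks: Learn techniques for building NLP applications, including tokenization, sentence segmentation, and named-entity recognition, and discover how and why they work.
Applications: Explore the design, development, and experimentation process for building your own NLP applications.
Building NLP systems: Consider options for productionizing and deploying NLP models, including which human languages to support.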
Table of Contents
Preface
Part I. Basics
1. Getting Started
Introduction
Other Tools
Setting Up Your Environment
Prerequisites
Starting Apache Spark
Checking Out the Code
Getting Familiar with Apache Spark
Starting Apache Spark with Spark NLP
Loading and Viewing Data in Apache Spark
Hello World with Spark NLP
2. Natural Language Basics
What Is Natural Language?
Origins of Language
Spoken Language Versus Written Language
Linguistics
Phonetics and Phonology
Morphology
Syntax
Semantics
Sociolinguistics: Dialects, Registers, and Other Varieties
Formality
Context
Pragmatics
Roman Jakobson
How To Use Pragmatics
Writing Systems
Origins
Alphabets
Abjads
Abugidas
Syllabaries
Logographs
Encodings
ASCII
Unicode
UTF-8
Exercises: Tokenizing
Tokenize English
Tokenize Greek
Tokenize Ge'ez (Amharic)
Resources
3. NLP on Apache Spark
Parallelism, Concurrency, Distributing Computation
Parallelization Before Apache Hadoop
MapReduce and Apache Hadoop
Apache Spark
Architecture of Apache Spark
Physical Architecture
Logical Architecture
Spark SQL and Spark MLlib
Transformers
Estimators and Models
Evaluators
NLP Libraries
Functionality Libraries
Annotation Libraries
NLP in Other Libraries
Spark NLP
Annotation Library
Stages
Pretrained Pipelines
Finisher
Exercises: Build a Topic Model
Resources
4. Deep Learning Basics
Gradient Descent
Backpropagation
Convolutional Neural Networks
Filters
Pooling
Recurrent Neural Networks
Backpropagation Through Time
Elman Nets
LSTMs
Exercise 1
Exercise 2
Resources
Part II. Building Blocks
5. Processing Words
6. Information Retrieval
7. Classification and Regression
8. Sequence Modeling with Keras
9. Information Extraction
10. Topic Modeling
11. Word Embeddings
Part III. Applications
12. Sentiment Analysis and Emotion Detection
13. Building Knowledge Bases
14. Search Engine
15. Chatbot
16. Object Character Recognition
Part IV. Building NLP Systems
17. Supporting Multiple Languages
18. Human Labeling
19. Productionizing NLP Applications
Glossary
Index