Table of Contents
Preface
Part I. A Guided Tour of the Social Web
Prelude
1. Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking
About, and More
1.1 Overview
1.2 Why Is Twitter All the Rage?
1.3 Exploring Twitter's API
1.3.1 Fundamental Twitter Terminology
1.3.2 Creating a Twitter API Connection
1.3.3 Exploring Trending Topics
1.3.4 Searching for Tweets
1.4 Analyzing the 140 (or More) Characters
1.4.1 Extracting Tweet Entities
1.4.2 Analyzing Tweets and Tweet Entities with Frequency Analysis
1.4.3 Computing the Lexical Diversity of Tweets
1.4.4 Examining Patterns in Retweets
1.4.5 Visualizing Frequency Data with Histograms
1.5 Closing Remarks
1.6 Recommended Exercises
1.7 Online Resources
2. Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More
2.1 Overview
2.2 Exploring Facebook's Graph API
2.2.1 Understanding the Graph API
2.2.2 Understanding the Open Graph Protocol
2.3 Analyzing Social Graph Connections
2.3.1 Analyzing Facebook Pages
2.3.2 Manipulating Data Using pandas
2.4 Closing Remarks
2.5 Recommended Exercises
2.6 Online Resources
3. Mining Instagram: Computer Vision, Neural Networks, Object Recognition,
and Face Detection
3.1 Overview
3.2 Exploring the Instagram API
3.2.1 Making Instagram API Requests
3.2.2 Retrieving Your Own Instagram Feed
3.2.3 Retrieving Media by Hashtag
3.3 Anatomy of an Instagram Post
3.4 Crash Course on Artificial Neural Networks
3.4.1 Training a Neural Network to "Look" at Pictures
3.4.2 Recognizing Handwritten Digits
3.4.3 Object Recognition Within Photos Using Pretrained Neural
Networks
3.5 Applying Neural Networks to Instagram Posts
3.5.1 Tagging the Contents of an Image
3.5.2 Detecting Faces in Images
3.6 Closing Remarks
3.7 Recommended Exercises
3.8 Online Resources
4. Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More
4.1 Overview
4.2 Exploring the LinkedIn API
4.2.1 Making LinkedIn API Requests
4.2.2 Downloading LinkedIn Connections as a CSV File
4.3 Crash Course on Clustering Data
4.3.1 Normalizing Data to Enable Analysis
4.3.2 Measuring Similarity
4.3.3 Clustering Algorithms
4.4 Closing Remarks
4.5 Recommended Exercises
4.6 Online Resources
5. Mining Text Files: Computing Document Similarity, Extracting Collocations, and More
5.1 Overview
5.2 Text Files
5.3 A Whiz-Bang Introduction to TF-IDF
5.3.1 Term Frequency
5.3.2 Inverse Document Frequency
5.3.3 TF-IDF
5.4 Querying Human Language Data with TF-IDF
5.4.1 Introducing the Natural Language Toolkit
5.4.2 Applying TF-IDF to Human Language
5.4.3 Finding Similar Documents
5.4.4 Analyzing Bigrams in Human Language
5.4.5 Reflections on Analyzing Human Language Data
5.5 Closing Remarks
5.6 Recommended Exercises
5.7 Online Resources
6. Mining Web Pages: Using Natural Language Processing to Understand Human
Language, Summarize Blog Posts, and More
6.1 Overview
6.2 Scraping, Parsing, and Crawling the Web
6.2.1 Breadth-First Search in Web Crawling
6.3 Discovering Semantics by Decoding Syntax
6.3.1 Natural Language Processing Illustrated Step-by-Step
6.3.2 Sentence Detection in Human Language Data
6.3.3 Document Summarization
6.4 Entity-Centric Analysis: A Paradigm Shift
6.4.1 Gisting Human Language Data
6.5 Quality of Analytics for Processing Human Language Data
6.6 Closing Remarks
6.7 Recommended Exercises
6.8 Online Resources
7. Mining Mailboxes: Analyzing Who's Talking to Whom About What,
How Often, and More
7.1 Overview
7.2 Obtaining and Processing a Mail Corpus
7.2.1 A Primer on Unix Mailboxes
7.2.2 Getting the Enron Data
7.2.3 Converting a Mail Corpus to a Unix Mailbox
7.2.4 Converting Unix Mailboxes to pandas DataFrames
7.3 Analyzing the Enron Corpus
7.3.1 Querying by Date/Time Range
7.3.2 Analyzing Patterns in Sender/Recipient Communications
7.3.3 Searching Emails by Keywords
7.4 Analyzing Your Own Mail Data
7.4.1 Accessing Your Gmail with OAuth
7.4.2 Fetching and Parsing Email Messages
7.4.3 Visualizing Patterns in Email with Immersion
7.5 Closing Remarks
7.6 Recommended Exercises
7.7 Online Resources
8. Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs,
and More
8.1 Overview
8.2 Exploring GitHub's API
8.2.1 Creating a GitHub API Connection
8.2.2 Making GitHub API Requests
8.3 Modeling Data with Property Graphs
8.4 Analyzing GitHub Interest Graphs
8.4.1 Seeding an Interest Graph
8.4.2 Computing Graph Centrality Measures
8.4.3 Extending the Interest Graph with "Follows" Edges for Users
8.4.4 Using Nodes as Pivots for More Efficient Queries
8.4.5 Visualizing Interest Graphs
8.5 Closing Remarks
8.6 Recommended Exercises
8.7 Online Resources
Part II. Twitter Cookbook
9. Twitter Cookbook
9.1 Accessing Twitter's API for Development Purposes
9.2 Doing the OAuth Dance to Access Twitter's API for Production Purposes
9.3 Discovering the Trending Topics
9.4 Searching for Tweets
9.5 Constructing Convenient Function Calls
9.6 Saving and Restoring JSON Data with Text Files
9.7 Saving and Accessing JSON Data with MongoDB
9.8 Sampling the Twitter Firehose with the Streaming API
9.9 Collecting Time-Series Data
9.10 Extracting Tweet Entities
9.11 Finding the Most Popular Tweets in a Collection of Tweets
9.12 Finding the Most Popular Tweet Entities in a Collection of Tweets
9.13 Tabulating Frequency Analysis
9.14 Finding Users Who Have Retweeted a Status
9.15 Extracting a Retweet's Attribution
9.16 Making Robust Twitter Requests
9.17 Resolving User Profile Information
9.18 Extracting Tweet Entities from Arbitrary Text
9.19 Getting All Friends or Followers for a User
9.20 Analyzing a User's Friends and Followers
9.21 Harvesting a User's Tweets
9.22 Crawling a Friendship Graph
9.23 Analyzing Tweet Content
9.24 Summarizing Link Targets
9.25 Analyzing a User's Favorite Tweets
9.26 Closing Remarks
9.27 Recommended Exercises
9.28 Online Resources
Part III. Appendixes
A. Information About This Book's Virtual Machine Experience
B. OAuth Primer
C. Python and Jupyter Notebook Tips and Tricks
Index