Synopsis
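Written by Wes McKinney, the creator of the Python pandas project, this book covers the specifics and essentials of manipulating, processing, cleaning, and crunching data in Python. Along the way you will learn the latest versions of pandas, NumPy, IPython, and Jupyter.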
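The author, Wes McKinney, is the creator of the Python pandas project. The book is a practical, modern introduction to data science tools in Python, well suited to analysts who are new to Python and to Python programmers who are new to data science and scientific computing. Data files and related material are available on GitHub. Topics covered: using the IPython shell and Jupyter notebook for exploratory computing; learning basic and advanced features of NumPy (Numerical Python); getting started with the data analysis tools in the pandas library; using flexible tools to load, clean, transform, merge, and reshape data; creating informative visualizations with matplotlib; applying the pandas groupby facility to slice, dice, and summarize datasets; analyzing and manipulating regular and irregular time series data; and learning how to solve real-world data analysis problems through thorough, detailed examples.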
About the Author
Wes McKinney (USA) | Editor: Zhang Ye (張燁)
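Wes McKinney is the creator of pandas, the popular open source Python data analysis library. He is an active speaker and a Python/C++ open source developer in the Python data community and the Apache Software Foundation. He currently works as a software architect in New York.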
Table of Contents
Preface
1. Preliminaries
1.1 What Is This Book About?
What Kinds of Data
1.2 Why Python for Data Analysis?
Python as Glue
Solving the Two-Language Problem
Why Not Python?
1.3 Essential Python Libraries
NumPy
pandas
matplotlib
IPython and Jupyter
SciPy
scikit-learn
statsmodels
Other Packages
1.4 Installation and Setup
Miniconda on Windows
GNU/Linux
Miniconda on macOS
Installing Necessary Packages
Integrated Development Environments and Text Editors
1.5 Community and Conferences
1.6 Navigating This Book
Code Examples
Data for Examples
Import Conventions
2. Python Language Basics, IPython, and Jupyter Notebooks
2.1 The Python Interpreter
2.2 IPython Basics
Running the IPython Shell
Running the Jupyter Notebook
Tab Completion
Introspection
2.3 Python Language Basics
Language Semantics
Scalar Types
Control Flow
2.4 Conclusion
3. Built-In Data Structures, Functions, and Files
3.1 Data Structures and Sequences
Tuple
List
Dictionary
Set
Built-In Sequence Functions
List, Set, and Dictionary Comprehensions
3.2 Functions
Namespaces, Scope, and Local Functions
Returning Multiple Values
Functions Are Objects
Anonymous (Lambda) Functions
Generators
Errors and Exception Handling
3.3 Files and the Operating System
Bytes and Unicode with Files
3.4 Conclusion
4. NumPy Basics: Arrays and Vectorized Computation
4.1 The NumPy ndarray: A Multidimensional Array Object
Creating ndarrays
Data Types for ndarrays
Arithmetic with NumPy Arrays
Basic Indexing and Slicing
Boolean Indexing
Fancy Indexing
Transposing Arrays and Swapping Axes
4.2 Pseudorandom Number Generation
4.3 Universal Functions: Fast Element-Wise Array Functions
4.4 Array-Oriented Programming with Arrays
Expressing Conditional Logic as Array Operations
Mathematical and Statistical Methods
Methods for Boolean Arrays
Sorting
Unique and Other Set Logic
4.5 File Input and Output with Arrays
4.6 Linear Algebra
4.7 Example: Random Walks
Simulating Many Random Walks at Once
4.8 Conclusion
5. Getting Started with pandas
5.1 Introduction to pandas Data Structures
Series
DataFrame
Index Objects
5.2 Essential Functionality
Reindexing
Dropping Entries from an Axis
Indexing, Selection, and Filtering
Arithmetic and Data Alignment
Function Application and Mapping
Sorting and Ranking
Axis Indexes with Duplicate Labels
5.3 Summarizing and Computing Descriptive Statistics
Correlation and Covariance
Unique Values, Value Counts, and Membership
5.4 Conclusion
6. Data Loading, Storage, and File Formats
6.1 Reading and Writing Data in Text Format
Reading Text Files in Pieces
Writing Data to Text Format
Working with Other Delimited Formats
JSON Data
XML and HTML: Web Scraping
6.2 Binary Data Formats
Reading Microsoft Excel Files
Using HDF5 Format
6.3 Intera