Foreword
Preface
1. Understanding Performant Python
    The Fundamental Computer System
    Computing Units
    Memory Units
    Communications Layers
    Putting the Fundamental Elements Together
    Idealized Computing Versus the Python Virtual Machine
    So Why Use Python?
    How to Be a Highly Performant Programmer
    Good Working Practices
    Some Thoughts on Good Notebook Practice
    Getting the Joy Back into Your Work
2. Profiling to Find Bottlenecks
    Profiling Efficiently
    Introducing the Julia Set
    Calculating the Full Julia Set
    Simple Approaches to Timing—print and a Decorator
    Simple Timing Using the Unix time Command
    Using the cProfile Module
    Visualizing cProfile Output with SnakeViz
    Using line_profiler for Line-by-Line Measurements
    Using memory_profiler to Diagnose Memory Usage
    Introspecting an Existing Process with PySpy
    Bytecode: Under the Hood
    Using the dis Module to Examine CPython Bytecode
    Different Approaches, Different Complexity
    Unit Testing During Optimization to Maintain Correctness
    No-op @profile Decorator
    Strategies to Profile Your Code Successfully
    Wrap-Up
3. Lists and Tuples
    A More Efficient Search
    Lists Versus Tuples
    Lists as Dynamic Arrays
    Tuples as Static Arrays
    Wrap-Up
4. Dictionaries and Sets
    How Do Dictionaries and Sets Work?
    Inserting and Retrieving
    Deletion
    Resizing
    Hash Functions and Entropy
    Dictionaries and Namespaces
    Wrap-Up
5. Iterators and Generators
    Iterators for Infinite Series
    Lazy Generator Evaluation
    Wrap-Up
6. Matrix and Vector Computation
    Introduction to the Problem
    Aren't Python Lists Good Enough?
    Problems with Allocating Too Much
    Memory Fragmentation
    Understanding perf
    Making Decisions with perf's Output
    Enter numpy
    Applying numpy to the Diffusion Problem
    Memory Allocations and In-Place Operations
    Selective Optimizations: Finding What Needs to Be Fixed
    numexpr: Making In-Place Operations Faster and Easier
    A Cautionary Tale: Verify "Optimizations" (scipy)
    Lessons from Matrix Optimizations
    Pandas
    Pandas's Internal Model
    Applying a Function to Many Rows of Data
    Building DataFrames and Series from Partial Results Rather than Concatenating
    There's More Than One (and Possibly a Faster) Way to Do a Job
    Advice for Effective Pandas Development
    Wrap-Up
7. Compiling to C
    What Sort of Speed Gains Are Possible?
    JIT Versus AOT Compilers
    Why Does Type Information Help the Code Run Faster?
    Using a C Compiler
    Reviewing the Julia Set Example
    Cython
    Compiling a Pure Python Version Using Cython
    pyximport
    Cython Annotations to Analyze a Block of Code
    Adding Some Type Annotations
    Cython and numpy
    Parallelizing the Solution with OpenMP on One Machine
    Numba
    Numba to Compile NumPy for Pandas
    PyPy
    Garbage Collection Differences
    Running PyPy and Installing Modules
    A Summary of Speed Improvements
    When to Use Each Technology
    Other Upcoming Projects
    Graphics Processing Units (GPUs)
    Dynamic Graphs: PyTorch
    Basic GPU Profiling
    Performance Considerations of GPUs
    When to Use GPUs
    Foreign Function Interfaces
    ctypes
    cffi
    f2py
    CPython Module
    Wrap-Up
8. Asynchronous I/O
9. The multiprocessing Module
10. Clusters and Job Queues
11. Using Less RAM
12. Lessons from the Field
Index