  • Authors: David B. Kirk, Wen-mei W. Hwu (USA) | Editor: Qu Yi
  • Publisher: China Machine Press
  • ISBN: 9787111668367
  • Publication date: 2021/01/01
  • Binding: Paperback
  • Pages: 550
Price: RMB 139

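    This book is essential reading in the field of parallel programming, praised by Turing Award winner David Patterson as a "godsend." It draws on the two authors' many years of teaching and research experience and is used as a textbook at the University of Illinois at Urbana-Champaign (UIUC), the Massachusetts Institute of Technology (MIT), and other leading universities.
    The book is concise, intuitive, and practical, with an emphasis on computational thinking and parallel programming skills. It teaches in three progressive stages, optimizing program performance step by step until an efficient parallel program is reached. It covers parallel patterns, performance, CUDA dynamic parallelism, and related techniques in depth, and uses a wide range of application case studies to illustrate how parallel programs are developed. The book also comes with the free companion Illinois-NVIDIA GPU Teaching Kit, along with lecture slides, lab assignments, project guidelines, and other materials.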


CHAPTER 1 Introduction
  1.1  Heterogeneous Parallel Computing
  1.2  Architecture of a Modern GPU
  1.3  Why More Speed or Parallelism
  1.4  Speeding Up Real Applications
  1.5  Challenges in Parallel Programming
  1.6  Parallel Programming Languages and Models
  1.7  Overarching Goals
  1.8  Organization of the Book
CHAPTER 2 Data Parallel Computing
  2.1  Data Parallelism
  2.2  CUDA C Program Structure
  2.3  A Vector Addition Kernel
  2.4  Device Global Memory and Data Transfer
  2.5  Kernel Functions and Threading
  2.6  Kernel Launch
  2.7  Summary
    Function Declarations
    Kernel Launch
    Built-in (Predefined) Variables
    Run-time API
  2.8  Exercises
CHAPTER 3 Scalable Parallel Execution
  3.1  CUDA Thread Organization
  3.2  Mapping Threads to Multidimensional Data
  3.3  Image Blur: A More Complex Kernel
  3.4  Synchronization and Transparent Scalability
  3.5  Resource Assignment
  3.6  Querying Device Properties
  3.7  Thread Scheduling and Latency Tolerance
  3.8  Summary
  3.9  Exercises
CHAPTER 4 Memory and Data Locality
  4.1  Importance of Memory Access Efficiency
  4.2  Matrix Multiplication
  4.3  CUDA Memory Types
  4.4  Tiling for Reduced Memory Traffic
  4.5  A Tiled Matrix Multiplication Kernel
  4.6  Boundary Checks
  4.7  Memory as a Limiting Factor to Parallelism
  4.8  Summary
  4.9  Exercises
CHAPTER 17 Parallel Programming and Computational Thinking
  17.1  Goals of Parallel Computing
  17.2  Problem Decomposition
  17.3  Algorithm Selection
  17.4  Computational Thinking
  17.5  Single Program, Multiple Data, Shared Memory and Locality
  17.6  Strategies for Computational Thinking
  17.7  A Hypothetical Example: Sodium Map of the Brain
  17.8  Summary
  17.9  Exercises
CHAPTER 18 Programming a Heterogeneous Computing Cluster
  18.1  Background
  18.2  A Running Example
  18.3  Message Passing Interface Basics
  18.4  Message Passing Interface Point-to-Point Communication
  18.5  Overlapping Computation and Communication
  18.7  CUDA-Aware Message Passing Interface
  18.8  Summary
  18.9  Exercises
CHAPTER 19 Parallel Programming with OpenACC
  19.1  The OpenACC Execution Model
  19.2  OpenACC Directive Format
  19.3  OpenACC by Example
    The OpenACC Kernels Directive
    The OpenACC Parallel Directive
    Comparison of Kernels and Parallel Directives
    OpenACC Data Directives
    OpenACC Loop Optimizations
    OpenACC Routine Directive
    Asynchronous Computation and Data
  19.4  Comparing OpenACC and CUDA
  19.5  Interoperability with CUDA and Libraries
    Calling CUDA or Libraries with OpenACC Arrays
    Using CUDA Pointers in OpenACC
    Calling CUDA Device Kernels from OpenACC
  19.6  The Future of OpenACC
  19.7  Exercises
CHAPTER 20 More on CUDA and Graphics Processing Unit
  20.1  Model of Host/Device Interaction
  20.2  Kernel Execution Control
  20.3  Memory Bandwidth and Compute Throughput
  20.4  Programming Environment
  20.5  Future Outlook
CHAPTER 21 Conclusion and Outlook
  21.1  Goals Revisited
  21.2  Future Outlook

Appendix A: An Introduction to OpenCL
Appendix B: THRUST: A Productivity-Oriented Library for CUDA
Appendix C: CUDA Fortran
Appendix D: An Introduction to C++ AMP

