A Practical Guide to Generative AI (Using Transformers and Diffusion Models, Reprint Edition) (English Edition)

  • Authors: (US) Omar Sanseviero, Pedro Cuenca, Apolinario Passos, Jonathan Whitaker | Editor: Zhang Ye (張燁)
  • Publisher: Southeast University (東南大學)
  • ISBN: 9787576620061
  • Publication date: 2025/04/01
  • Binding: Paperback
  • Pages: 396
List price: RMB 184

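Synopsis
    With this practical, hands-on guide, you can learn how to use generative AI techniques to create entirely new text, images, audio, and even music. You will learn how state-of-the-art generative models work, how to fine-tune and adapt them to your needs, and how to combine existing building blocks to create new models and creative applications across different domains.
    This introductory book starts with theoretical concepts and then guides readers through practical applications, with plenty of code examples and easy-to-follow illustrations. You will learn how to use open source libraries to explore transformers and diffusion models in code, and study several existing projects to help guide your own work.
    Build and customize models that can generate text and images.
    Explore the trade-offs between using a pretrained model and fine-tuning a custom model.
    Create and use models that can generate, edit, and modify images in any style.
    Customize transformers and diffusion models for a range of creative needs.
    Train models that can reflect your own unique style.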

Table of Contents
Preface
Part I. Leveraging Open Models
  1. An Introduction to Generative Media
    Generating Images
    Generating Text
    Generating Sound Clips
    Ethical and Societal Implications
    Where We've Been and Where Things Stand
    How Are Generative AI Models Created?
    Summary
  2. Transformers
    A Language Model in Action
    Tokenizing Text
    Predicting Probabilities
    Generating Text
    Zero Shot Generalization
    Few Shot Generalization
    A Transformer Block
    Transformer Model Genealogy
    Sequence to Sequence Tasks
    Encoder Only Models
    The Power of Pretraining
    Transformers Recap
    Limitations
    Beyond Text
    Project Time: Using LMs to Generate Text
    Summary
    Exercises
    Challenges
    References
  3. Compressing and Representing Information
    AutoEncoders
      Preparing the Data
      Modeling the Encoder
      Decoder
      Training
      Exploring the Latent Space
      Visualizing the Latent Space
      Variational AutoEncoders
      VAE Encoders and Decoders
      Sampling from the Encoder Distribution
      Training the VAE
      VAEs for Generative Modeling
    CLIP
      Contrastive Loss
      Using CLIP, Step by Step
      Zero Shot Image Classification with CLIP
      Zero Shot Image Classification Pipeline
      CLIP Use Cases
      Alternatives to CLIP
    Project Time: Semantic Image Search
    Summary
    Exercises
    Challenges
    References
  4. Diffusion Models
    The Key Insight: Iterative Refinement
    Training a Diffusion Model
      The Data
      Adding Noise
      The UNet
      Training
      Sampling
      Evaluation
    In Depth: Noise Schedules
      Why Add Noise?
      Starting Simple
      The Math
      Effect of Input Resolution and Scaling
    In Depth: UNets and Alternatives
      A Simple UNet
      Improving the UNet
      Alternative Architectures
    In Depth: Diffusion Objectives
    Project Time: Train Your Diffusion Model
    Summary
    Exercises
    Challenges
    References
  5. Stable Diffusion and Conditional Generation
    Adding Control: Conditional Diffusion Models
    Preparing the Data
    Creating a Class Conditioned Model
    Training the Model
    Sampling
    Improving Efficiency: Latent Diffusion
    Stable Diffusion: Components in Depth
      The Text Encoder
      The Variational AutoEncoder
      The UNet
      Stable Diffusion XL
      FLUX, SD3, and Video
      Classifier Free Guidance
    Putting It All Together: Annotated Sampling Loop
    Open Data, Open Models
    Challenges and the Sunset of LAION 5B
    Alternatives
    Fair and Commercial Use
    Project Time: Build an Interactive ML Demo with Gradio
    Summary
    Exercises
    Challenge
    References
Part II. Transfer Learning for Generative Models
  6. Fine Tuning Language Models
    Classifying Text
    Identify a Dataset
    Define Which Model Type to Use
    Select a Good Base Model
    Preprocess the Dataset
    Define Evaluation Metrics
    Train the Model
    Still Relevant?
    Generating Text
    Picking the Right Generative Model
    Training a Generative Model
    Instructions
    A Quick Introduction to Adapters
    A Light Introduction to Quantization
    Putting It All Together
    A Deeper Dive into Evaluation
    Project Time: Retrieval Augmented Generation
    Summary
    Exercises
    Challenge
    References
  7. Fine Tuning Stable Diffusion
    Full Stable Diffusion Fine Tuning
      Preparing the Dataset
      Fine Tuning the Model
      Inference
    DreamBooth
      Preparing the Dataset
      Prior Preservation
      DreamBoothing the Model
      Inference
    Training LoRAs
    Giving Stable Diffusion New Capabilities
      Inpainting
      Additional Inputs for Special Conditionings
    Project Time: Train an SDXL DreamBooth LoRA by Yourself
    Summary
    Exercises
    Challenge
    References
Part III. Going Further
  8. Creative Applications of Text to Image Models
    Image to Image
    Inpainting
    Prompt Weighting and Image Editing
      Prompt Weighting and Merging
      Editing Diffusion Images with Semantic Guidance
    Real Image Editing via Inversion
      Editing with LEDITS++
      Real Image Editing via Instruction Fine Tuning
    ControlNet
    Image Prompting and Image Variations
      Image Variations
      Image Prompting
    Project Time: Your Creative Canvas
    Summary
    Exercises
    References
  9. Generating Audio
    Audio Data
      Waveforms
      Spectrograms
    Speech to Text with Transformer Based Architectures
      Encoder Based Techniques
      Encoder Decoder Techniques
      From Model to Pipeline
      Evaluation
    From Text to Speech to Generative Audio
    Generating Audio with Sequence to Sequence Models
    Going Beyond Speech with Bark
    AudioLM and MusicLM
    AudioGen and MusicGen
    Audio Diffusion and Riffusion
    More on Diffusion Models for Generative Audio
    Evaluating Audio Generation Systems
    What's Next?
    Project Time: End to End Conversational System
    Summary
    Exercises
    Challenges
    References
  10. Rapidly Advancing Areas in Generative AI
    Preference Optimization
    Long Contexts
    Mixture of Experts
    Optimizations and Quantizations
    Data
    One Model to Rule Them All
    Computer Vision
    3D Computer Vision
    Video Generation
    Multimodality
    Community
  A. Open Source Tools
  B. LLM Memory Requirements
  C. End to End Retrieval Augmented Generation
  Index
