The science of deep learning /
Iddo Drori
xxii, 338 pages : illustrations (black and white, and colour), maps (colour) ; 25 cm
Formerly CIP.
Includes bibliographical references (pages 299-334) and index
Foundations -- Introduction -- Deep Learning -- Outline -- Part I: Foundations: Backpropagation, Optimization, and Regularization -- Part II: Architectures: CNNs, RNNs, GNNs, and Transformers -- Part III: Generative Models: GANs, VAEs, and Normalizing Flows -- Part IV: Reinforcement Learning -- Part V: Applications -- Appendices -- Code -- Exercises -- Forward and Backpropagation -- Introduction -- Fully Connected Neural Network -- Forward Propagation -- Algorithm -- Example -- Logistic Regression -- Non-linear Activation Functions -- Sigmoid -- Hyperbolic Tangent -- Rectified Linear Unit -- Swish -- Softmax -- Loss Functions -- Backpropagation -- Differentiable Programming -- Computation Graph -- Example -- Logistic Regression -- Forward and Backpropagation -- Derivative of Non-linear Activation Functions -- Backpropagation Algorithm -- Example -- Chain Rule for Differentiation -- Two Functions in One Dimension -- Three Functions in One Dimension -- Two Functions in Higher Dimensions -- Gradient of Loss Function -- Gradient Descent -- Initialization and Normalization -- Software Libraries and Platforms -- Summary -- Optimization -- Introduction -- Overview -- Optimization Problem Classes -- Optimization Solution Methods -- Derivatives and Gradients -- Gradient Computation -- First-Order Methods -- Gradient Descent -- Step Size -- Mini-Batch Gradient Descent -- Stochastic Gradient Descent -- Adaptive Gradient Descent -- Momentum -- Adagrad -- Adam: Adaptive Moment Estimation -- Hypergradient Descent -- Second-Order Methods -- Newton's Method -- Second-Order Taylor Approximation -- Quasi-Newton Methods -- Evolution Strategies -- Summary -- Regularization -- Introduction -- Generalization -- Overfitting -- Cross Validation -- Bias and Variance -- Vector Norms -- Ridge Regression and Lasso -- Regularized Loss Functions -- Dropout Regularization -- Random Least Squares with Dropout -- Least Squares with Noise Input Distortion -- Data Augmentation -- Batch Normalization 
-- Summary -- Architectures -- Convolutional Neural Networks -- Introduction -- Representations Sharing Weights -- Convolution -- One-Dimensional Convolution -- Matrix Multiplication -- Two-Dimensional Convolution -- Separable Filters -- Properties -- Composition -- Three-Dimensional Convolution -- Layers -- Convolution -- Pooling -- Example -- Architectures -- Applications -- Summary -- Sequence Models -- Introduction -- Natural Language Models -- Bag of Words -- Feature Vector -- N-grams -- Markov Model -- State Machine -- Recurrent Neural Network -- Recurrent Neural Network -- Architectures -- Loss Function -- Deep RNN -- Bidirectional RNN -- Backpropagation Through Time -- Gated Recurrent Unit -- Update Gate -- Candidate Activation -- Reset Gate -- Function -- Long Short-Term Memory -- Forget Gate -- Input Gate -- Memory Cell -- Candidate Memory -- Output Gate -- Peephole Connections -- GRU vs. LSTM -- Sequence to Sequence -- Attention -- Embeddings -- Introduction to Transformers -- Summary -- Graph Neural Networks -- Introduction -- Definitions -- Embeddings -- Node Similarity -- Adjacency-based Similarity -- Multi-hop Similarity -- Overlap Similarity -- Random Walk Embedding -- Graph Neural Network Properties -- Neighborhood Aggregation in Graph Neural Networks -- Supervised Node Classification Using a GNN -- Graph Neural Network Variants -- Graph Convolution Network -- GraphSAGE -- Gated Graph Neural Networks -- Graph Attention Networks -- Message-Passing Networks -- Applications -- Software Libraries, Benchmarks, and Visualization -- Summary -- Transformers -- Introduction -- General-Purpose Transformer-Based Architectures -- BERT -- Self-Attention -- Multi-head Attention -- Transformer -- Positional Encoding -- Encoder -- Decoder -- Pre-training and Fine-tuning -- Transformer Models -- Autoencoding Transformers -- Auto-regressive Transformers -- Sequence-to-Sequence Transformers -- GPT-3 -- Vision Transformers -- Multi-modal Transformers -- Text and Code Transformers -- Summary -- Generative Models -- Generative Adversarial Networks -- Introduction -- Progress -- Game Theory -- Co-evolution -- Minimax Optimization -- Divergence between Distributions -- Least Squares GAN -- f-GAN -- Optimal Objective Value -- Gradient Descent Ascent -- Optimistic Gradient Descent Ascent -- GAN Training -- Discriminator Training -- Generator Training -- Alternating Discriminator-Generator Training -- GAN Losses -- Wasserstein GAN -- Unrolled GAN -- GAN Architectures -- Progressive GAN -- Deep Convolutional GAN -- Semi-Supervised GAN -- Conditional GAN -- Image-to-image Translation -- Cycle-Consistent GAN -- Registration GAN -- Self-Attention GAN and BigGAN -- Composition and Control with GANs -- Instance Conditioned GAN -- Evaluation -- Inception Score -- Fréchet Inception Distance -- Applications -- Super Resolution and Restoration -- Style Synthesis -- Image Completion -- De-raining -- Map Synthesis -- Pose Synthesis -- Face Editing -- Training Data Generation -- Text-to-image Synthesis -- Medical Imaging -- Video Synthesis -- Motion Retargeting -- 3D Synthesis -- Graph Synthesis -- Autonomous Vehicles -- Text-to-Speech Synthesis -- Voice Conversion -- Music Synthesis -- Protein Design -- Natural Language Synthesis -- Cryptography -- Software Libraries, Benchmarks, and Visualization -- Summary -- Variational Autoencoders -- Introduction -- Variational Inference -- Reverse KL -- Score Gradient -- Reparameterization Gradient -- Forward KL -- Variational Autoencoder -- Autoencoder -- Variational Autoencoder -- Generative Flows -- Denoising Diffusion Probabilistic Model -- Forward Noising Process -- Reverse Generation by Sampling -- Geometric Variational Inference -- Moser Flow -- Riemannian Score-Based Generative Models -- Software Libraries -- Summary -- Reinforcement Learning -- Reinforcement Learning -- Introduction -- Multi-Armed Bandit -- Greedy Approach -- ε-Greedy Approach -- Upper Confidence Bound -- State Machines -- Markov Processes -- Markov Decision Processes -- State of Environment and Agent -- Definitions -- Policy -- State Action Diagram -- State Value Function -- Action Value Function -- Reward -- Model -- Agent Types -- Problem Types -- Agent Representation of State -- Bellman Expectation Equation for State Value Function -- Bellman Expectation Equation for Action Value Function -- Optimal Policy -- Optimal Value Function -- Bellman Optimality Equation for K -- Bellman Optimality Equation for -- Planning by Dynamic Programming with a Known MDP -- Iterative Policy Evaluation -- Policy Iteration -- Infinite Horizon Value Iteration -- Reinforcement Learning -- Model-Based Reinforcement Learning -- Policy Search -- Monte Carlo Sampling -- Temporal Difference Sampling -- Q-Learning -- Sarsa -- On-Policy vs. Off-Policy Methods -- Sarsa(λ) -- Maximum Entropy Reinforcement Learning -- Summary -- Deep Reinforcement Learning -- Introduction -- Function Approximation -- State Value Function Approximation -- Action Value Function Approximation -- Value-Based Methods -- Experience Replay -- Neural Fitted Q-Iteration -- Deep Q-Network -- Target Network -- Algorithm -- Prioritized Replay -- Double DQN -- Dueling Networks -- Policy-Based Methods -- Policy Gradient -- REINFORCE -- Subtracting a Baseline -- Actor-Critic Methods -- Advantage Actor-Critic -- Asynchronous Advantage Actor-Critic -- Importance Sampling -- Surrogate Loss -- Natural Policy Gradient -- Trust Region Policy Optimization -- Proximal Policy Optimization -- Deep Deterministic Policy Gradient -- Model-Based Reinforcement Learning -- Monte Carlo Tree Search -- Expert Iteration and AlphaZero -- World Models -- Imitation Learning -- Exploration -- Sparse Rewards -- Summary -- Applications -- Applications -- Introduction -- Autonomous Vehicles -- Climate Change and Climate Monitoring -- Predicting Ocean Biogeochemistry -- Predicting Atlantic Multidecadal Variability -- Predicting Wildfire Growth -- Computer Vision -- Kinship Verification -- Image-to-3D -- Image2LEGO® -- Imaging through Scattering Media -- Contrastive Language-Image Pre-training -- Speech and Audio Processing -- Audio Reverb Impulse Response Synthesis -- Voice Swapping -- Explainable Musical Phrase Completion -- Natural Language Processing -- Quantifying and Alleviating Distribution Shifts in Foundation Models on Review Classification -- Automated Machine Learning -- Education -- Learning-to-Learn STEM Courses -- Proteomics -- Protein Structure Prediction -- Protein Docking -- Combinatorial Optimization -- Problems over Graphs -- Learning Graph Algorithms as Single-Player Games -- Physics -- Pedestrian Wind Estimation in Urban Environments -- Fusion Plasma -- Summary -- Matrix Calculus -- Gradient Computations for Backpropagation -- Scalar by Vector -- Scalar by Matrix -- Vector by Vector -- Matrix by Scalar -- Gradient Computations for Optimization -- Dot Product by Vector -- Quadratic Form by Vector -- Scientific Writing and Reviewing Best Practices -- Writing Best Practices -- Introduction -- Methods -- Figures and Tables -- Results -- Abbreviations and Notation -- Reviewing Best Practices -- Ranking -- Rebuttal
The Science of Deep Learning emerged from courses taught by the author that have given thousands of students training and experience for their academic studies and prepared them for careers in deep learning, machine learning, and artificial intelligence at top companies and in academia. The book begins with the foundations of deep learning, followed by key deep learning architectures. The subsequent parts on generative models and reinforcement learning may be used within a deep learning course or as the basis of a course on each topic. The book covers state-of-the-art topics such as Transformers, graph neural networks, variational autoencoders, and deep reinforcement learning, together with a broad range of applications. The appendices provide equations for computing gradients in backpropagation and optimization, and best practices in scientific writing and reviewing. The text is an up-to-date guide to the field, built on clear visualizations and a unified notation and set of equations, which lowers the barrier to entry for the reader. The accompanying website provides complementary code and hundreds of exercises with solutions.