Acknowledgments
Foreword
CHAPTER 1 Introduction
CHAPTER 2 Preliminaries of Information Theory and Neural Networks
2.1 Elements of Information Theory
2.1.1 Entropy and Information
2.1.2 Joint Entropy and Conditional Entropy
2.1.3 Kullback-Leibler Entropy
2.1.4 Mutual Information
2.1.5 Differential Entropy, Relative Entropy and Mutual Information
2.1.6 Chain Rules
2.1.7 Fundamental Information Theory Inequalities
2.1.8 Coding Theory
2.2 Elements of the Theory of Neural Networks
2.2.1 Neural Network Modeling
2.2.2 Neural Architectures
2.2.3 Learning Paradigms
2.2.4 Feedforward Networks: Backpropagation
2.2.5 Stochastic Recurrent Networks: Boltzmann Machine
2.2.6 Unsupervised Competitive Learning
2.2.7 Biological Learning Rules
PART I: Unsupervised Learning
CHAPTER 3 Linear Feature Extraction: Infomax Principle
3.1 Principal Component Analysis: Statistical Approach
3.1.1 PCA and Diagonalization of the Covariance Matrix
3.1.2 PCA and Optimal Reconstruction
3.1.3 Neural Network Algorithms and PCA
3.2 Information Theoretic Approach: Infomax
3.2.1 Minimization of Information Loss Principle and Infomax Principle
3.2.2 Upper Bound of Information Loss
3.2.3 Information Capacity as a Lyapunov Function of the General Stochastic Approximation
CHAPTER 4 Independent Component Analysis: General Formulation and Linear Case
4.1 ICA: Definition
4.2 General Criteria for ICA
4.2.1 Cumulant Expansion Based Criterion for ICA
4.2.2 Mutual Information as Criterion for ICA
4.3 Linear ICA
4.4 Gaussian Input Distribution and Linear ICA
4.4.1 Networks with Anti-Symmetric Lateral Connections
4.4.2 Networks with Symmetric Lateral Connections
4.4.3 Examples of Learning with Symmetric and Anti-Symmetric Networks
4.5 Learning in Gaussian ICA with Rotation Matrices: PCA
4.5.1 Relationship Between PCA and ICA in Gaussian Input Case
4.5.2 Linear Gaussian ICA and the Output Dimension Reduction
4.6 Linear ICA in Arbitrary Input Distribution
4.6.1 Some Properties of Cumulants at the Output of a Linear Transformation
4.6.2 The Edgeworth Expansion Criterion
CHAPTER 5 Nonlinear Feature Extraction: Boltzmann Machines
5.1 Infomax Principle for Boltzmann Machines
5.1.1 Learning Model
5.1.2 Examples of Infomax Principle in Boltzmann Machine
5.2 Redundancy Minimization and Infomax for the Boltzmann Machine
5.2.1 Learning Model
5.2.2 Numerical Complexity of the Learning Rule
5.2.3 Factorial Learning Experiments
5.2.4 Receptive Fields Formation from a Retina
5.3 Appendix
CHAPTER 6 Nonlinear Feature Extraction: Deterministic Neural Networks
6.1 Redundancy Reduction by Triangular Volume Conserving Architectures
6.1.1 Networks with Linear, Sigmoidal and Higher Order Activation Functions
6.1.2 Simulations and Results
6.2 Unsupervised Modeling of Chaotic Time Series
6.2.1 Dynamical System Modeling
6.3 Redundancy Reduction by General Symplectic Architectures
6.3.1 General Entropy Preserving Nonlinear Maps
6.3.2 Optimizing a Parameterized Symplectic Map
6.3.3 Density Estimation and Novelty Detection
6.4 Example: Theory of Early Vision
6.4.1 Theoretical Background
6.4.2 Retina Model
PART II: Supervised Learning
CHAPTER 7 Supervised Learning and Statistical Estimation
7.1 Statistical Parameter Estimation: Basic Definitions
7.1.1 Cramer-Rao Inequality for Unbiased Estimators
7.2 Maximum Likelihood Estimators
7.2.1 Maximum Likelihood and the Information Measure
7.3 Maximum A Posteriori Estimation
7.4 Extensions of MLE to Include Model Selection
7.4.1 Akaike's Information Theoretic Criterion (AIC)
7.4.2 Minimal Description Length and Stochastic Complexity
7.5 Generalization and Learning on the Same Data Set
CHAPTER 8 Statistical Physics Theory of Supervised Learning and Generalization
8.1 Statistical Mechanics Theory of Supervised Learning
8.1.1 Maximum Entropy Principle
8.1.2 Probability Inference with an Ensemble of Networks
8.1.3 Information Gain and Complexity Analysis
8.2 Learning with Higher Order Neural Networks
8.2.1 Partition Function Evaluation
8.2.2 Information Gain in Polynomial Networks
8.2.3 Numerical Experiments
8.3 Learning with General Feedforward Neural Networks
8.3.1 Partition Function Approximation
8.3.2 Numerical Experiments
CHAPTER 9
9.2 Composite Models as Gaussian Mixtures
CHAPTER 10 Information Theory Based Regularizing Methods
10.1 Theoretical Framework
10.1.1 Network Complexity Regulation
10.1.2 Network Architecture and Learning Paradigm
10.1.3 Applications of the Mutual Information Based Penalty Term
10.2 Regularization in Stochastic Potts Neural Network
10.2.1 Neural Network Architecture
10.2.2 Simulations
References
Index