File name:
Mathematics for Machine Learning
Development tool:
File size: 8 MB
Downloads: 0
Upload date: 2019-03-16
Description: Mathematics for Machine Learning, by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong.
The PDF bookmarks for this book should be correct.
Contents
List of Illustrations
Foreword
Part I Mathematical Foundations
1 Introduction and Motivation 11
1.1 Finding Words for Intuitions 12
1.2 Two Ways to Read This Book
1.3 Exercises and Feedback 16
2 Linear Algebra 17
2.1 Systems of Linear Equations 19
2.2 Matrices 22
2.3 Solving Systems of Linear Equations 27
2.4 Vector Spaces 35
2.5 Linear Independence 40
2.6 Basis and Rank 44
2.7 Linear Mappings 48
2.8 Affine Spaces 61
2.9 Further Reading 63
Exercises 63
3 Analytic Geometry 70
3.1 Norms 71
3.2 Inner Products
3.3 Lengths and Distances
3.4 Angles and Orthogonality 76
3.5 Orthonormal Basis 78
3.6 Orthogonal Complement 79
3.7 Inner Product of Functions 80
3.8 Orthogonal Projections 81
3.9 Rotations 91
3.10 Further Reading 94
Exercises 95
4 Matrix Decompositions 98
4.1 Determinant and Trace 99
4.2 Eigenvalues and Eigenvectors 105
4.3 Cholesky Decomposition 114
4.4 Eigendecomposition and Diagonalization 115
4.5 Singular Value Decomposition 119
4.6 Matrix Approximation 129
4.7 Matrix Phylogeny 134
4.8 Further Reading 135
Exercises 137
5 Vector Calculus 139
5.1 Differentiation of Univariate Functions 141
5.2 Partial Differentiation and Gradients 146
5.3 Gradients of Vector-Valued Functions 149
5.4 Gradients of Matrices 155
5.5 Useful Identities for Computing Gradients 158
5.6 Backpropagation and Automatic Differentiation 159
5.7 Higher-Order Derivatives 164
5.8 Linearization and Multivariate Taylor Series 165
5.9 Further Reading 170
Exercises 170
6 Probability and Distributions 172
6.1 Construction of a Probability Space 172
6.2 Discrete and Continuous Probabilities 178
6.3 Sum Rule, Product Rule, and Bayes' Theorem 183
6.4 Summary Statistics and Independence 186
6.5 Gaussian Distribution 197
6.6 Conjugacy and the Exponential Family 204
6.7 Change of Variables/Inverse Transform 214
6.8 Further Reading 220
Exercises
7 Continuous Optimization 225
7.1 Optimization Using Gradient Descent 227
7.2 Constrained Optimization and Lagrange Multipliers 233
7.3 Convex Optimization 236
7.4 Further Reading 246
Exercises
Part II Central Machine Learning Problems 249
8 When Models Meet Data 251
8.1 Empirical Risk Minimization 258
8.2 Parameter Estimation 265
8.3 Probabilistic Modeling and Inference 272
8.4 Directed Graphical Models 277
8.5 Model Selection 283
9 Linear Regression 289
9.1 Problem Formulation 291
9.2 Parameter Estimation 292
9.3 Bayesian Linear Regression 303
9.4 Maximum Likelihood as Orthogonal Projection 313
9.5 Further Reading 315
10 Dimensionality Reduction with Principal Component Analysis 317
10.1 Problem Setting 318
10.2 Maximum Variance Perspective 320
10.3 Projection Perspective 325
10.4 Eigenvector Computation and Low-Rank Approximations 333
10.5 PCA in High Dimensions 335
10.6 Key Steps of PCA in Practice 336
10.7 Latent Variable Perspective 339
10.8 Further Reading 343
11 Density Estimation with Gaussian Mixture Models 348
11.1 Gaussian Mixture Model 349
11.2 Parameter Learning via Maximum Likelihood 350
11.3 EM Algorithm 360
11.4 Latent Variable Perspective 363
11.5 Further Reading 368
12 Classification with Support Vector Machines 370
12.1 Separating Hyperplanes 372
12.2 Primal Support Vector Machine 374
12.3 Dual Support Vector Machine 383
12.4 Kernels 388
12.5 Numerical Solution 390
12.6 Further Reading 392
References 395
Index 407
List of Figures
1.1 The foundations and four pillars of machine learning 14
2.1 Different types of vectors
2.2 Linear algebra mind map 19
2.3 Geometric interpretation of systems of linear equations
2.4 A matrix can be represented as a long vector 22
2.5 Matrix multiplication 23
2.6 Examples of subspaces 39
2.7 Geographic example of linearly dependent vectors
2.8 Two different coordinate systems 50
2.9 Different coordinate representations of a vector
2.10 Three examples of linear transformations 52
2.11 Basis change 56
2.12 Kernel and image of a linear mapping Φ: V → W 59
2.13 Lines are affine subspaces 62
3.1 Analytic geometry mind map
3.2 Illustration of different norms
3.3 Triangle inequality
3.4 76
3.5 Angle between two vectors
3.6 Angle between two vectors
3.7 A plane can be described by its normal vector 80
3.9 Orthogonal projection 82
3.10 Examples of projections onto one-dimensional subspaces 83
3.11 Projection onto a two-dimensional subspace 85
3.12 Gram-Schmidt orthogonalization 89
3.13 Projection onto an affine space
3.14 Rotation
3.15 Robotic arm 91
3.16 Rotation of the standard basis in R² by an angle θ 92
3.17 Rotation in three dimensions 93
4.1 Matrix decomposition mind map 99
4.2 The area of a parallelogram computed using the determinant 101
4.3 The volume of a parallelepiped computed using the determinant 101
4.4 Determinants and eigenspaces 109
4.5 C. elegans neural network 110
4.6 Geometric interpretation of eigenvalues 113
4.7 Eigendecomposition as sequential transformations
4.8 Intuition behind SVD as sequential transformations
4.9 SVD and mapping of vectors 122
4.10 SVD decomposition for movie ratings 127
4.11 Image processing with the SVD 130
4.12 Image reconstruction with the SVD 131
4.13 Phylogeny of matrices in machine learning 134
5.1 Different problems for which we need vector calculus 139
5.2 Vector calculus mind map 140
5.3 Difference quotient 141
5.4 Taylor polynomials 144
5.5 Jacobian determinant 151
5.6 Dimensionality of partial derivatives 152
5.7 Gradient computation of a matrix with respect to a vector 155
5.8 Forward pass in a multi-layer neural network 160
5.9 Backward pass in a multi-layer neural network 161
5.10 Data flow graph 161
5.11 Computation graph 162
5.12 Linear approximation of a function 165
5.13 Visualizing outer products 166
6.1 Probability mind map 173
6.2 Visualization of a discrete bivariate probability mass function 179
6.3 Examples of discrete and continuous uniform distributions 182
6.4 Mean, mode, and median 189
6.5 Identical means and variances but different covariances 191
6.6 Geometry of random variables 196
6.7 Gaussian distribution of two random variables x, y 197
6.8 Gaussian distributions overlaid with 100 samples 198
6.9 Bivariate Gaussian with conditional and marginal 200
6.10 Examples of the Binomial distribution 206
6.11 Examples of the Beta distribution for different values of α and β 207
7.1 Optimization mind map 226
7.2 Example objective function 227
7.3 Gradient descent on a two-dimensional quadratic surface 229
7.4 Illustration of constrained optimization 233
7.5 Example of a convex function 236
7.6 Example of a convex set 236
7.7 Example of a nonconvex set 237
7.8 The negative entropy and its tangent 238
7.9 Illustration of a linear program 240
8.1 Toy data for linear regression 254
8.2 Example function and its prediction 255
8.3 Example function and its uncertainty 256
8.4 K-fold cross-validation 263
8.5 Maximum likelihood estimate
8.6 Maximum a posteriori estimation
8.7 Model fitting 270
8.8 Fitting of different model classes 271
8.9 Examples of directed graphical models 278
8.10 Graphical models for a repeated Bernoulli experiment 280
8.11 D-separation example 281
8.12 Three types of graphical models 282
8.13 Nested cross-validation
8.14 Bayesian inference embodies Occam's razor 285
8.15 Hierarchical generative process in Bayesian model selection 286
9.1 Regression
9.2 Linear regression example 292
9.3 Probabilistic graphical model for linear regression 292
9.4 Polynomial regression
9.5 Maximum likelihood fits for different polynomial degrees M 299
9.6 Training and test error 300
9.7 Polynomial regression: maximum likelihood and MAP estimates 302
9.8 Graphical model for Bayesian linear regression 304
9.9 Prior over functions 305
9.10 Bayesian linear regression and posterior over functions 310
9.11 Bayesian linear regression 311
9.12 Geometric interpretation of least squares 313
10.1 Illustration: dimensionality reduction 317
10.2 Graphical illustration of PCA
10.3 Examples of handwritten digits from the MNIST dataset 320
10.4 Illustration of the maximum variance perspective 321
10.5 Properties of the training data of MNIST '8' 324
10.6 Illustration of the projection approach 325
10.7 Simplified projection setting 326
10.8 Optimal projection 328
10.9 Orthogonal projection and displacement vectors 330
10.10 Embedding of MNIST digits 332
10.11 Steps of PCA
10.12 Effect of the number of principal components on reconstruction 338
10.13 Squared reconstruction error versus the number of components 339
10.14 PPCA graphical model 340
10.15 Generating new MNIST digits 341
10.16 PCA as an auto-encoder 344
11.1 Dataset that cannot be represented by a Gaussian 348
11.2 Gaussian mixture model 350
11.3 Initial setting: GMM with three mixture components 350
11.4 Update of the mean parameter of a mixture component in a GMM 355
11.5 Effect of updating the mean values in a GMM 355
11.6 Effect of updating the variances in a GMM 358
11.7 Effect of updating the mixture weights in a GMM 360
11.8 EM algorithm applied to the GMM from Figure 11.2
11.9 Illustration of the EM algorithm 362
11.10 GMM fit and responsibilities when EM converges
11.11 Graphical model for a GMM with a single data point
11.12 Graphical model for a GMM with N data points 366
11.13 Histogram and kernel density estimation 369
12.1 Example 2D data for classification 371
12.2 Equation of a separating hyperplane 373
12.3 Possible separating hyperplanes 374
12.4 Vector addition to express distance to hyperplane
12.5 Derivation of the margin: r 376
12.6 Linearly separable and non-linearly separable data 379
12.7 Soft margin SVM allows examples to be within the margin 380
12.8 The hinge loss is a convex upper bound of zero-one loss 382
12.9 Convex hulls 386
12.10 SVM with different kernels 389