Matrix calculus (Appendix D): derivatives with respect to matrices and vectors

D.1 GRADIENT, DIRECTIONAL DERIVATIVE, TAYLOR SERIES
The second-order gradient of real function g(X) : R^{K×L} → R on matrix domain has representation

∇²g(X) ≜ [ ∇ ∂g(X)/∂X_11   ∇ ∂g(X)/∂X_12   ⋯   ∇ ∂g(X)/∂X_1L
           ∇ ∂g(X)/∂X_21   ∇ ∂g(X)/∂X_22   ⋯   ∇ ∂g(X)/∂X_2L
           ⋮
           ∇ ∂g(X)/∂X_K1   ∇ ∂g(X)/∂X_K2   ⋯   ∇ ∂g(X)/∂X_KL ]  ∈ R^{K×L×K×L}    (1962)

         = [ ∇∇_{X(:,1)} g(X)   ∇∇_{X(:,2)} g(X)   ⋯   ∇∇_{X(:,L)} g(X) ]  ∈ R^{K×1×L×K×L}

where the gradient ∇ is with respect to matrix X.
Gradient of vector-valued function g(X) : R^{K×L} → R^N on matrix domain is a cubix

∇g(X) ≜ [ ∇_{X(:,1)} g_1(X)   ∇_{X(:,1)} g_2(X)   ⋯   ∇_{X(:,1)} g_N(X)
          ∇_{X(:,2)} g_1(X)   ∇_{X(:,2)} g_2(X)   ⋯   ∇_{X(:,2)} g_N(X)
          ⋮
          ∇_{X(:,L)} g_1(X)   ∇_{X(:,L)} g_2(X)   ⋯   ∇_{X(:,L)} g_N(X) ]    (1963)

        = [ ∇g_1(X)   ∇g_2(X)   ⋯   ∇g_N(X) ]  ∈ R^{K×N×L}
while the second-order gradient has a five-dimensional representation
∇²g(X) ≜ [ ∇∇_{X(:,1)} g_1(X)   ∇∇_{X(:,1)} g_2(X)   ⋯   ∇∇_{X(:,1)} g_N(X)
           ∇∇_{X(:,2)} g_1(X)   ∇∇_{X(:,2)} g_2(X)   ⋯   ∇∇_{X(:,2)} g_N(X)
           ⋮
           ∇∇_{X(:,L)} g_1(X)   ∇∇_{X(:,L)} g_2(X)   ⋯   ∇∇_{X(:,L)} g_N(X) ]    (1964)

         = [ ∇²g_1(X)   ∇²g_2(X)   ⋯   ∇²g_N(X) ]  ∈ R^{K×N×L×K×L}
The gradient of matrix-valued function g(X) : R^{K×L} → R^{M×N} on matrix domain has a four-dimensional representation called quartix (fourth-order tensor)
∇g(X) ≜ [ ∇g_11(X)   ∇g_12(X)   ⋯   ∇g_1N(X)
          ∇g_21(X)   ∇g_22(X)   ⋯   ∇g_2N(X)
          ⋮
          ∇g_M1(X)   ∇g_M2(X)   ⋯   ∇g_MN(X) ]  ∈ R^{M×N×K×L}    (1965)
while the second-order gradient has a six-dimensional representation
∇²g(X) ≜ [ ∇²g_11(X)   ∇²g_12(X)   ⋯   ∇²g_1N(X)
           ∇²g_21(X)   ∇²g_22(X)   ⋯   ∇²g_2N(X)
           ⋮
           ∇²g_M1(X)   ∇²g_M2(X)   ⋯   ∇²g_MN(X) ]  ∈ R^{M×N×K×L×K×L}    (1966)
and so on
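These hyperdimensional objects are easy to build numerically. The following NumPy sketch (the choice g(X) = AXB and the dimensions are illustrative assumptions, not from the text) forms the quartix (1965) by central differences and checks it against the analytic entries ∂g_mn/∂X_kl = A_mk B_ln:

import numpy as np

rng = np.random.default_rng(0)
M, N, K, L = 3, 4, 2, 5
A = rng.standard_normal((M, K))
B = rng.standard_normal((L, N))
X = rng.standard_normal((K, L))

def g(X):
    # matrix-valued test function g(X) = A X B, so that dg_mn/dX_kl = A[m,k]*B[l,n]
    return A @ X @ B

h = 1e-6
quartix = np.zeros((M, N, K, L))              # gradient stored as a 4th-order array, as in (1965)
for k in range(K):
    for l in range(L):
        E = np.zeros((K, L)); E[k, l] = 1.0
        quartix[:, :, k, l] = (g(X + h*E) - g(X - h*E)) / (2*h)

analytic = np.einsum('mk,ln->mnkl', A, B)     # dg_mn/dX_kl = A_mk B_ln
print(np.allclose(quartix, analytic, atol=1e-6))   # True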
D.1.2 Product rules for matrix-functions
Given dimensionally compatible matrix-valued functions of matrix variable f(X) and g(X)

∇_X ( f(X)^T g(X) ) = ∇_X(f) g + ∇_X(g) f    (1967)
while [57, §8.3] [358]

∇_X tr( f(X)^T g(X) ) = ∇_X ( tr( f(X)^T g(Z) ) + tr( g(X) f(Z)^T ) ) |_{Z←X}    (1968)
These expressions implicitly apply as well to scalar-, vector-, or matrix-valued functions
of scalar, vector, or matrix arguments
D.1.2.0.1 Example. Cubix.
Suppose f(X) : R^{2×2} → R² = X^T a and g(X) : R^{2×2} → R² = Xb. We wish to find

∇_X ( f(X)^T g(X) ) = ∇_X ( a^T X² b )    (1969)

using the product rule. Formula (1967) calls for

∇_X a^T X² b = ∇_X (X^T a) Xb + ∇_X (Xb) X^T a    (1970)
Consider the first of the two terms:

∇_X(f) g = ∇_X (X^T a) Xb
         = [ ∇(X^T a)_1   ∇(X^T a)_2 ] Xb    (1971)

The gradient of X^T a forms a cubix in R^{2×2×2}; a.k.a. third-order tensor.

∇_X (X^T a) Xb = [ ∂(X^T a)_1/∂X_11   ∂(X^T a)_2/∂X_11
                   ∂(X^T a)_1/∂X_21   ∂(X^T a)_2/∂X_21
                   ···································
                   ∂(X^T a)_1/∂X_12   ∂(X^T a)_2/∂X_12
                   ∂(X^T a)_1/∂X_22   ∂(X^T a)_2/∂X_22 ]  (Xb)  ∈ R^{2×1×2}    (1972)
Because gradient of the product (1969) requires total change with respect to change in each entry of matrix X, the Xb vector must make an inner product with each vector in that second dimension of the cubix indicated by dotted line segments;
∇_X (X^T a) Xb = [ a_1   0
                   a_2   0
                   ········
                   0    a_1
                   0    a_2 ]  [ b_1 X_11 + b_2 X_12
                                 b_1 X_21 + b_2 X_22 ]  ∈ R^{2×1×2}    (1973)

               = [ (b_1 X_11 + b_2 X_12) a_1   (b_1 X_21 + b_2 X_22) a_1
                   (b_1 X_11 + b_2 X_12) a_2   (b_1 X_21 + b_2 X_22) a_2 ]  ∈ R^{2×2}

               = a b^T X^T
where the cubix appears as a complete 2×2×2 matrix. In like manner for the second term ∇_X(g) f,
∇_X (Xb) X^T a = [ b_1   0
                   0    b_1
                   ········
                   b_2   0
                   0    b_2 ]  [ X_11 a_1 + X_21 a_2
                                 X_12 a_1 + X_22 a_2 ]  ∈ R^{2×1×2}    (1974)

               = X^T a b^T  ∈ R^{2×2}
The solution

∇_X a^T X² b = a b^T X^T + X^T a b^T    (1975)

can be found from Table D.2.1 or verified using (1968).
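A small NumPy sketch can confirm (1975) by central differences; the random 2×2 data are an illustrative assumption:

import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(2)
b = rng.standard_normal(2)
X = rng.standard_normal((2, 2))

f = lambda X: a @ X @ X @ b                   # the real function a^T X^2 b of (1969)

h = 1e-6
num = np.zeros((2, 2))                        # finite-difference gradient w.r.t. X
for k in range(2):
    for l in range(2):
        E = np.zeros((2, 2)); E[k, l] = 1.0
        num[k, l] = (f(X + h*E) - f(X - h*E)) / (2*h)

analytic = np.outer(a, b) @ X.T + X.T @ np.outer(a, b)   # a b^T X^T + X^T a b^T, (1975)
print(np.allclose(num, analytic, atol=1e-6))   # True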
D.1.2.1 Kronecker product
A partial remedy for venturing into hyperdimensional matrix representations, such as the cubix or quartix, is to first vectorize matrices as in (39). This device gives rise to the Kronecker product of matrices; a.k.a. tensor product (kron() in Matlab). Although its definition sees reversal in the literature, [369, §2.1] Kronecker product is not commutative (B⊗A ≠ A⊗B). We adopt the definition: for A ∈ R^{m×n} and B ∈ R^{p×q}
B⊗A ≜ [ B_11 A   B_12 A   ⋯   B_1q A
        B_21 A   B_22 A   ⋯   B_2q A
        ⋮
        B_p1 A   B_p2 A   ⋯   B_pq A ]  ∈ R^{pm×qn}    (1976)
for which A⊗1 = 1⊗A = A (real unity acts like Identity).
One advantage to vectorization is existence of the traditional two-dimensional matrix representation (second-order tensor) for the second-order gradient of a real function with respect to a vectorized matrix. From §A.1.1 no.36 (§D.2.1), for square A, B ∈ R^{K×K}, for example [194, §5.2] [14, §3]

∇²_{vec X} tr(A X B X^T) = ∇²_{vec X} vec(X)^T (B^T⊗A) vec X = B^T⊗A + B⊗A^T  ∈ R^{K²×K²}    (1977)
A disadvantage is a large new but known set of algebraic rules (§A.1.1) and the fact that its mere use does not generally guarantee two-dimensional matrix representation of gradients.
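The identity underlying (1977) is easy to test numerically. In the sketch below, vec is implemented as column-stacking via flatten(order='F'), matching the vectorization (39); the sizes and random data are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(2)
K = 4
A = rng.standard_normal((K, K))
B = rng.standard_normal((K, K))
X = rng.standard_normal((K, K))

vec = lambda M_: M_.flatten(order='F')        # column-stacking vectorization

# tr(A X B X^T) = vec(X)^T (B^T kron A) vec(X)
lhs = np.trace(A @ X @ B @ X.T)
rhs = vec(X) @ np.kron(B.T, A) @ vec(X)
print(np.isclose(lhs, rhs))                   # True

# second-order gradient w.r.t. vec X: an ordinary K^2 x K^2 matrix, as in (1977)
hessian = np.kron(B.T, A) + np.kron(B, A.T)
print(hessian.shape)                          # (16, 16)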
Another application of the Kronecker product is to reverse order of appearance in a matrix product: Suppose we wish to weight the columns of a matrix S ∈ R^{M×N}, for example, by respective entries w_i from the main diagonal in

W ≜ [ w_1        0
           ⋱
      0         w_N ]  ∈ S^N    (1978)
A conventional means for accomplishing column weighting is to multiply S by diagonal matrix W on the right side:

S W = S [ w_1        0
               ⋱
          0         w_N ]  =  [ S(:,1) w_1   ⋯   S(:,N) w_N ]  ∈ R^{M×N}    (1979)
To reverse product order such that diagonal matrix W instead appears to the left of S, for I ∈ S^M

S W = ( δ(W)^T ⊗ I ) [ S(:,1)        0
                              S(:,2)
                                     ⋱
                       0             S(:,N) ]  ∈ R^{M×N}    (1980)
To instead weight the rows of S via diagonal matrix W ∈ S^M, for I ∈ S^N

W S = [ S(1,:)        0
               S(2,:)
                      ⋱
        0             S(M,:) ] ( δ(W) ⊗ I )  ∈ R^{M×N}    (1981)
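Both reversals (1980) and (1981) can be checked with numpy.kron and scipy.linalg.block_diag; the sizes and data below are illustrative assumptions:

import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(3)
M, N = 3, 4
S = rng.standard_normal((M, N))
w_col = rng.standard_normal(N)                # delta(W) for column weighting, W in S^N
w_row = rng.standard_normal(M)                # delta(W) for row weighting, W in S^M

# (1980): S W = (delta(W)^T kron I_M) blockdiag(S(:,1), ..., S(:,N))
cols = block_diag(*[S[:, [j]] for j in range(N)])           # (N*M) x N
print(np.allclose(S @ np.diag(w_col),
                  np.kron(w_col[None, :], np.eye(M)) @ cols))    # True

# (1981): W S = blockdiag(S(1,:), ..., S(M,:)) (delta(W) kron I_N)
rows = block_diag(*[S[[i], :] for i in range(M)])            # M x (M*N)
print(np.allclose(np.diag(w_row) @ S,
                  rows @ np.kron(w_row[:, None], np.eye(N))))     # True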
D.1.2.2 Hadamard product
For any matrices of like size, S, Y ∈ R^{M×N}, Hadamard's product ∘ denotes simple multiplication of corresponding entries (.* in Matlab). It is possible to convert Hadamard product into a standard product of matrices:

S∘Y = [ δ(Y(:,1))   ⋯   δ(Y(:,N)) ] [ S(:,1)        0
                                              ⋱
                                      0         S(:,N) ]  ∈ R^{M×N}    (1982)
In the special case that S = s and Y = y are vectors in R^M

s∘y = δ(s) y    (1983)
δ(s) y = δ(y) s    (1984)
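A short NumPy check of the conversion (1982) and the vector special case (1983), with illustrative sizes:

import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(4)
M, N = 3, 4
S = rng.standard_normal((M, N))
Y = rng.standard_normal((M, N))

# (1982): S o Y = [delta(Y(:,1)) ... delta(Y(:,N))] blockdiag(S(:,1), ..., S(:,N))
diags = np.hstack([np.diag(Y[:, j]) for j in range(N)])     # M x (N*M)
cols = block_diag(*[S[:, [j]] for j in range(N)])           # (N*M) x N
print(np.allclose(S * Y, diags @ cols))       # True

# (1983): for vectors, s o y = delta(s) y
s = rng.standard_normal(M)
y = rng.standard_normal(M)
print(np.allclose(s * y, np.diag(s) @ y))     # True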
D.1.3 Chain rules for composite matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable f(X) and g(X) [393, §15.7]

∇_X g( f(X)^T ) = ∇_X f^T ∇_f g    (1985)

∇²_X g( f(X)^T ) = ∇_X ( ∇_X f^T ∇_f g ) = ∇²_X f ∇_f g + ∇_X f^T ∇²_f g ∇_X f    (1986)
D.1.3.1 Two arguments

∇_X g( f(X)^T , h(X)^T ) = ∇_X f^T ∇_f g + ∇_X h^T ∇_h g    (1987)
D.1.3.1.1 Example. Chain rule for two arguments.   [4, §1.1]

g( f(x)^T , h(x)^T ) = ( f(x) + h(x) )^T A ( f(x) + h(x) )    (1988)

f(x) = [ x_1 ; εx_2 ] ,     h(x) = [ εx_1 ; x_2 ]    (1989)

∇_x g( f(x)^T , h(x)^T ) = [ 1 0 ; 0 ε ] (A + A^T)(f + h) + [ ε 0 ; 0 1 ] (A + A^T)(f + h)    (1990)

∇_x g( f(x)^T , h(x)^T ) = [ 1+ε 0 ; 0 1+ε ] (A + A^T) [ x_1 + εx_1 ; εx_2 + x_2 ]    (1991)

lim_{ε→0} ∇_x g( f(x)^T , h(x)^T ) = (A + A^T) x    (1992)

from Table D.2.1.
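A numerical sketch of this example (random A and x are illustrative assumptions) confirms that the closed-form gradient (1991) matches a finite-difference gradient and approaches (1992) as ε shrinks:

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((2, 2))
x = rng.standard_normal(2)
eps = 1e-3

def g_of_x(x):
    f = np.array([x[0], eps * x[1]])          # f(x) of (1989)
    h = np.array([eps * x[0], x[1]])          # h(x) of (1989)
    return (f + h) @ A @ (f + h)              # g of (1988)

dh = 1e-6
num_grad = np.array([(g_of_x(x + dh*e) - g_of_x(x - dh*e)) / (2*dh) for e in np.eye(2)])
closed_form = (1 + eps)**2 * (A + A.T) @ x    # gradient from (1991)
print(np.allclose(num_grad, closed_form, atol=1e-5))          # True
print(np.abs(closed_form - (A + A.T) @ x).max())              # small; -> 0 as eps -> 0, per (1992)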
These foregoing formulae remain correct when gradient produces hyperdimensional
representation
D.1.4 First directional derivative

Assume that a differentiable function g(X) : R^{K×L} → R^{M×N} has continuous first- and second-order gradients ∇g and ∇²g over dom g, which is an open set. We seek simple expressions for the first and second directional derivatives in direction Y ∈ R^{K×L}: respectively, dg ∈ R^{M×N} and dg² ∈ R^{M×N}.
Assuming that the limit exists, we may state the partial derivative of the mn-th entry of g with respect to the kl-th entry of X:

∂g_mn(X)/∂X_kl = lim_{Δt→0} [ g_mn(X + Δt e_k e_l^T) − g_mn(X) ] / Δt  ∈ R    (1993)
where e_k is the k-th standard basis vector in R^K while e_l is the l-th standard basis vector in R^L. Total number of partial derivatives equals KLMN, while the gradient is defined in their terms; the mn-th entry of the gradient is
∇g_mn(X) = [ ∂g_mn(X)/∂X_11   ∂g_mn(X)/∂X_12   ⋯   ∂g_mn(X)/∂X_1L
             ∂g_mn(X)/∂X_21   ∂g_mn(X)/∂X_22   ⋯   ∂g_mn(X)/∂X_2L
             ⋮
             ∂g_mn(X)/∂X_K1   ∂g_mn(X)/∂X_K2   ⋯   ∂g_mn(X)/∂X_KL ]  ∈ R^{K×L}    (1994)
while the gradient is a quartix

∇g(X) = [ ∇g_11(X)   ∇g_12(X)   ⋯   ∇g_1N(X)
          ∇g_21(X)   ∇g_22(X)   ⋯   ∇g_2N(X)
          ⋮
          ∇g_M1(X)   ∇g_M2(X)   ⋯   ∇g_MN(X) ]  ∈ R^{M×N×K×L}    (1995)
By simply rotating our perspective of a four-dimensional representation of gradient matrix, we find one of three useful transpositions of this quartix (connoted T1):

∇g(X)^{T1} = [ ∂g(X)/∂X_11   ∂g(X)/∂X_12   ⋯   ∂g(X)/∂X_1L
               ∂g(X)/∂X_21   ∂g(X)/∂X_22   ⋯   ∂g(X)/∂X_2L
               ⋮
               ∂g(X)/∂X_K1   ∂g(X)/∂X_K2   ⋯   ∂g(X)/∂X_KL ]  ∈ R^{K×L×M×N}    (1996)
When a limit for Δt ∈ R exists, it is easy to show by substitution of variables in (1993)

∂g_mn(X)/∂X_kl  Y_kl = lim_{Δt→0} [ g_mn(X + Δt Y_kl e_k e_l^T) − g_mn(X) ] / Δt  ∈ R    (1997)
which may be interpreted as the change in g_mn at X when the change in X_kl is equal to Y_kl, the kl-th entry of any Y ∈ R^{K×L}. Because the total change in g_mn(X) due to Y is the sum of change with respect to each and every X_kl, the mn-th entry of the directional derivative is the corresponding total differential [393, §15.8];
dg_mn(X)|_{dX→Y} = Σ_{k,l} ∂g_mn(X)/∂X_kl  Y_kl  =  tr( ∇g_mn(X)^T Y )    (1998)

                 = Σ_{k,l} lim_{Δt→0} [ g_mn(X + Δt Y_kl e_k e_l^T) − g_mn(X) ] / Δt    (1999)

                 = lim_{Δt→0} [ g_mn(X + Δt Y) − g_mn(X) ] / Δt    (2000)

                 = d/dt |_{t=0}  g_mn(X + t Y)    (2001)
where t ∈ R. Assuming finite Y, equation (2000) is called the Gateaux differential [43, App. A.5] [230, §D.2] [405, §5.28], whose existence is implied by existence of the Fréchet differential (the sum in (1998)). [285, §7.2] Each may be understood as the change in g_mn at X when the change in X is equal in magnitude and direction to Y.^{D.2} Hence the directional derivative,
dg(X)|_{dX→Y} ≜ [ dg_11(X)   dg_12(X)   ⋯   dg_1N(X)
                  dg_21(X)   dg_22(X)   ⋯   dg_2N(X)
                  ⋮
                  dg_M1(X)   dg_M2(X)   ⋯   dg_MN(X) ]  ∈ R^{M×N}

              = [ tr( ∇g_11(X)^T Y )   tr( ∇g_12(X)^T Y )   ⋯   tr( ∇g_1N(X)^T Y )
                  tr( ∇g_21(X)^T Y )   tr( ∇g_22(X)^T Y )   ⋯   tr( ∇g_2N(X)^T Y )
                  ⋮
                  tr( ∇g_M1(X)^T Y )   tr( ∇g_M2(X)^T Y )   ⋯   tr( ∇g_MN(X)^T Y ) ]    (2002)

              = [ Σ_{k,l} ∂g_11(X)/∂X_kl Y_kl   Σ_{k,l} ∂g_12(X)/∂X_kl Y_kl   ⋯   Σ_{k,l} ∂g_1N(X)/∂X_kl Y_kl
                  Σ_{k,l} ∂g_21(X)/∂X_kl Y_kl   Σ_{k,l} ∂g_22(X)/∂X_kl Y_kl   ⋯   Σ_{k,l} ∂g_2N(X)/∂X_kl Y_kl
                  ⋮
                  Σ_{k,l} ∂g_M1(X)/∂X_kl Y_kl   Σ_{k,l} ∂g_M2(X)/∂X_kl Y_kl   ⋯   Σ_{k,l} ∂g_MN(X)/∂X_kl Y_kl ]
from which it follows

dg(X)|_{dX→Y} = Σ_{k,l} ∂g(X)/∂X_kl  Y_kl    (2003)

Yet for all X ∈ dom g, any Y ∈ R^{K×L}, and some open interval of t ∈ R

g(X + t Y) = g(X) + t dg(X)|_{dX→Y} + o(t²)    (2004)

which is the first-order multidimensional Taylor series expansion about X. [393, §18.4] [177, §2.3.4] Differentiation with respect to t and subsequent t-zeroing isolates the second term of the expansion. Thus differentiating and zeroing g(X + tY) in t is an operation equivalent to individually differentiating and zeroing every entry g_mn(X + tY) as in (2001). So the directional derivative of g(X) : R^{K×L} → R^{M×N} in any direction Y ∈ R^{K×L} evaluated at X ∈ dom g becomes

dg(X)|_{dX→Y} = d/dt |_{t=0}  g(X + t Y)    (2005)
^{D.2} Although Y is a matrix, we may regard it as a vector in R^{KL}.
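For a real-valued g the equivalence of (1998) and (2005) is easy to confirm numerically; the sketch below reuses g(X) = a^T X² b, whose gradient is (1975), with illustrative random data:

import numpy as np

rng = np.random.default_rng(6)
a = rng.standard_normal(2)
b = rng.standard_normal(2)
X = rng.standard_normal((2, 2))
Y = rng.standard_normal((2, 2))

g = lambda X: a @ X @ X @ b                   # real-valued g(X) = a^T X^2 b
grad = np.outer(a, b) @ X.T + X.T @ np.outer(a, b)   # its gradient, from (1975)

t = 1e-6
slope = (g(X + t*Y) - g(X - t*Y)) / (2*t)     # d/dt g(X + tY) at t = 0, as in (2005)
print(np.isclose(slope, np.trace(grad.T @ Y)))      # True: equals tr(grad^T Y), as in (1998)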
Figure 189: Strictly convex quadratic bowl in R² × R; f(x) = x^T x : R² → R on some open disc in R². Plane slice ∂H is perpendicular to function domain. Slice intersection with domain connotes bidirectional vector y. Slope of tangent line T at point (a, f(a)) is value of directional derivative ∇_x f(a)^T y (2030) at a in slice direction y. Negative gradient −∇_x f(a) ∈ R² is direction of steepest descent. [393, §15.6] [177] When vector v ∈ R³ entry v_3 is half directional derivative in gradient direction at a, and when ∇f = ∇_x f(a), then −v points directly toward bowl bottom.
[315, §2.1, §5.4.5] [36, §6.3.1], which is simplest. In case of a real function g(X) : R^{K×L} → R

dg(X)|_{dX→Y} = tr( ∇g(X)^T Y )    (2027)

In case g(X) : R^K → R

dg(X)|_{dX→Y} = ∇g(X)^T Y
Unlike gradient, directional derivative does not expand dimension; directional derivative (2005) retains the dimensions of g. The derivative with respect to t makes the directional derivative resemble ordinary calculus (§D.2); e.g., when g(X) is linear, dg(X) = g(Y). [285, §7.2]
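For instance, with the linear matrix function g(X) = AXB (an illustrative assumption), a finite-difference directional derivative reproduces g(Y):

import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((4, 5))
X = rng.standard_normal((2, 4))
Y = rng.standard_normal((2, 4))

g = lambda X: A @ X @ B                       # a linear matrix-valued function
t = 1e-6
dg = (g(X + t*Y) - g(X - t*Y)) / (2*t)        # directional derivative (2005) in direction Y
print(np.allclose(dg, g(Y), atol=1e-6))       # True: dg(X) = g(Y) when g is linear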
D.1.4.1 Interpretation of directional derivative
In the case of any differentiable real function g(X) : R^{K×L} → R, the directional derivative of g(X) at X in any direction Y yields the slope of g along the line {X + tY | t ∈ R} through its domain evaluated at t = 0. For higher-dimensional functions, by (2002), this slope interpretation can be applied to each entry of the directional derivative.
Figure 189, for example, shows a plane slice of a real convex bowl-shaped function f(x) along a line {a + ty | t ∈ R} through its domain. The slice reveals a one-dimensional real function of t; f(a + ty). The directional derivative at x = a in direction y is the slope of f(a + ty) with respect to t at t = 0. In the case of a real function having vector argument h(x) : R^K → R, its directional derivative in the normalized direction of its gradient is the gradient magnitude. (2030) For a real function of real variable, the directional derivative evaluated at any point in the function domain is just the slope of that function there scaled by the real direction. (confer §3.6)
Directional derivative generalizes our one-dimensional notion of derivative to a multidimensional domain. When direction Y coincides with a member of the standard Cartesian basis e_k e_l^T (63), then a single partial derivative ∂g(X)/∂X_kl is obtained from directional derivative (2003); such is each entry of gradient ∇g(X) in equalities (2027) and (2030), for example.
D.1.4.1.1 Theorem. Directional derivative optimality condition.   [285, §7.4]
Suppose f(X) : R^{K×L} → R is minimized on convex set C ⊆ R^{K×L} by X*, and the directional derivative of f exists there. Then for all X ∈ C

df(X*)|_{dX→X−X*}  ≥  0    (2006)

◇
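A numerical illustration of (2006) under assumed data: minimizing f(X) = ‖X − C‖_F² over the box of matrices with entries in [0,1], the minimizer is the entrywise projection of C, and the directional derivative toward any feasible X is nonnegative:

import numpy as np

rng = np.random.default_rng(9)
C = rng.standard_normal((3, 3))
X_star = np.clip(C, 0.0, 1.0)                 # minimizer of f over the convex box [0,1]^{3x3}
grad = 2 * (X_star - C)                       # gradient of f(X) = ||X - C||_F^2 at X_star

for _ in range(5):
    X = rng.uniform(0.0, 1.0, size=(3, 3))    # an arbitrary feasible point
    ddf = np.trace(grad.T @ (X - X_star))     # directional derivative toward X, as in (2006)
    print(ddf >= -1e-12)                      # True every time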
D.1.4.1.2 Example. Simple bowl.
Bowl function (Figure 189)

f(x) : R^K → R ≜ (x − a)^T (x − a) − b    (2007)

has function offset −b ∈ R, axis of revolution at x = a, and positive definite Hessian (1956) everywhere in its domain (an open disc in R^K); id est, strictly convex quadratic f(x) has unique global minimum equal to −b at x = a. A vector −v based anywhere in dom f × R pointing toward the unique bowl bottom is specified:

v ≜ [ x − a ; f(x) + b ]  ∈ R^K × R    (2008)

Such a vector is

v = [ ∇_x f(x) / 2
      df(x)|_{dx→∇_x f(x)} / 4 ]    (2009)

since the gradient is

∇_x f(x) = 2(x − a)    (2010)

and the directional derivative in direction of the gradient is (2030)

df(x)|_{dx→∇_x f(x)} = ∇_x f(x)^T ∇_x f(x) = 4(x − a)^T (x − a) = 4( f(x) + b )    (2011)
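A brief NumPy check of (2010) and (2011) for the bowl function, with illustrative values of a, b, and x:

import numpy as np

rng = np.random.default_rng(7)
K = 2
a_vec = rng.standard_normal(K)                # axis of revolution
b_off = 1.5                                   # offset; the bowl bottom sits at -b_off
x = rng.standard_normal(K)

f = lambda x: (x - a_vec) @ (x - a_vec) - b_off
grad = 2 * (x - a_vec)                        # (2010)

t = 1e-6
slope = (f(x + t*grad) - f(x - t*grad)) / (2*t)     # directional derivative along the gradient
print(np.isclose(slope, grad @ grad))               # True
print(np.isclose(grad @ grad, 4 * (f(x) + b_off)))  # True: (2011)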
D.1.5 Second directional derivative

By similar argument, it so happens: the second directional derivative is equally simple. Given g(X) : R^{K×L} → R^{M×N} on open domain,

∇ ∂g_mn(X)/∂X_kl = ∂∇g_mn(X)/∂X_kl =

  [ ∂²g_mn(X)/(∂X_kl ∂X_11)   ∂²g_mn(X)/(∂X_kl ∂X_12)   ⋯   ∂²g_mn(X)/(∂X_kl ∂X_1L)
    ∂²g_mn(X)/(∂X_kl ∂X_21)   ∂²g_mn(X)/(∂X_kl ∂X_22)   ⋯   ∂²g_mn(X)/(∂X_kl ∂X_2L)
    ⋮
    ∂²g_mn(X)/(∂X_kl ∂X_K1)   ∂²g_mn(X)/(∂X_kl ∂X_K2)   ⋯   ∂²g_mn(X)/(∂X_kl ∂X_KL) ]  ∈ R^{K×L}    (2012)