Matrix calculus (Appendix D): derivatives with respect to matrices and vectors

D.1 GRADIENT, DIRECTIONAL DERIVATIVE, TAYLOR SERIES
The second-order gradient of real function g(X) : R^{K×L} → R on matrix domain has representation

∇²g(X) ≜ [ ∇ ∂g(X)/∂X_11   ∇ ∂g(X)/∂X_12   ⋯   ∇ ∂g(X)/∂X_1L
           ∇ ∂g(X)/∂X_21   ∇ ∂g(X)/∂X_22   ⋯   ∇ ∂g(X)/∂X_2L
           ⋮
           ∇ ∂g(X)/∂X_K1   ∇ ∂g(X)/∂X_K2   ⋯   ∇ ∂g(X)/∂X_KL ]  ∈ R^{K×L×K×L}    (1962)

         = [ ∇∇_{X(:,1)} g(X)   ∇∇_{X(:,2)} g(X)   ⋯   ∇∇_{X(:,L)} g(X) ]  ∈ R^{K×1×L×K×L}

where the gradient ∇ is with respect to matrix X.
Gradient of vector-valued function g(X) : R^{K×L} → R^N on matrix domain is a cubix

∇g(X) ≜ [ ∇_{X(:,1)} g_1(X)   ∇_{X(:,1)} g_2(X)   ⋯   ∇_{X(:,1)} g_N(X)
          ∇_{X(:,2)} g_1(X)   ∇_{X(:,2)} g_2(X)   ⋯   ∇_{X(:,2)} g_N(X)
          ⋮
          ∇_{X(:,L)} g_1(X)   ∇_{X(:,L)} g_2(X)   ⋯   ∇_{X(:,L)} g_N(X) ]    (1963)

        = [ ∇g_1(X)   ∇g_2(X)   ⋯   ∇g_N(X) ]  ∈ R^{K×N×L}
while the second-order gradient has a five-dimensional representation
∇²g(X) ≜ [ ∇∇_{X(:,1)} g_1(X)   ∇∇_{X(:,1)} g_2(X)   ⋯   ∇∇_{X(:,1)} g_N(X)
           ∇∇_{X(:,2)} g_1(X)   ∇∇_{X(:,2)} g_2(X)   ⋯   ∇∇_{X(:,2)} g_N(X)
           ⋮
           ∇∇_{X(:,L)} g_1(X)   ∇∇_{X(:,L)} g_2(X)   ⋯   ∇∇_{X(:,L)} g_N(X) ]    (1964)

         = [ ∇²g_1(X)   ∇²g_2(X)   ⋯   ∇²g_N(X) ]  ∈ R^{K×N×L×K×L}
The gradient of matrix-valued function g(X) : R^{K×L} → R^{M×N} on matrix domain has a four-dimensional representation called quartix (fourth-order tensor)
∇g(X) ≜ [ ∇g_11(X)   ∇g_12(X)   ⋯   ∇g_1N(X)
          ∇g_21(X)   ∇g_22(X)   ⋯   ∇g_2N(X)
          ⋮
          ∇g_M1(X)   ∇g_M2(X)   ⋯   ∇g_MN(X) ]  ∈ R^{M×N×K×L}    (1965)
while the second-order gradient has a six-dimensional representation
∇²g(X) ≜ [ ∇²g_11(X)   ∇²g_12(X)   ⋯   ∇²g_1N(X)
           ∇²g_21(X)   ∇²g_22(X)   ⋯   ∇²g_2N(X)
           ⋮
           ∇²g_M1(X)   ∇²g_M2(X)   ⋯   ∇²g_MN(X) ]  ∈ R^{M×N×K×L×K×L}    (1966)
and so on
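These hyperdimensional objects are easy to build numerically. The following NumPy sketch (the choice g(X) = AXB and the dimensions are illustrative assumptions, not from the text) forms the quartix (1965) by central differences and checks it against the analytic entries ∂g_mn/∂X_kl = A_mk B_ln:

import numpy as np

rng = np.random.default_rng(0)
M, N, K, L = 3, 4, 2, 5
A = rng.standard_normal((M, K))
B = rng.standard_normal((L, N))
X = rng.standard_normal((K, L))

def g(X):
    # matrix-valued test function g(X) = A X B, so that dg_mn/dX_kl = A[m,k]*B[l,n]
    return A @ X @ B

h = 1e-6
quartix = np.zeros((M, N, K, L))              # gradient stored as a 4th-order array, as in (1965)
for k in range(K):
    for l in range(L):
        E = np.zeros((K, L)); E[k, l] = 1.0
        quartix[:, :, k, l] = (g(X + h*E) - g(X - h*E)) / (2*h)

analytic = np.einsum('mk,ln->mnkl', A, B)     # dg_mn/dX_kl = A_mk B_ln
print(np.allclose(quartix, analytic, atol=1e-6))   # True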
D.1.2 Product rules for matrix-functions
Given dimensionally compatible matrix-valued functions of matrix variable f(X) and g(X)

∇_X ( f(X)^T g(X) ) = ∇_X(f) g + ∇_X(g) f    (1967)
while [57, §8.3] [358]

∇_X tr( f(X)^T g(X) ) = ∇_X ( tr( f(X)^T g(Z) ) + tr( g(X) f(Z)^T ) ) |_{Z←X}    (1968)
These expressions implicitly apply as well to scalar-, vector-, or matrix-valued functions
of scalar, vector, or matrix arguments
D.1.2.0.1 Example. Cubix.
Suppose f(X) : R^{2×2} → R² = X^T a and g(X) : R^{2×2} → R² = Xb. We wish to find

∇_X ( f(X)^T g(X) ) = ∇_X ( a^T X² b )    (1969)

using the product rule. Formula (1967) calls for

∇_X a^T X² b = ∇_X (X^T a) Xb + ∇_X (Xb) X^T a    (1970)
Consider the first of the two terms:

∇_X(f) g = ∇_X (X^T a) Xb
         = [ ∇(X^T a)_1   ∇(X^T a)_2 ] Xb    (1971)

The gradient of X^T a forms a cubix in R^{2×2×2}; a.k.a. third-order tensor.

∇_X (X^T a) Xb = [ ∂(X^T a)_1/∂X_11   ∂(X^T a)_2/∂X_11
                   ∂(X^T a)_1/∂X_21   ∂(X^T a)_2/∂X_21
                   ···································
                   ∂(X^T a)_1/∂X_12   ∂(X^T a)_2/∂X_12
                   ∂(X^T a)_1/∂X_22   ∂(X^T a)_2/∂X_22 ]  (Xb)  ∈ R^{2×1×2}    (1972)
Because gradient of the product (1969) requires total change with respect to change in each entry of matrix X, the Xb vector must make an inner product with each vector in that second dimension of the cubix indicated by dotted line segments;
∇_X (X^T a) Xb = [ a_1   0
                   a_2   0
                   ········
                   0    a_1
                   0    a_2 ]  [ b_1 X_11 + b_2 X_12
                                 b_1 X_21 + b_2 X_22 ]  ∈ R^{2×1×2}    (1973)

               = [ (b_1 X_11 + b_2 X_12) a_1   (b_1 X_21 + b_2 X_22) a_1
                   (b_1 X_11 + b_2 X_12) a_2   (b_1 X_21 + b_2 X_22) a_2 ]  ∈ R^{2×2}

               = a b^T X^T
where the cubix appears as a complete 2×2×2 matrix. In like manner for the second term ∇_X(g) f,
∇_X (Xb) X^T a = [ b_1   0
                   0    b_1
                   ········
                   b_2   0
                   0    b_2 ]  [ X_11 a_1 + X_21 a_2
                                 X_12 a_1 + X_22 a_2 ]  ∈ R^{2×1×2}    (1974)

               = X^T a b^T  ∈ R^{2×2}
The solution

∇_X a^T X² b = a b^T X^T + X^T a b^T    (1975)

can be found from Table D.2.1 or verified using (1968).
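A small NumPy sketch can confirm (1975) by central differences; the random 2×2 data are an illustrative assumption:

import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(2)
b = rng.standard_normal(2)
X = rng.standard_normal((2, 2))

f = lambda X: a @ X @ X @ b                   # the real function a^T X^2 b of (1969)

h = 1e-6
num = np.zeros((2, 2))                        # finite-difference gradient w.r.t. X
for k in range(2):
    for l in range(2):
        E = np.zeros((2, 2)); E[k, l] = 1.0
        num[k, l] = (f(X + h*E) - f(X - h*E)) / (2*h)

analytic = np.outer(a, b) @ X.T + X.T @ np.outer(a, b)   # a b^T X^T + X^T a b^T, (1975)
print(np.allclose(num, analytic, atol=1e-6))   # True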
D.1.2.1 Kronecker product
A partial remedy for venturing into hyperdimensional matrix representations, such as the cubix or quartix, is to first vectorize matrices as in (39). This device gives rise to the Kronecker product of matrices; a.k.a. tensor product (kron() in Matlab). Although its definition sees reversal in the literature, [369, §2.1] Kronecker product is not commutative (B⊗A ≠ A⊗B). We adopt the definition: for A ∈ R^{m×n} and B ∈ R^{p×q}
B⊗A ≜ [ B_11 A   B_12 A   ⋯   B_1q A
        B_21 A   B_22 A   ⋯   B_2q A
        ⋮
        B_p1 A   B_p2 A   ⋯   B_pq A ]  ∈ R^{pm×qn}    (1976)
for which A⊗1 = 1⊗A = A (real unity acts like Identity).
One advantage to vectorization is existence of the traditional two-dimensional matrix representation (second-order tensor) for the second-order gradient of a real function with respect to a vectorized matrix. From §A.1.1 no.36 (§D.2.1), for square A, B ∈ R^{K×K}, for example [194, §5.2] [14, §3]

∇²_{vec X} tr(A X B X^T) = ∇²_{vec X} vec(X)^T (B^T⊗A) vec X = B^T⊗A + B⊗A^T  ∈ R^{K²×K²}    (1977)
A disadvantage is a large new but known set of algebraic rules (§A.1.1) and the fact that its mere use does not generally guarantee two-dimensional matrix representation of gradients.
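The identity underlying (1977) is easy to test numerically. In the sketch below, vec is implemented as column-stacking via flatten(order='F'), matching the vectorization (39); the sizes and random data are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(2)
K = 4
A = rng.standard_normal((K, K))
B = rng.standard_normal((K, K))
X = rng.standard_normal((K, K))

vec = lambda M_: M_.flatten(order='F')        # column-stacking vectorization

# tr(A X B X^T) = vec(X)^T (B^T kron A) vec(X)
lhs = np.trace(A @ X @ B @ X.T)
rhs = vec(X) @ np.kron(B.T, A) @ vec(X)
print(np.isclose(lhs, rhs))                   # True

# second-order gradient w.r.t. vec X: an ordinary K^2 x K^2 matrix, as in (1977)
hessian = np.kron(B.T, A) + np.kron(B, A.T)
print(hessian.shape)                          # (16, 16)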
Another application of the Kronecker product is to reverse order of appearance in a matrix product: Suppose we wish to weight the columns of a matrix S ∈ R^{M×N}, for example, by respective entries w_i from the main diagonal in

W ≜ [ w_1        0
           ⋱
      0         w_N ]  ∈ S^N    (1978)
A conventional means for accomplishing column weighting is to multiply S by diagonal matrix W on the right side:

S W = S [ w_1        0
               ⋱
          0         w_N ]  =  [ S(:,1) w_1   ⋯   S(:,N) w_N ]  ∈ R^{M×N}    (1979)
To reverse product order such that diagonal matrix W instead appears to the left of S, for I ∈ S^M

S W = ( δ(W)^T ⊗ I ) [ S(:,1)        0
                              S(:,2)
                                     ⋱
                       0             S(:,N) ]  ∈ R^{M×N}    (1980)
To instead weight the rows of S via diagonal matrix W ∈ S^M, for I ∈ S^N

W S = [ S(1,:)        0
               S(2,:)
                      ⋱
        0             S(M,:) ] ( δ(W) ⊗ I )  ∈ R^{M×N}    (1981)
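Both reversals (1980) and (1981) can be checked with numpy.kron and scipy.linalg.block_diag; the sizes and data below are illustrative assumptions:

import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(3)
M, N = 3, 4
S = rng.standard_normal((M, N))
w_col = rng.standard_normal(N)                # delta(W) for column weighting, W in S^N
w_row = rng.standard_normal(M)                # delta(W) for row weighting, W in S^M

# (1980): S W = (delta(W)^T kron I_M) blockdiag(S(:,1), ..., S(:,N))
cols = block_diag(*[S[:, [j]] for j in range(N)])           # (N*M) x N
print(np.allclose(S @ np.diag(w_col),
                  np.kron(w_col[None, :], np.eye(M)) @ cols))    # True

# (1981): W S = blockdiag(S(1,:), ..., S(M,:)) (delta(W) kron I_N)
rows = block_diag(*[S[[i], :] for i in range(M)])            # M x (M*N)
print(np.allclose(np.diag(w_row) @ S,
                  rows @ np.kron(w_row[:, None], np.eye(N))))     # True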
D.1.2.2 Hadamard product
For any matrices of like size, S, Y ∈ R^{M×N}, Hadamard's product ∘ denotes simple multiplication of corresponding entries (.* in Matlab). It is possible to convert Hadamard product into a standard product of matrices:

S∘Y = [ δ(Y(:,1))   ⋯   δ(Y(:,N)) ] [ S(:,1)        0
                                              ⋱
                                      0         S(:,N) ]  ∈ R^{M×N}    (1982)
In the special case that S = s and Y = y are vectors in R^M

s∘y = δ(s) y    (1983)
δ(s) y = δ(y) s    (1984)
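A short NumPy check of the conversion (1982) and the vector special case (1983), with illustrative sizes:

import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(4)
M, N = 3, 4
S = rng.standard_normal((M, N))
Y = rng.standard_normal((M, N))

# (1982): S o Y = [delta(Y(:,1)) ... delta(Y(:,N))] blockdiag(S(:,1), ..., S(:,N))
diags = np.hstack([np.diag(Y[:, j]) for j in range(N)])     # M x (N*M)
cols = block_diag(*[S[:, [j]] for j in range(N)])           # (N*M) x N
print(np.allclose(S * Y, diags @ cols))       # True

# (1983): for vectors, s o y = delta(s) y
s = rng.standard_normal(M)
y = rng.standard_normal(M)
print(np.allclose(s * y, np.diag(s) @ y))     # True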
D.1.3 Chain rules for composite matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable f(X) and g(X) [393, §15.7]

∇_X g( f(X)^T ) = ∇_X f^T ∇_f g    (1985)

∇²_X g( f(X)^T ) = ∇_X ( ∇_X f^T ∇_f g ) = ∇²_X f ∇_f g + ∇_X f^T ∇²_f g ∇_X f    (1986)
D.1.3.1 Two arguments

∇_X g( f(X)^T , h(X)^T ) = ∇_X f^T ∇_f g + ∇_X h^T ∇_h g    (1987)
D.1.3.1.1 Example. Chain rule for two arguments.   [4, §1.1]

g( f(x)^T , h(x)^T ) = ( f(x) + h(x) )^T A ( f(x) + h(x) )    (1988)

f(x) = [ x_1 ; εx_2 ] ,     h(x) = [ εx_1 ; x_2 ]    (1989)

∇_x g( f(x)^T , h(x)^T ) = [ 1 0 ; 0 ε ] (A + A^T)(f + h) + [ ε 0 ; 0 1 ] (A + A^T)(f + h)    (1990)

∇_x g( f(x)^T , h(x)^T ) = [ 1+ε 0 ; 0 1+ε ] (A + A^T) [ x_1 + εx_1 ; εx_2 + x_2 ]    (1991)

lim_{ε→0} ∇_x g( f(x)^T , h(x)^T ) = (A + A^T) x    (1992)

from Table D.2.1.
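A numerical sketch of this example (random A and x are illustrative assumptions) confirms that the closed-form gradient (1991) matches a finite-difference gradient and approaches (1992) as ε shrinks:

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((2, 2))
x = rng.standard_normal(2)
eps = 1e-3

def g_of_x(x):
    f = np.array([x[0], eps * x[1]])          # f(x) of (1989)
    h = np.array([eps * x[0], x[1]])          # h(x) of (1989)
    return (f + h) @ A @ (f + h)              # g of (1988)

dh = 1e-6
num_grad = np.array([(g_of_x(x + dh*e) - g_of_x(x - dh*e)) / (2*dh) for e in np.eye(2)])
closed_form = (1 + eps)**2 * (A + A.T) @ x    # gradient from (1991)
print(np.allclose(num_grad, closed_form, atol=1e-5))          # True
print(np.abs(closed_form - (A + A.T) @ x).max())              # small; -> 0 as eps -> 0, per (1992)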
These foregoing formulae remain correct when gradient produces hyperdimensional
representation
D.1.4 First directional derivative

Assume that a differentiable function g(X) : R^{K×L} → R^{M×N} has continuous first- and second-order gradients ∇g and ∇²g over dom g, which is an open set. We seek simple expressions for the first and second directional derivatives in direction Y ∈ R^{K×L}: respectively, dg ∈ R^{M×N} and dg² ∈ R^{M×N}.
Assuming that the limit exists, we may state the partial derivative of the mn-th entry of g with respect to the kl-th entry of X:

∂g_mn(X)/∂X_kl = lim_{Δt→0} [ g_mn(X + Δt e_k e_l^T) − g_mn(X) ] / Δt  ∈ R    (1993)
where e_k is the k-th standard basis vector in R^K while e_l is the l-th standard basis vector in R^L. Total number of partial derivatives equals KLMN, while the gradient is defined in their terms; the mn-th entry of the gradient is
∇g_mn(X) = [ ∂g_mn(X)/∂X_11   ∂g_mn(X)/∂X_12   ⋯   ∂g_mn(X)/∂X_1L
             ∂g_mn(X)/∂X_21   ∂g_mn(X)/∂X_22   ⋯   ∂g_mn(X)/∂X_2L
             ⋮
             ∂g_mn(X)/∂X_K1   ∂g_mn(X)/∂X_K2   ⋯   ∂g_mn(X)/∂X_KL ]  ∈ R^{K×L}    (1994)
while the gradient is a quartix

∇g(X) = [ ∇g_11(X)   ∇g_12(X)   ⋯   ∇g_1N(X)
          ∇g_21(X)   ∇g_22(X)   ⋯   ∇g_2N(X)
          ⋮
          ∇g_M1(X)   ∇g_M2(X)   ⋯   ∇g_MN(X) ]  ∈ R^{M×N×K×L}    (1995)
By simply rotating our perspective of a four-dimensional representation of gradient matrix, we find one of three useful transpositions of this quartix (connoted T1):

∇g(X)^{T1} = [ ∂g(X)/∂X_11   ∂g(X)/∂X_12   ⋯   ∂g(X)/∂X_1L
               ∂g(X)/∂X_21   ∂g(X)/∂X_22   ⋯   ∂g(X)/∂X_2L
               ⋮
               ∂g(X)/∂X_K1   ∂g(X)/∂X_K2   ⋯   ∂g(X)/∂X_KL ]  ∈ R^{K×L×M×N}    (1996)
When a limit for Δt ∈ R exists, it is easy to show by substitution of variables in (1993)

∂g_mn(X)/∂X_kl  Y_kl = lim_{Δt→0} [ g_mn(X + Δt Y_kl e_k e_l^T) − g_mn(X) ] / Δt  ∈ R    (1997)
which may be interpreted as the change in g_mn at X when the change in X_kl is equal to Y_kl, the kl-th entry of any Y ∈ R^{K×L}. Because the total change in g_mn(X) due to Y is the sum of change with respect to each and every X_kl, the mn-th entry of the directional derivative is the corresponding total differential [393, §15.8];
dg_mn(X)|_{dX→Y} = Σ_{k,l} ∂g_mn(X)/∂X_kl  Y_kl  =  tr( ∇g_mn(X)^T Y )    (1998)

                 = Σ_{k,l} lim_{Δt→0} [ g_mn(X + Δt Y_kl e_k e_l^T) − g_mn(X) ] / Δt    (1999)

                 = lim_{Δt→0} [ g_mn(X + Δt Y) − g_mn(X) ] / Δt    (2000)

                 = d/dt |_{t=0}  g_mn(X + t Y)    (2001)
where t ∈ R. Assuming finite Y, equation (2000) is called the Gateaux differential [43, App. A.5] [230, §D.2] [405, §5.28], whose existence is implied by existence of the Fréchet differential (the sum in (1998)). [285, §7.2] Each may be understood as the change in g_mn at X when the change in X is equal in magnitude and direction to Y.^{D.2} Hence the directional derivative,
dg(X)|_{dX→Y} ≜ [ dg_11(X)   dg_12(X)   ⋯   dg_1N(X)
                  dg_21(X)   dg_22(X)   ⋯   dg_2N(X)
                  ⋮
                  dg_M1(X)   dg_M2(X)   ⋯   dg_MN(X) ]  ∈ R^{M×N}

              = [ tr( ∇g_11(X)^T Y )   tr( ∇g_12(X)^T Y )   ⋯   tr( ∇g_1N(X)^T Y )
                  tr( ∇g_21(X)^T Y )   tr( ∇g_22(X)^T Y )   ⋯   tr( ∇g_2N(X)^T Y )
                  ⋮
                  tr( ∇g_M1(X)^T Y )   tr( ∇g_M2(X)^T Y )   ⋯   tr( ∇g_MN(X)^T Y ) ]    (2002)

              = [ Σ_{k,l} ∂g_11(X)/∂X_kl Y_kl   Σ_{k,l} ∂g_12(X)/∂X_kl Y_kl   ⋯   Σ_{k,l} ∂g_1N(X)/∂X_kl Y_kl
                  Σ_{k,l} ∂g_21(X)/∂X_kl Y_kl   Σ_{k,l} ∂g_22(X)/∂X_kl Y_kl   ⋯   Σ_{k,l} ∂g_2N(X)/∂X_kl Y_kl
                  ⋮
                  Σ_{k,l} ∂g_M1(X)/∂X_kl Y_kl   Σ_{k,l} ∂g_M2(X)/∂X_kl Y_kl   ⋯   Σ_{k,l} ∂g_MN(X)/∂X_kl Y_kl ]
from which it follows

dg(X)|_{dX→Y} = Σ_{k,l} ∂g(X)/∂X_kl  Y_kl    (2003)

Yet for all X ∈ dom g, any Y ∈ R^{K×L}, and some open interval of t ∈ R

g(X + t Y) = g(X) + t dg(X)|_{dX→Y} + o(t²)    (2004)

which is the first-order multidimensional Taylor series expansion about X. [393, §18.4] [177, §2.3.4] Differentiation with respect to t and subsequent t-zeroing isolates the second term of the expansion. Thus differentiating and zeroing g(X + tY) in t is an operation equivalent to individually differentiating and zeroing every entry g_mn(X + tY) as in (2001). So the directional derivative of g(X) : R^{K×L} → R^{M×N} in any direction Y ∈ R^{K×L} evaluated at X ∈ dom g becomes

dg(X)|_{dX→Y} = d/dt |_{t=0}  g(X + t Y)    (2005)
^{D.2} Although Y is a matrix, we may regard it as a vector in R^{KL}.
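For a real-valued g the equivalence of (1998) and (2005) is easy to confirm numerically; the sketch below reuses g(X) = a^T X² b, whose gradient is (1975), with illustrative random data:

import numpy as np

rng = np.random.default_rng(6)
a = rng.standard_normal(2)
b = rng.standard_normal(2)
X = rng.standard_normal((2, 2))
Y = rng.standard_normal((2, 2))

g = lambda X: a @ X @ X @ b                   # real-valued g(X) = a^T X^2 b
grad = np.outer(a, b) @ X.T + X.T @ np.outer(a, b)   # its gradient, from (1975)

t = 1e-6
slope = (g(X + t*Y) - g(X - t*Y)) / (2*t)     # d/dt g(X + tY) at t = 0, as in (2005)
print(np.isclose(slope, np.trace(grad.T @ Y)))      # True: equals tr(grad^T Y), as in (1998)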
Figure 189: Strictly convex quadratic bowl in R² × R; f(x) = x^T x : R² → R on some open disc in R². Plane slice ∂H is perpendicular to function domain. Slice intersection with domain connotes bidirectional vector y. Slope of tangent line T at point (a, f(a)) is value of directional derivative ∇_x f(a)^T y (2030) at a in slice direction y. Negative gradient −∇_x f(a) ∈ R² is direction of steepest descent. [393, §15.6] [177] When vector v ∈ R³ entry v_3 is half directional derivative in gradient direction at a, and when ∇f = ∇_x f(a), then −v points directly toward bowl bottom.
[315, §2.1, §5.4.5] [36, §6.3.1], which is simplest. In case of a real function g(X) : R^{K×L} → R

dg(X)|_{dX→Y} = tr( ∇g(X)^T Y )    (2027)

In case g(X) : R^K → R

dg(X)|_{dX→Y} = ∇g(X)^T Y
Unlike gradient, directional derivative does not expand dimension; directional derivative (2005) retains the dimensions of g. The derivative with respect to t makes the directional derivative resemble ordinary calculus (§D.2); e.g., when g(X) is linear, dg(X) = g(Y). [285, §7.2]
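For instance, with the linear matrix function g(X) = AXB (an illustrative assumption), a finite-difference directional derivative reproduces g(Y):

import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((4, 5))
X = rng.standard_normal((2, 4))
Y = rng.standard_normal((2, 4))

g = lambda X: A @ X @ B                       # a linear matrix-valued function
t = 1e-6
dg = (g(X + t*Y) - g(X - t*Y)) / (2*t)        # directional derivative (2005) in direction Y
print(np.allclose(dg, g(Y), atol=1e-6))       # True: dg(X) = g(Y) when g is linear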
D.1.4.1 Interpretation of directional derivative
In the case of any differentiable real function g(X) : R^{K×L} → R, the directional derivative of g(X) at X in any direction Y yields the slope of g along the line {X + tY | t ∈ R} through its domain evaluated at t = 0. For higher-dimensional functions, by (2002), this slope interpretation can be applied to each entry of the directional derivative.
Figure 189, for example, shows a plane slice of a real convex bowl-shaped function f(x) along a line {a + ty | t ∈ R} through its domain. The slice reveals a one-dimensional real function of t; f(a + ty). The directional derivative at x = a in direction y is the slope of f(a + ty) with respect to t at t = 0. In the case of a real function having vector argument h(x) : R^K → R, its directional derivative in the normalized direction of its gradient is the gradient magnitude. (2030) For a real function of real variable, the directional derivative evaluated at any point in the function domain is just the slope of that function there scaled by the real direction. (confer §3.6)
Directional derivative generalizes our one-dimensional notion of derivative to a multidimensional domain. When direction Y coincides with a member of the standard Cartesian basis e_k e_l^T (63), then a single partial derivative ∂g(X)/∂X_kl is obtained from directional derivative (2003); such is each entry of gradient ∇g(X) in equalities (2027) and (2030), for example.
D.1.4.1.1 Theorem. Directional derivative optimality condition.   [285, §7.4]
Suppose f(X) : R^{K×L} → R is minimized on convex set C ⊆ R^{K×L} by X*, and the directional derivative of f exists there. Then for all X ∈ C

df(X*)|_{dX→X−X*}  ≥  0    (2006)

◇
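A numerical illustration of (2006) under assumed data: minimizing f(X) = ‖X − C‖_F² over the box of matrices with entries in [0,1], the minimizer is the entrywise projection of C, and the directional derivative toward any feasible X is nonnegative:

import numpy as np

rng = np.random.default_rng(9)
C = rng.standard_normal((3, 3))
X_star = np.clip(C, 0.0, 1.0)                 # minimizer of f over the convex box [0,1]^{3x3}
grad = 2 * (X_star - C)                       # gradient of f(X) = ||X - C||_F^2 at X_star

for _ in range(5):
    X = rng.uniform(0.0, 1.0, size=(3, 3))    # an arbitrary feasible point
    ddf = np.trace(grad.T @ (X - X_star))     # directional derivative toward X, as in (2006)
    print(ddf >= -1e-12)                      # True every time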
D.1.4.1.2 Example. Simple bowl.
Bowl function (Figure 189)

f(x) : R^K → R ≜ (x − a)^T (x − a) − b    (2007)

has function offset −b ∈ R, axis of revolution at x = a, and positive definite Hessian (1956) everywhere in its domain (an open disc in R^K); id est, strictly convex quadratic f(x) has unique global minimum equal to −b at x = a. A vector −v based anywhere in dom f × R pointing toward the unique bowl bottom is specified:

v ≜ [ x − a ; f(x) + b ]  ∈ R^K × R    (2008)

Such a vector is

v = [ ∇_x f(x) / 2
      df(x)|_{dx→∇_x f(x)} / 4 ]    (2009)

since the gradient is

∇_x f(x) = 2(x − a)    (2010)

and the directional derivative in direction of the gradient is (2030)

df(x)|_{dx→∇_x f(x)} = ∇_x f(x)^T ∇_x f(x) = 4(x − a)^T (x − a) = 4( f(x) + b )    (2011)
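A brief NumPy check of (2010) and (2011) for the bowl function, with illustrative values of a, b, and x:

import numpy as np

rng = np.random.default_rng(7)
K = 2
a_vec = rng.standard_normal(K)                # axis of revolution
b_off = 1.5                                   # offset; the bowl bottom sits at -b_off
x = rng.standard_normal(K)

f = lambda x: (x - a_vec) @ (x - a_vec) - b_off
grad = 2 * (x - a_vec)                        # (2010)

t = 1e-6
slope = (f(x + t*grad) - f(x - t*grad)) / (2*t)     # directional derivative along the gradient
print(np.isclose(slope, grad @ grad))               # True
print(np.isclose(grad @ grad, 4 * (f(x) + b_off)))  # True: (2011)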
D.1.5 Second directional derivative

By similar argument, it so happens: the second directional derivative is equally simple. Given g(X) : R^{K×L} → R^{M×N} on open domain,

∇ ∂g_mn(X)/∂X_kl = ∂∇g_mn(X)/∂X_kl =

  [ ∂²g_mn(X)/(∂X_kl ∂X_11)   ∂²g_mn(X)/(∂X_kl ∂X_12)   ⋯   ∂²g_mn(X)/(∂X_kl ∂X_1L)
    ∂²g_mn(X)/(∂X_kl ∂X_21)   ∂²g_mn(X)/(∂X_kl ∂X_22)   ⋯   ∂²g_mn(X)/(∂X_kl ∂X_2L)
    ⋮
    ∂²g_mn(X)/(∂X_kl ∂X_K1)   ∂²g_mn(X)/(∂X_kl ∂X_K2)   ⋯   ∂²g_mn(X)/(∂X_kl ∂X_KL) ]  ∈ R^{K×L}    (2012)