MATRIX FACTORIZATION FOR MULTIVARIATE TIME SERIES ANALYSIS

Matrix factorization is a powerful data analysis tool. It has been used in multivariate time series analysis, leading to the decomposition of the series into a small set of latent factors. However, little is known on the statistical performances of matrix factorization for time series. In this paper, we extend the results known for matrix estimation in the i.i.d. setting to time series. Moreover, we prove that when the series exhibit some additional structure such as periodicity or smoothness, it is possible to improve on the classical rates of convergence.
Example 2.1 (Periodic series). Assume that $T = p\tau$ with $p \in \mathbb{N}^*$ for the sake of simplicity. To assume that the latent series in the dictionary $W$ are $\tau$-periodic is exactly equivalent to writing $W = V\Lambda$, where $V \in \mathcal{M}_{k,\tau}(\mathbb{R})$ and
$$\Lambda = (I_\tau \mid \cdots \mid I_\tau) \in \mathcal{M}_{\tau,T}(\mathbb{R})$$
is defined by blocks, $I_\tau$ being the identity matrix in $\mathcal{M}_\tau(\mathbb{R})$.
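The block structure of $\Lambda$ is easy to check numerically. The following sketch (Python with NumPy, purely illustrative dimensions not taken from the paper) builds $\Lambda = (I_\tau \mid \cdots \mid I_\tau)$ and verifies that $W = V\Lambda$ repeats the columns of $V$ with period $\tau$.

```python
import numpy as np

# Illustrative sizes: T = p * tau; Lambda stacks p identity blocks,
# so W = V @ Lambda repeats the columns of V with period tau.
tau, p, k = 4, 3, 2
T = p * tau

Lambda = np.hstack([np.eye(tau)] * p)            # shape (tau, T)
V = np.arange(k * tau, dtype=float).reshape(k, tau)
W = V @ Lambda                                   # shape (k, T)

# Every row of W is tau-periodic: W[:, t + tau] == W[:, t].
is_periodic = np.allclose(W[:, tau:], W[:, :-tau])
print(is_periodic)
```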
Example 2.2 (Smooth series). We can assume that the series in $W$ are smooth. For example, say that they belong to a Sobolev space with smoothness $\beta$; we have
$$W_{i,t} = f_i\left(\frac{t}{T}\right) \quad \text{with} \quad f_i = \sum_{n} u_{i,n} e_n,$$
where $(e_n)_n$ is the Fourier basis (the definition of a Sobolev space is reminded in Section 3.4 below). Of course, there are infinitely many coefficients $u_{i,n}$, and to estimate them all is not feasible; however, for $T$ large enough, the approximation
$$W_{i,t} \simeq \sum_{|n| \le N} u_{i,n}\, e_n\left(\frac{t}{T}\right)$$
will be suitable, and can be rewritten as $W = V\Lambda$, where $V_{i,n} = u_{i,n}$ and $\Lambda_{n,t} = e_n(t/T)$. More details will be given in Section 3.2, where we actually cover more general bases of functions.
So our complete model will finally be written as
$$X = M + E = UV\Lambda + E, \qquad (2)$$
where $U \in \mathcal{M}_{d,k}(\mathbb{R})$ and $V \in \mathcal{M}_{k,\tau}(\mathbb{R})$ are unknown, but $\tau \le T$ and $\Lambda \in \mathcal{M}_{\tau,T}(\mathbb{C})$ such that $\operatorname{rank}(\Lambda) = \tau$ are known (note that the unstructured case corresponds to $\tau = T$ and $\Lambda = I_T$).
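As a minimal numerical sketch of this model (NumPy, with hypothetical dimensions and noise level chosen only for illustration), one can simulate $X = UV\Lambda + E$ with the periodic dictionary of Example 2.1:

```python
import numpy as np

# Simulate X = U V Lambda + E: d series, rank k, tau latent time
# steps, T = p * tau observations. All sizes are illustrative.
rng = np.random.default_rng(0)
d, k, tau, p = 6, 2, 5, 4
T = p * tau

U = rng.normal(size=(d, k))
V = rng.normal(size=(k, tau))
Lambda = np.hstack([np.eye(tau)] * p)   # periodic structure, rank(Lambda) = tau
E = 0.1 * rng.normal(size=(d, T))       # i.i.d. Gaussian noise

M = U @ V @ Lambda                      # low-rank, structured signal
X = M + E                               # observed multivariate time series
```

Here the signal $M$ has rank at most $k$ and $\tau$-periodic rows, while the observation $X$ is corrupted by noise.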
Note that more constraints can be imposed on the estimator. For example, in nonnegative matrix factorization [35], one imposes that all the entries in $U$ and $V$ are nonnegative. Here, we will more generally assume that $UV$ belongs to some prescribed subset $\mathcal{S} \subset \mathcal{M}_{d,\tau}(\mathbb{R})$.
In what follows, we will consider two norms on $\mathcal{M}_{d,T}(\mathbb{R})$. For a matrix $A$, the Frobenius norm is given by
$$\|A\|_F = \sqrt{\operatorname{trace}(AA^*)}$$
and the operator norm by
$$\|A\|_{\mathrm{op}} = \sup_{\|x\|=1} \|Ax\|,$$
where $\|\cdot\|$ is the Euclidean norm on $\mathbb{R}^T$.
2.1. Estimation by empirical risk minimization. By multiplying both sides of (2) by the pseudo-inverse $\Lambda^+ = \Lambda^*(\Lambda\Lambda^*)^{-1}$, we obtain the "simplified model"
$$\bar{X} = \bar{M} + \bar{E}$$
with $\bar{X} = X\Lambda^+$, $\bar{M} = UV$ and $\bar{E} = E\Lambda^+$. In this model, the estimation of $\bar{M}$ can be done by empirical risk minimization:
$$\hat{\bar{M}}_{\mathcal{S}} \in \arg\min_{A \in \mathcal{S}} \bar{r}(A), \quad \text{where} \quad \bar{r}(A) = \|A - \bar{X}\|_F^2, \; A \in \mathcal{M}_{d,\tau}(\mathbb{R}).$$
Therefore, we can define the estimator $\hat{M}_{\mathcal{S}} = \hat{\bar{M}}_{\mathcal{S}}\Lambda$ of $M$.
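When $\mathcal{S}$ is simply the set of matrices with rank at most $k$, the empirical risk minimizer has a closed form: by the Eckart-Young theorem, it is the rank-$k$ truncated SVD of $\bar{X}$. The following sketch (NumPy, illustrative dimensions, continuing the periodic example) computes $\hat{\bar{M}}_{\mathcal{S}}$ and $\hat{M}_{\mathcal{S}}$ this way:

```python
import numpy as np

# Form Xbar = X @ Lambda^+, minimize ||A - Xbar||_F^2 over rank-k
# matrices via truncated SVD, then map back with Lambda.
rng = np.random.default_rng(1)
d, k, tau, p = 6, 2, 5, 4
T = p * tau

Lambda = np.hstack([np.eye(tau)] * p)
M = rng.normal(size=(d, k)) @ rng.normal(size=(k, tau)) @ Lambda
X = M + 0.1 * rng.normal(size=(d, T))

Lambda_pinv = Lambda.T @ np.linalg.inv(Lambda @ Lambda.T)  # Lambda^+ = Lambda^T (Lambda Lambda^T)^{-1}
Xbar = X @ Lambda_pinv

# Empirical risk minimizer over {rank <= k}: keep the k leading singular triplets.
Us, s, Vt = np.linalg.svd(Xbar, full_matrices=False)
Mbar_hat = Us[:, :k] * s[:k] @ Vt[:k, :]
M_hat = Mbar_hat @ Lambda                                  # estimator of M

err = np.linalg.norm(M_hat - M) ** 2 / (d * T)
raw = np.linalg.norm(X - M) ** 2 / (d * T)
print(err < raw)
```

Exploiting both the low rank and the periodic structure should denoise the observation, so the estimation error is below the raw noise level.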
PIERRE ALQUIER AND NICOLAS MARIE
In Section 3 we study the statistical performances of this estimator. The first step is done in Subsection 3.1, where we derive upper bounds on $\|\hat{\bar{M}}_{\mathcal{S}} - \bar{M}\|_F^2$. The corresponding upper bounds on $\|\hat{M}_{\mathcal{S}} - M\|_F^2$ are derived in Subsection 3.2.
3. ORACLE INEQUALITIES
Throughout this section, assume that $E$ fulfills the following assumption.

Assumption 3.1. The rows of $E$ are independent and have the same $T$-dimensional sub-Gaussian distribution, with second moment matrix $\Sigma_E$. Moreover, $E_{1,\cdot}\Sigma_E^{-1/2}$ is isotropic and has a finite sub-Gaussian norm:
$$R_E := \sup_{\|x\|=1} \sup_{p \in [1,\infty)} p^{-1/2}\, \mathbb{E}\left(\left|\langle E_{1,\cdot}\Sigma_E^{-1/2}, x\rangle\right|^p\right)^{1/p} < \infty.$$
We remind (see e.g. Chapter 1 in [13]) that when $X \sim \mathcal{N}(0, I_T)$,
$$\sup_{\|x\|=1} \sup_{p \ge 1} p^{-1/2}\, \mathbb{E}\left(|\langle X, x\rangle|^p\right)^{1/p} = C \qquad (4)$$
for some universal constant $C > 0$ (that is, $C$ does not depend on $T$). Thus, for Gaussian noise, Assumption 3.1 is satisfied and $R_E = C$ does not depend on the dimension $T$.
3.1. The case $\Lambda = I_T$. In this subsection only, we assume that $\Lambda = I_T$ (and thus $\tau = T$). So the simplified model is actually the original model: $\bar{X} = X$, $\bar{M} = M$, $\bar{E} = E$ and $\hat{M}_{\mathcal{S}} = \hat{\bar{M}}_{\mathcal{S}}$.
Theorem 3.2. Under Assumption 3.1, for every $\lambda \in\, ]0,1[$ and $s \in \mathbb{R}_+$,
$$\|\hat{M}_{\mathcal{S}} - M\|_F^2 \le \frac{1+\lambda}{1-\lambda}\, \min_{A \in \mathcal{S}} \|A - M\|_F^2 + \frac{4c}{\lambda(1-\lambda)}\, \|\Sigma_E\|_{\mathrm{op}}\, k\,(d + T + s)\, R_E^4$$
with probability larger than $1 - 2e^{-s}$.

As a consequence, if we have indeed $M \in \mathcal{S}$, then with large probability,
$$\frac{\|\hat{M}_{\mathcal{S}} - M\|_F^2}{dT} \le \frac{4c\, \|\Sigma_E\|_{\mathrm{op}}\, k\,(d + T + s)\, R_E^4}{\lambda(1-\lambda)\, dT}.$$
Thus, we recover the rate $O(k(d+T)/(dT))$ claimed in the introduction, up to constants.
Remark 3.3. Since the bound relies on the constant $\|\Sigma_E\|_{\mathrm{op}}$, let us provide its value in some special cases.

(1) If $\operatorname{cov}(E_{1,t}, E_{1,t'}) = \sigma^2 \mathbf{1}_{\{t=t'\}}$, then $\|\Sigma_E\|_{\mathrm{op}} = \sigma^2$. More generally, when $E_{1,1}, \dots, E_{1,T}$ are uncorrelated,
$$\|\Sigma_E\|_{\mathrm{op}} = \max_{1 \le t \le T} \operatorname{var}(E_{1,t}).$$
(2) Let $(\eta_t)_{t \in \mathbb{Z}}$ be a white noise of standard deviation $\sigma > 0$ and assume that there exists $\theta \in \mathbb{R}^*$ such that $E_{1,t} = \eta_t + \theta\eta_{t-1}$ for every $t \in \{1, \dots, T\}$. In other words, $(E_{1,t})_{t=1,\dots,T}$ is the restriction of a MA(1) process to $\{1, \dots, T\}$. So
$$\Sigma_E = \sigma^2 \begin{pmatrix} 1+\theta^2 & \theta & 0 & \cdots & 0 \\ \theta & 1+\theta^2 & \theta & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & \theta & 1+\theta^2 & \theta \\ 0 & \cdots & 0 & \theta & 1+\theta^2 \end{pmatrix}$$
and then
$$\|\Sigma_E\|_{\mathrm{op}} = \sigma^2\left(1 + \theta^2 - 2\theta \min_{1 \le j \le T} \cos\left(\frac{j\pi}{T+1}\right)\right) \le \sigma^2\left(1 + |\theta|\right)^2.$$
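This MA(1) computation is easy to verify numerically; the sketch below (NumPy, with hypothetical values of $\sigma$, $\theta$ and $T$) builds the tridiagonal covariance matrix and compares its operator norm to the closed-form expression and the bound $\sigma^2(1+|\theta|)^2$.

```python
import numpy as np

# MA(1) covariance: sigma^2 (1 + theta^2) on the diagonal,
# sigma^2 * theta on the first sub- and super-diagonals.
sigma, theta, T = 1.5, 0.7, 50

Sigma_E = sigma**2 * ((1 + theta**2) * np.eye(T)
                      + theta * (np.eye(T, k=1) + np.eye(T, k=-1)))

op_norm = np.linalg.norm(Sigma_E, ord=2)          # largest singular value
closed_form = sigma**2 * (1 + theta**2
                          - 2 * theta * np.min(np.cos(np.arange(1, T + 1) * np.pi / (T + 1))))
bound = sigma**2 * (1 + abs(theta))**2

print(np.isclose(op_norm, closed_form), op_norm <= bound)
```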
(3) Let $(\eta_t)_{t \in \mathbb{Z}}$ be a white noise of standard deviation $\sigma > 0$ and assume that there is a $\rho$ with $|\rho| < 1$ such that $E_{1,t} = \rho E_{1,t-1} + \eta_t$. So $(E_{1,t})_{t=1,\dots,T}$ is the restriction of an AR(1) process to $\{1, \dots, T\}$, and
$$\Sigma_E = \frac{\sigma^2}{1-\rho^2}\left(I_T + \sum_{n=1}^{T-1} \rho^n \left(J^n + (J^n)^*\right)\right),$$
where
$$J = \begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1 \\ & & & 0 \end{pmatrix}$$
is the shift matrix. We have
$$\|\Sigma_E\|_{\mathrm{op}} \le \frac{\sigma^2 (1 + |\rho|)}{(1-\rho^2)(1-|\rho|)}.$$
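The AR(1) bound can be checked numerically as well. The sketch below (NumPy, hypothetical $\sigma$, $\rho$, $T$) forms the covariance matrix directly from $\operatorname{cov}(E_{1,t}, E_{1,t'}) = \sigma^2 \rho^{|t-t'|}/(1-\rho^2)$ and verifies the operator-norm bound.

```python
import numpy as np

# AR(1) covariance: cov(E_{1,t}, E_{1,t'}) = sigma^2 rho^{|t - t'|} / (1 - rho^2).
sigma, rho, T = 1.0, 0.6, 40

idx = np.arange(T)
Sigma_E = sigma**2 / (1 - rho**2) * rho ** np.abs(idx[:, None] - idx[None, :])

op_norm = np.linalg.norm(Sigma_E, ord=2)
bound = sigma**2 * (1 + abs(rho)) / ((1 - rho**2) * (1 - abs(rho)))

print(op_norm <= bound)
```

For $\rho > 0$ the bound simplifies to $\sigma^2/(1-\rho)^2$, which is the supremum of the AR(1) spectral density, so it is tight as $T \to \infty$.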
3.2. The general case. Let us now come back to the general case. An application of Theorem 3.2 to the simplified model shows that for any $\lambda \in\, ]0,1[$ and $s \in \mathbb{R}_+$,
$$\|\hat{\bar{M}}_{\mathcal{S}} - \bar{M}\|_F^2 \le \frac{1+\lambda}{1-\lambda}\, \min_{A \in \mathcal{S}} \|A - \bar{M}\|_F^2 + \frac{4c}{\lambda(1-\lambda)}\, \|\Sigma_{\bar{E}}\|_{\mathrm{op}}\, k\,(d + \tau + s)\, R_{\bar{E}}^4 \qquad (5)$$
with probability larger than $1 - 2e^{-s}$.
In order to obtain the desired bound on $\|\hat{M}_{\mathcal{S}} - M\|_F^2$, we must now understand the behaviour of $\Sigma_{\bar{E}} = \Sigma_{E\Lambda^+}$ and $R_{\bar{E}}$.

Lemma 3.4. For any matrix $C \in \mathcal{M}_{T,\tau}(\mathbb{C})$,
$$\|\Sigma_{EC}\|_{\mathrm{op}} \le \|\Sigma_E\|_{\mathrm{op}}\, \|C^*C\|_{\mathrm{op}}.$$
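A quick numerical sanity check of Lemma 3.4 (NumPy, random illustrative instances): the second moment matrix of a row of $EC$ is $C^*\Sigma_E C$, and its operator norm is bounded as claimed.

```python
import numpy as np

# Random instance: Sigma_E an arbitrary covariance matrix, C an arbitrary
# (real) T x tau matrix; then ||C^T Sigma_E C||_op <= ||Sigma_E||_op ||C^T C||_op.
rng = np.random.default_rng(3)
T, tau = 12, 5

A = rng.normal(size=(T, T))
Sigma_E = A @ A.T                      # symmetric positive semi-definite
C = rng.normal(size=(T, tau))

Sigma_EC = C.T @ Sigma_E @ C           # second moment matrix of E_{1,.} C
lhs = np.linalg.norm(Sigma_EC, 2)
rhs = np.linalg.norm(Sigma_E, 2) * np.linalg.norm(C.T @ C, 2)

print(lhs <= rhs)
```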
The situation regarding $R_{\bar{E}}$ is different: we are not aware of a general simple upper bound on $R_{\bar{E}} = R_{E\Lambda^+}$ in terms of $R_E$ and $\Lambda^+$. Still, there are two cases where we actually have $R_{\bar{E}} = R_E$. Indeed, in the Gaussian case, $R_{\bar{E}} = R_E = C$; see (4) above. For non-Gaussian noise, we have the following result.

Lemma 3.5. Assume that there is $c(\tau, T) > 0$ such that $\Lambda\Lambda^* = c(\tau, T) I_\tau$. If $\Sigma_E = \sigma^2 I_T$ with $\sigma > 0$, then $R_{\bar{E}} = R_E$.
Note that the assumption on $\Lambda$ is fulfilled by the examples covered in Subsections 3.3 and 3.4.

The previous discussion legitimates the following assumption.

Assumption 3.6. $R_{\bar{E}} \le R_E$.
Finally, note that
$$\|\hat{M}_{\mathcal{S}} - M\|_F^2 = \|(\hat{\bar{M}}_{\mathcal{S}} - \bar{M})\Lambda\|_F^2 \le \|\hat{\bar{M}}_{\mathcal{S}} - \bar{M}\|_F^2\, \|\Lambda\Lambda^*\|_{\mathrm{op}}$$
and, in the same way,
$$\|(A\Lambda - M)\Lambda^+\|_F^2 \le \|A\Lambda - M\|_F^2\, \|(\Lambda\Lambda^*)^{-1}\|_{\mathrm{op}}.$$
By Inequality (5) together with Lemmas 3.4 and 3.5, we obtain the following result.

Corollary 3.7. Fix $\lambda \in\, ]0,1[$ and $s \in \mathbb{R}_+$. Under Assumption 3.1 and Assumption 3.6,
$$\frac{\|\hat{M}_{\mathcal{S}} - M\|_F^2}{dT} \le \|\Lambda\Lambda^*\|_{\mathrm{op}} \|(\Lambda\Lambda^*)^{-1}\|_{\mathrm{op}} \left(\frac{1+\lambda}{1-\lambda}\, \min_{A \in \mathcal{S}} \frac{\|A\Lambda - M\|_F^2}{dT} + \frac{4c\, \|\Sigma_E\|_{\mathrm{op}}\, k\,(d + \tau + s)\, R_E^4}{\lambda(1-\lambda)\, dT}\right) \qquad (6)$$
with probability larger than $1 - 2e^{-s}$.
We will now apply this corollary to various examples.
3.3. Application: periodic time series. In the case of $\tau$-periodic time series, remind that we assumed for simplicity that there is an integer $p$ such that $\tau p = T$, and we defined
$$\Lambda = (I_\tau \mid \cdots \mid I_\tau) \in \mathcal{M}_{\tau,T}(\mathbb{R}).$$
Then
$$\Lambda\Lambda^* = \frac{T}{\tau} I_\tau \implies \|\Lambda\Lambda^*\|_{\mathrm{op}} = \frac{T}{\tau} \text{ and } \|(\Lambda\Lambda^*)^{-1}\|_{\mathrm{op}} = \frac{\tau}{T}.$$
Therefore, by Corollary 3.7, for every $\lambda \in\, ]0,1[$ and $s \in \mathbb{R}_+$, under Assumptions 3.1 and 3.6,
$$\frac{\|\hat{M}_{\mathcal{S}} - M\|_F^2}{dT} \le \frac{1+\lambda}{1-\lambda}\, \min_{A \in \mathcal{S}} \frac{\|A\Lambda - M\|_F^2}{dT} + \frac{4c\, \|\Sigma_E\|_{\mathrm{op}}\, k\,(d + \tau + s)\, R_E^4}{\lambda(1-\lambda)\, dT}$$
with probability larger than $1 - 2e^{-s}$. Now, define
$$\mathcal{S} = \{A \in \mathcal{M}_{d,T}(\mathbb{R}) : \operatorname{rank}(A) \le k \text{ and, for every } t,\ A_{\cdot,t+\tau} = A_{\cdot,t}\}$$
and assume that $M \in \mathcal{S}$. Then, with large probability,
$$\frac{\|\hat{M}_{\mathcal{S}} - M\|_F^2}{dT} = O\left(\frac{k(d+\tau)}{dT}\right),$$
which is indeed an improvement with respect to the rate obtained without taking the periodicity into account, that is $O(k(d+T)/(dT))$.
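The key simplification in this application is that the periodic dictionary is perfectly conditioned. The sketch below (NumPy, illustrative sizes) checks that $\Lambda\Lambda^* = (T/\tau) I_\tau$, that the pseudo-inverse reduces to $\Lambda^+ = (\tau/T)\Lambda^*$, and that the conditioning factor in Corollary 3.7 equals one.

```python
import numpy as np

# Periodic dictionary Lambda = [I_tau | ... | I_tau], illustrative sizes.
tau, p = 5, 6
T = p * tau
Lambda = np.hstack([np.eye(tau)] * p)

G = Lambda @ Lambda.T                              # = (T / tau) * I_tau
Lambda_pinv = np.linalg.pinv(Lambda)               # = (tau / T) * Lambda^T

same_gram = np.allclose(G, (T / tau) * np.eye(tau))
same_pinv = np.allclose(Lambda_pinv, (tau / T) * Lambda.T)
cond_factor = np.linalg.norm(G, 2) * np.linalg.norm(np.linalg.inv(G), 2)

print(same_gram, same_pinv, np.isclose(cond_factor, 1.0))
```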
3.4. Application: time series with smooth trend. Assume we are given a dictionary of functions $(e_n)_n$ such that, for every $f \in \mathcal{W}(\beta, L)$ and every $N \ge 1$,
$$\left\|f - \sum_{|n| \le N} \langle f, e_n\rangle e_n\right\|^2 \le C(\beta, L)\, N^{-2\beta}. \qquad (7)$$

Definition 3.10. We define $\mathcal{S}(k, \beta, L) \subset \mathcal{M}_{d,T}(\mathbb{R})$ as the set of matrices $M$ such that $M = UW$, $U \in \mathcal{M}_{d,k}(\mathbb{R})$, $W \in \mathcal{M}_{k,T}(\mathbb{R})$ and
(1) for any $i \in \{1, \dots, d\}$, $\|U_{i,\cdot}\|_2 \le 1$;
(2) for any $\ell \in \{1, \dots, k\}$ and $t \in \{1, \dots, T\}$, $W_{\ell,t} = f_\ell(t/T)$ for some $f_\ell \in \mathcal{W}(\beta, L)$.

Denote $V_{N,W} = (\langle f_\ell, e_n\rangle)_{\ell \le k,\, |n| \le N}$. Then (7) implies
$$\frac{\|M - U V_{N,W} \Lambda_N\|_F^2}{dT} \le C(\beta, L)\, N^{-2\beta}.$$
Plugging this into (6) gives
$$\frac{\|\hat{M}_{\mathcal{S}(k,\beta,L)} - M\|_F^2}{dT} \le \frac{2(1+\lambda)}{1-\lambda}\, C(\beta, L)\, N^{-2\beta} + \frac{4c\, \|\Sigma_E\|_{\mathrm{op}}\, k\,(d + 2N + 1 + s)\, R_E^4}{\lambda(1-\lambda)\, dT}.$$
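The bias term $C(\beta, L) N^{-2\beta}$ comes from truncating the basis expansion. The sketch below (NumPy, with a hypothetical smooth trend chosen only for illustration) projects a sampled series $W_{\ell,t} = f(t/T)$ on its first Fourier frequencies and shows how fast the squared approximation error shrinks with $N$.

```python
import numpy as np

# Truncated Fourier approximation of a smooth, 1-periodic trend.
T = 512
t = np.arange(T) / T
f = np.exp(np.cos(2 * np.pi * t))        # hypothetical smooth trend

F = np.fft.rfft(f)                       # Fourier coefficients of the sampled trend

def trunc_error(N):
    """Mean squared error after keeping frequencies |n| <= N."""
    Fc = F.copy()
    Fc[N + 1:] = 0
    approx = np.fft.irfft(Fc, n=T)
    return np.mean((f - approx) ** 2)

errs = [trunc_error(N) for N in (2, 4, 8)]
print(errs[0] > errs[1] > errs[2])
```

For this analytic trend the coefficients decay faster than any polynomial, so the error drops very quickly; a function with Sobolev smoothness $\beta$ would decay at the rate $N^{-2\beta}$ of (7).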
If $\beta$ is known, an adequate optimization with respect to $N$ gives the following result.

Corollary 3.11. Assume that $M \in \mathcal{S}(k, \beta, L)$. Under Assumptions 3.1 and 3.6, the choice $N = \lceil (dT/k)^{1/(2\beta+1)} \rceil$ ensures
$$\frac{\|\hat{M}_{\mathcal{S}(k,\beta,L)} - M\|_F^2}{dT} \le C\left(\frac{k}{T} + \left(\frac{k}{dT}\right)^{\frac{2\beta}{2\beta+1}}\right)$$
with probability larger than $1 - 2e^{-s}$, where $C > 0$ is some constant depending on $L$, $\beta$, $\|\Sigma_E\|_{\mathrm{op}}$, $\lambda$, $c$ and $R_E$.

That is, the rate of convergence is in $O(k/T + (k/(dT))^{2\beta/(2\beta+1)})$. However, in practice, $\beta$ is not known, nor is the rank $k$. This problem is tackled in the next section.
4. MODEL SELECTION
Assume that we have many possible matrices $\Lambda_\tau$, for $\tau \in \mathcal{T} \subset \{1, \dots, T\}$, and, for each $\tau$, many possible $\mathcal{S}_{\tau,k}$ for different possible ranks $k \in \mathcal{K} \subset \{1, \dots, d \wedge T\}$. Consider $s \in \mathbb{R}_+$ and the penalized estimator $\hat{M}_s = \hat{M}_{\hat{\tau}_s, \hat{k}_s}$, where $\hat{M}_{\tau,k}$ denotes the estimator of Section 3.2 built on $\Lambda_\tau$ and $\mathcal{S}_{\tau,k}$, with
$$(\hat{\tau}_s, \hat{k}_s) \in \arg\min_{(\tau,k) \in \mathcal{T} \times \mathcal{K}} \left\{ \|\hat{M}_{\tau,k} - X\|_F^2 + \operatorname{pen}(\tau, k) \right\}$$
where
$$\operatorname{pen}(\tau, k) = 4c\, \|\Sigma_E\|_{\mathrm{op}}\, R_E^4\, k\,(d + \tau + s).$$

Theorem 4.1. Under Assumptions 3.1 and 3.6, for every $\lambda \in\, ]0,1[$,
$$\frac{\|\hat{M}_s - M\|_F^2}{dT} \le \min_{(\tau,k) \in \mathcal{T} \times \mathcal{K}} \left\{ \frac{1+\lambda}{1-\lambda}\, \min_{A \in \mathcal{S}_{\tau,k}} \frac{\|A\Lambda_\tau - M\|_F^2}{dT} + \frac{8c\, \|\Sigma_E\|_{\mathrm{op}}\, (1+\lambda)\, k\,(d + \tau + s)\, R_E^4}{\lambda(1-\lambda)^2\, dT} \right\}$$
with probability larger than $1 - 2e^{-s}$.
Remark 4.2. The reader might feel uncomfortable with the fact that the model selection procedure leads to $\hat{k}_s$ and $\hat{\tau}_s$ that depend on the prescribed confidence level $s$. Note that if $k = k_0$ is known, that is $\mathcal{K} = \{k_0\}$, then it is clear from the definition that $\hat{\tau}_s$ actually does not depend on $s$.

As an application, assume that $M \in \mathcal{S}(k, \beta, L)$ where $k$ is known, but $\beta$ is unknown. Then the model selection procedure is feasible, as it does not depend on $\beta$, and it achieves exactly the same rate as $\hat{M}_{\mathcal{S}(k,\beta,L)}$ in Corollary 3.11.
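The selection step can be sketched in a few lines. The code below (NumPy, illustrative sizes; the penalty constant is an arbitrary illustrative choice, not the calibrated one of Theorem 4.1) fits the rank-$k$ estimator for each candidate pair $(\tau, k)$ and compares the penalized criteria.

```python
import numpy as np

# Schematic model selection over candidate periods tau and ranks k.
rng = np.random.default_rng(2)
d, T = 8, 60
true_tau, true_k = 6, 2

def make_lambda(tau):
    return np.hstack([np.eye(tau)] * (T // tau))

M = rng.normal(size=(d, true_k)) @ rng.normal(size=(true_k, true_tau)) @ make_lambda(true_tau)
X = M + 0.1 * rng.normal(size=(d, T))

def fit(tau, k):
    """Rank-k ERM estimator through the periodic dictionary Lambda_tau."""
    Lam = make_lambda(tau)
    Xbar = X @ np.linalg.pinv(Lam)
    U, s, Vt = np.linalg.svd(Xbar, full_matrices=False)
    return (U[:, :k] * s[:k] @ Vt[:k, :]) @ Lam

candidates = [(tau, k) for tau in (3, 6, 10) for k in (1, 2, 4)]
pen = lambda tau, k: 0.05 * k * (d + tau)          # illustrative penalty shape
crit = {(tau, k): np.linalg.norm(fit(tau, k) - X) ** 2 + pen(tau, k)
        for tau, k in candidates}
tau_hat, k_hat = min(crit, key=crit.get)
print(tau_hat, k_hat)
```

A wrong period leaves a large misfit in the residual, so the criterion heavily favors the true period, while the penalty discourages unnecessarily large ranks.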
REFERENCES

[1] P. Alquier. Bayesian methods for low-rank matrix estimation: short survey and theoretical study. In International Conference on Algorithmic Learning Theory, pages 309-323. Springer, 2013.
[2] P. Alquier, V. Cottet, and G. Lecué. Estimation bounds and sharp oracle inequalities of regularized procedures with Lipschitz loss functions. arXiv preprint, to appear in the Annals of Statistics, 2017.
[3] P. Alquier and P. Doukhan. Sparsity considerations for dependent variables. Electronic Journal of Statistics, 5:750-774, 2011.
[4] P. Alquier and B. Guedj. An oracle inequality for quasi-Bayesian nonnegative matrix factorization. Mathematical Methods of Statistics, 26(1):55-67, 2017.
[5] L. Bauwens and M. Lubrano. Identification restrictions and posterior densities in cointegrated Gaussian VAR systems. In T. M. Fomby and R. Carter Hill, editors, Advances in Econometrics, vol. 11(B). JAI Press, Greenwich, 1993.
[6] S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
[7] T. Cai, D. Kim, Y. Wang, M. Yuan, and H. H. Zhou. Optimal large-scale quantum state tomography with Pauli measurements. The Annals of Statistics, 44(2):682-712, 2016.
[8] T. Cai and A. Zhang. ROP: Matrix recovery via rank-one projections. The Annals of Statistics, 43(1):102-138, 2015.
[9] E. J. Candès and Y. Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925-936, 2010.
[10] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717-772, 2009.
[11] E. J. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053-2080, 2010.
[12] L. Carel and P. Alquier. Non-negative matrix factorization as a pre-processing tool for travelers temporal profiles clustering. In Proceedings of the 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pages 417-422, 2017.
[13] D. Chafaï, O. Guédon, G. Lecué, and A. Pajor. Interactions between compressed sensing, random matrices and high dimensional geometry. Société Mathématique de France, 2012.
[14] ...jan, G. Severini, A. Turolla, and P. Bonato. Decomposing time-series data by a non-negative matrix factorization algorithm with temporally constrained coefficients. In 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 3496-3499. IEEE, 2015.
[15] S. Chrétien and B. Guedj. Revisiting clustering as matrix factorisation on the Stiefel manifold. arXiv preprint arXiv:1903.04479, 2019.
[16] A. S. Dalalyan. Exponential weights in multivariate regression and a low-rankness favoring prior. arXiv preprint arXiv:1806.09405, 2018.
[17] A. S. Dalalyan, E. Grappin, and Q. Paris. On the exponentially weighted aggregate with the Laplace prior. The Annals of Statistics, 46(5):2452-2478, 2018.
[18] Y. De Castro, Y. Goude, G. Hébrail, and J. Mei. Recovering multiple nonnegative time series from a few temporal aggregates. In ICML 2017 - 34th International Conference on Machine Learning, pages 1-9, 2017.
[19] J. Dedecker, P. Doukhan, G. Lang, J. R. León, S. Louhichi, and C. Prieur. Weak dependence. In Weak Dependence: With Examples and Applications, pages 9-20. Springer, 2007.
[20] R. F. Engle and C. W. J. Granger. Co-integration and error correction: representation, estimation, and testing. Econometrica: Journal of the Econometric Society, pages 251-276, 1987.
[21] G. I. Allen, L. Grosenick, and J. Taylor. A generalized least-square matrix decomposition. Journal of the American Statistical Association, 109(505):145-159, 2014.
[22] J. Geweke. Bayesian reduced rank regression in econometrics. Journal of Econometrics, 75:121-146, 1996.
[23] D. Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on Information Theory, 57(3):1548-1566, 2011.
[24] S. Gultekin and J. Paisley. Online forecasting matrix factorization. arXiv preprint arXiv:1712.08734, 2017.
[25] M. Guta, T. Kypraios, and I. Dryden. Rank-based model selection for multiple ions quantum tomography. New Journal of Physics, 14(10):105002, 2012.
[26] F. Husson, J. Josse, B. Narasimhan, and G. Robin. Imputation of mixed data with multilevel singular value decomposition. arXiv preprint arXiv:1804.11087, 2018.
[27] A. Izenman. Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis, 5(2):248-264, 1975.
[28] F. Kleibergen and H. K. van Dijk. On the shape of the likelihood/posterior in cointegration models. Econometric Theory, 10:514-551, 1994.
[29] F. Kleibergen and H. K. van Dijk. Bayesian simultaneous equation analysis using reduced rank structures. Econometric Theory, 14:701-743, 1998.
[30] O. Klopp, K. Lounici, and A. B. Tsybakov. Robust matrix completion. Probability Theory and Related Fields, 169(1-2):523-564, 2017.
[31] O. Klopp, Y. Lu, A. B. Tsybakov, and H. H. Zhou. Structured matrix estimation and completion. arXiv preprint arXiv:1701.02090, 2017.
[32] V. Koltchinskii, K. Lounici, and A. B. Tsybakov. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. The Annals of Statistics, 39(5):2302-2329, 2011.
[33] G. Koop and D. Korobilis. Bayesian multivariate time series methods for empirical macroeconomics. Foundations and Trends in Econometrics, 3(4):267-358, 2010.
[34] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30-37, 2009.
[35] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788-791, 1999.
[36] A. Lumbreras, L. Filstroff, and C. Févotte. Bayesian mean-parameterized nonnegative binary matrix factorization. arXiv preprint arXiv:1812.06866, 2018.
[37] T. D. Luu, J. Fadili, and C. Chesneau. Sharp oracle inequalities for low-complexity priors. arXiv preprint arXiv:1702.03166, 2017.
[38] T. T. Mai and P. Alquier. A Bayesian approach for noisy matrix completion: Optimal rate under general sampling distribution. Electronic Journal of Statistics, 9(1):823-841, 2015.
[39] T. T. Mai and P. Alquier. Pseudo-Bayesian quantum tomography with rank-adaptation. Journal of Statistical Planning and Inference, 184:62-76, 2017.
[40] J. Mei, Y. De Castro, Y. Goude, J.-M. Azaïs, and G. Hébrail. Nonnegative matrix factorization with side information for time series recovery and prediction. IEEE Transactions on Knowledge and Data Engineering, 2018.
[41] K. Moridomi, K. Hatano, and E. Takimoto. Tighter generalization bounds for matrix completion via factorization into constrained matrices. IEICE Transactions on Information and Systems, 101(8):1997-2004, 2018.
[42] A. Ozerov and C. Févotte. Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Transactions on Audio, Speech, and Language Processing, 18(3):550-563, 2010.
[43] J. Paisley, D. Blei, and M. I. Jordan. Bayesian nonnegative matrix factorization with stochastic variational inference. In Handbook of Mixed Membership Models and Their Applications, chapter 11. Chapman and Hall/CRC, 2015.
[44] A. Saha and V. Sindhwani. Learning evolving and emerging topics in social media: a dynamic NMF approach with temporal regularization. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, pages 693-702. ACM, 2012.
[45] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons. Document clustering using nonnegative matrix factorization. Information Processing & Management, 42(2):373-386, 2006.
[46] T. Suzuki. Convergence rate of Bayesian tensor estimator and its minimax optimality. In International Conference on Machine Learning, 2015.
[47] E. Tonnelier, N. Baskiotis, V. Guigue, and P. Gallinari. Anomaly detection in smart card logs and distant evaluation with Twitter: a robust framework. Neurocomputing, 298:109-121, 2018.
[48] J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389-434, 2012.
[49] A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2009.
[50] C. Vernade and O. Cappé. Learning from missing data using selection bias in movie recommendation. In 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 1-9. IEEE, 2015.
[51] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.
[52] D. Xia and V. Koltchinskii. Estimation of low rank density matrices: bounds in Schatten norms and other distances. Electronic Journal of Statistics, 10(2):2717-2745, 2016.
[53] H.-F. Yu, N. Rao, and I. S. Dhillon. Temporal regularized matrix factorization for high-dimensional time series prediction. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 847-855. Curran Associates, Inc., 2016.
5. PROOFS

5.1. Additional notations. Let us first introduce a few additional notations. First, for the sake of shortness, we introduce the estimation risk $R$ and the empirical risk $r$. These notations also make clear the fact that our estimator can be seen as an empirical risk minimizer:
$$R(A) = \|A - M\|_F^2 \quad \text{and} \quad r(A) = \|A - X\|_F^2, \qquad A \in \mathcal{M}_{d,T}(\mathbb{R}).$$