MATRIX FACTORIZATION FOR MULTIVARIATE TIME SERIES ANALYSIS

Matrix factorization is a powerful data analysis tool. It has been used in multivariate time series analysis, leading to the decomposition of the series into a small set of latent factors. However, little is known on the statistical performances of matrix factorization for time series. In this paper, we extend the results known for matrix estimation in the i.i.d. setting to time series. Moreover, we prove that when the series exhibit some additional structure such as periodicity or smoothness, it is possible to improve on the classical rates of convergence.
Example 2.1 (Periodic series). Assume that $T = p\tau$ with $p \in \mathbb{N}^*$ for the sake of simplicity. To assume that the latent series in the dictionary $W$ are $\tau$-periodic is exactly equivalent to writing $W = V\Lambda$, where $V \in \mathcal{M}_{k,\tau}(\mathbb{R})$ and
$$\Lambda = (I_\tau \mid \cdots \mid I_\tau) \in \mathcal{M}_{\tau,T}(\mathbb{R})$$
is defined by blocks, $I_\tau$ being the identity matrix in $\mathcal{M}_\tau(\mathbb{R})$.
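The block structure of $\Lambda$ is easy to check numerically. The following sketch (Python with NumPy, purely illustrative dimensions not taken from the paper) builds $\Lambda = (I_\tau \mid \cdots \mid I_\tau)$ and verifies that $W = V\Lambda$ repeats the columns of $V$ with period $\tau$.

```python
import numpy as np

# Illustrative sizes: T = p * tau; Lambda stacks p identity blocks,
# so W = V @ Lambda repeats the columns of V with period tau.
tau, p, k = 4, 3, 2
T = p * tau

Lambda = np.hstack([np.eye(tau)] * p)            # shape (tau, T)
V = np.arange(k * tau, dtype=float).reshape(k, tau)
W = V @ Lambda                                   # shape (k, T)

# Every row of W is tau-periodic: W[:, t + tau] == W[:, t].
is_periodic = np.allclose(W[:, tau:], W[:, :-tau])
print(is_periodic)
```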
Example 2.2 (Smooth series). We can assume that the series in $W$ are smooth. For example, say that they belong to a Sobolev space with smoothness $\beta$; we have
$$W_{i,t} = f_i\left(\frac{t}{T}\right) \quad \text{with} \quad f_i = \sum_{n} u_{i,n} e_n,$$
where $(e_n)_n$ is the Fourier basis (the definition of a Sobolev space is reminded in Section 3.4 below). Of course, there are infinitely many coefficients $u_{i,n}$, and to estimate them all is not feasible; however, for $T$ large enough, the approximation
$$W_{i,t} \simeq \sum_{|n| \le N} u_{i,n}\, e_n\left(\frac{t}{T}\right)$$
will be suitable, and can be rewritten as $W = V\Lambda$, where $V_{i,n} = u_{i,n}$ and $\Lambda_{n,t} = e_n(t/T)$. More details will be given in Section 3.2, where we actually cover more general bases of functions.
So our complete model will finally be written as
$$X = M + E = UV\Lambda + E, \qquad (2)$$
where $U \in \mathcal{M}_{d,k}(\mathbb{R})$ and $V \in \mathcal{M}_{k,\tau}(\mathbb{R})$ are unknown, but $\tau \le T$ and $\Lambda \in \mathcal{M}_{\tau,T}(\mathbb{C})$ such that $\operatorname{rank}(\Lambda) = \tau$ are known (note that the unstructured case corresponds to $\tau = T$ and $\Lambda = I_T$).
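As a minimal numerical sketch of this model (NumPy, with hypothetical dimensions and noise level chosen only for illustration), one can simulate $X = UV\Lambda + E$ with the periodic dictionary of Example 2.1:

```python
import numpy as np

# Simulate X = U V Lambda + E: d series, rank k, tau latent time
# steps, T = p * tau observations. All sizes are illustrative.
rng = np.random.default_rng(0)
d, k, tau, p = 6, 2, 5, 4
T = p * tau

U = rng.normal(size=(d, k))
V = rng.normal(size=(k, tau))
Lambda = np.hstack([np.eye(tau)] * p)   # periodic structure, rank(Lambda) = tau
E = 0.1 * rng.normal(size=(d, T))       # i.i.d. Gaussian noise

M = U @ V @ Lambda                      # low-rank, structured signal
X = M + E                               # observed multivariate time series
```

Here the signal $M$ has rank at most $k$ and $\tau$-periodic rows, while the observation $X$ is corrupted by noise.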
Note that more constraints can be imposed on the estimator. For example, in nonnegative matrix factorization [35], one imposes that all the entries in $U$ and $V$ are nonnegative. Here, we will more generally assume that $UV$ belongs to some prescribed subset $\mathcal{S} \subset \mathcal{M}_{d,\tau}(\mathbb{R})$.
In what follows, we will consider two norms on $\mathcal{M}_{d,T}(\mathbb{R})$. For a matrix $A$, the Frobenius norm is given by
$$\|A\|_F = \sqrt{\operatorname{trace}(AA^*)}$$
and the operator norm by
$$\|A\|_{\mathrm{op}} = \sup_{\|x\|=1} \|Ax\|,$$
where $\|\cdot\|$ is the Euclidean norm on $\mathbb{R}^T$.
2.1. Estimation by empirical risk minimization. By multiplying both sides of (2) by the pseudo-inverse $\Lambda^+ = \Lambda^*(\Lambda\Lambda^*)^{-1}$, we obtain the "simplified model"
$$\bar{X} = \bar{M} + \bar{E}$$
with $\bar{X} = X\Lambda^+$, $\bar{M} = UV$ and $\bar{E} = E\Lambda^+$. In this model, the estimation of $\bar{M}$ can be done by empirical risk minimization:
$$\hat{\bar{M}}_{\mathcal{S}} \in \arg\min_{A \in \mathcal{S}} \bar{r}(A), \quad \text{where} \quad \bar{r}(A) = \|A - \bar{X}\|_F^2, \; A \in \mathcal{M}_{d,\tau}(\mathbb{R}).$$
Therefore, we can define the estimator $\hat{M}_{\mathcal{S}} = \hat{\bar{M}}_{\mathcal{S}}\Lambda$ of $M$.
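When $\mathcal{S}$ is simply the set of matrices with rank at most $k$, the empirical risk minimizer has a closed form: by the Eckart-Young theorem, it is the rank-$k$ truncated SVD of $\bar{X}$. The following sketch (NumPy, illustrative dimensions, continuing the periodic example) computes $\hat{\bar{M}}_{\mathcal{S}}$ and $\hat{M}_{\mathcal{S}}$ this way:

```python
import numpy as np

# Form Xbar = X @ Lambda^+, minimize ||A - Xbar||_F^2 over rank-k
# matrices via truncated SVD, then map back with Lambda.
rng = np.random.default_rng(1)
d, k, tau, p = 6, 2, 5, 4
T = p * tau

Lambda = np.hstack([np.eye(tau)] * p)
M = rng.normal(size=(d, k)) @ rng.normal(size=(k, tau)) @ Lambda
X = M + 0.1 * rng.normal(size=(d, T))

Lambda_pinv = Lambda.T @ np.linalg.inv(Lambda @ Lambda.T)  # Lambda^+ = Lambda^T (Lambda Lambda^T)^{-1}
Xbar = X @ Lambda_pinv

# Empirical risk minimizer over {rank <= k}: keep the k leading singular triplets.
Us, s, Vt = np.linalg.svd(Xbar, full_matrices=False)
Mbar_hat = Us[:, :k] * s[:k] @ Vt[:k, :]
M_hat = Mbar_hat @ Lambda                                  # estimator of M

err = np.linalg.norm(M_hat - M) ** 2 / (d * T)
raw = np.linalg.norm(X - M) ** 2 / (d * T)
print(err < raw)
```

Exploiting both the low rank and the periodic structure should denoise the observation, so the estimation error is below the raw noise level.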
PIERRE ALQUIER AND NICOLAS MARIE
In Section 3 we study the statistical performances of this estimator. The first step is done in Subsection 3.1, where we derive upper bounds on $\|\hat{\bar{M}}_{\mathcal{S}} - \bar{M}\|_F^2$. The corresponding upper bounds on $\|\hat{M}_{\mathcal{S}} - M\|_F^2$ are derived in Subsection 3.2.
3. ORACLE INEQUALITIES
Throughout this section, assume that $E$ fulfills the following assumption.

Assumption 3.1. The rows of $E$ are independent and have the same $T$-dimensional sub-Gaussian distribution, with second moment matrix $\Sigma_E$. Moreover, $E_{1,\cdot}\Sigma_E^{-1/2}$ is isotropic and has a finite sub-Gaussian norm:
$$R_E := \sup_{\|x\|=1} \sup_{p \in [1,\infty)} p^{-1/2}\, \mathbb{E}\left(\left|\langle E_{1,\cdot}\Sigma_E^{-1/2}, x\rangle\right|^p\right)^{1/p} < \infty.$$
We remind (see e.g. Chapter 1 in [13]) that when $X \sim \mathcal{N}(0, I_T)$,
$$\sup_{\|x\|=1} \sup_{p \ge 1} p^{-1/2}\, \mathbb{E}\left(|\langle X, x\rangle|^p\right)^{1/p} = C \qquad (4)$$
for some universal constant $C > 0$ (that is, $C$ does not depend on $T$). Thus, for Gaussian noise, Assumption 3.1 is satisfied and $R_E = C$ does not depend on the dimension $T$.
3.1. The case $\Lambda = I_T$. In this subsection only, we assume that $\Lambda = I_T$ (and thus $\tau = T$). So the simplified model is actually the original model: $\bar{X} = X$, $\bar{M} = M$, $\bar{E} = E$ and $\hat{M}_{\mathcal{S}} = \hat{\bar{M}}_{\mathcal{S}}$.
Theorem 3.2. Under Assumption 3.1, for every $\lambda \in\, ]0,1[$ and $s \in \mathbb{R}_+$,
$$\|\hat{M}_{\mathcal{S}} - M\|_F^2 \le \frac{1+\lambda}{1-\lambda}\, \min_{A \in \mathcal{S}} \|A - M\|_F^2 + \frac{4c}{\lambda(1-\lambda)}\, \|\Sigma_E\|_{\mathrm{op}}\, k\,(d + T + s)\, R_E^4$$
with probability larger than $1 - 2e^{-s}$.

As a consequence, if we have indeed $M \in \mathcal{S}$, then with large probability,
$$\frac{\|\hat{M}_{\mathcal{S}} - M\|_F^2}{dT} \le \frac{4c\, \|\Sigma_E\|_{\mathrm{op}}\, k\,(d + T + s)\, R_E^4}{\lambda(1-\lambda)\, dT}.$$
Thus, we recover the rate $O(k(d+T)/(dT))$ claimed in the introduction, up to constants.
Remark 3.3. Since the bound relies on the constant $\|\Sigma_E\|_{\mathrm{op}}$, let us provide its value in some special cases.

(1) If $\operatorname{cov}(E_{1,t}, E_{1,t'}) = \sigma^2 \mathbf{1}_{\{t=t'\}}$, then $\|\Sigma_E\|_{\mathrm{op}} = \sigma^2$. More generally, when $E_{1,1}, \dots, E_{1,T}$ are uncorrelated,
$$\|\Sigma_E\|_{\mathrm{op}} = \max_{1 \le t \le T} \operatorname{var}(E_{1,t}).$$
(2) Let $(\eta_t)_{t \in \mathbb{Z}}$ be a white noise of standard deviation $\sigma > 0$ and assume that there exists $\theta \in \mathbb{R}^*$ such that $E_{1,t} = \eta_t + \theta\eta_{t-1}$ for every $t \in \{1, \dots, T\}$. In other words, $(E_{1,t})_{t=1,\dots,T}$ is the restriction of a MA(1) process to $\{1, \dots, T\}$. So
$$\Sigma_E = \sigma^2 \begin{pmatrix} 1+\theta^2 & \theta & 0 & \cdots & 0 \\ \theta & 1+\theta^2 & \theta & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & \theta & 1+\theta^2 & \theta \\ 0 & \cdots & 0 & \theta & 1+\theta^2 \end{pmatrix}$$
and then
$$\|\Sigma_E\|_{\mathrm{op}} = \sigma^2\left(1 + \theta^2 - 2\theta \min_{1 \le j \le T} \cos\left(\frac{j\pi}{T+1}\right)\right) \le \sigma^2\left(1 + |\theta|\right)^2.$$
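This MA(1) computation is easy to verify numerically; the sketch below (NumPy, with hypothetical values of $\sigma$, $\theta$ and $T$) builds the tridiagonal covariance matrix and compares its operator norm to the closed-form expression and the bound $\sigma^2(1+|\theta|)^2$.

```python
import numpy as np

# MA(1) covariance: sigma^2 (1 + theta^2) on the diagonal,
# sigma^2 * theta on the first sub- and super-diagonals.
sigma, theta, T = 1.5, 0.7, 50

Sigma_E = sigma**2 * ((1 + theta**2) * np.eye(T)
                      + theta * (np.eye(T, k=1) + np.eye(T, k=-1)))

op_norm = np.linalg.norm(Sigma_E, ord=2)          # largest singular value
closed_form = sigma**2 * (1 + theta**2
                          - 2 * theta * np.min(np.cos(np.arange(1, T + 1) * np.pi / (T + 1))))
bound = sigma**2 * (1 + abs(theta))**2

print(np.isclose(op_norm, closed_form), op_norm <= bound)
```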
(3) Let $(\eta_t)_{t \in \mathbb{Z}}$ be a white noise of standard deviation $\sigma > 0$ and assume that there is a $\rho$ with $|\rho| < 1$ such that $E_{1,t} = \rho E_{1,t-1} + \eta_t$. So $(E_{1,t})_{t=1,\dots,T}$ is the restriction of an AR(1) process to $\{1, \dots, T\}$, and
$$\Sigma_E = \frac{\sigma^2}{1-\rho^2}\left(I_T + \sum_{n=1}^{T-1} \rho^n \left(J^n + (J^n)^*\right)\right),$$
where
$$J = \begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1 \\ & & & 0 \end{pmatrix}$$
is the shift matrix. We have
$$\|\Sigma_E\|_{\mathrm{op}} \le \frac{\sigma^2 (1 + |\rho|)}{(1-\rho^2)(1-|\rho|)}.$$
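The AR(1) bound can be checked numerically as well. The sketch below (NumPy, hypothetical $\sigma$, $\rho$, $T$) forms the covariance matrix directly from $\operatorname{cov}(E_{1,t}, E_{1,t'}) = \sigma^2 \rho^{|t-t'|}/(1-\rho^2)$ and verifies the operator-norm bound.

```python
import numpy as np

# AR(1) covariance: cov(E_{1,t}, E_{1,t'}) = sigma^2 rho^{|t - t'|} / (1 - rho^2).
sigma, rho, T = 1.0, 0.6, 40

idx = np.arange(T)
Sigma_E = sigma**2 / (1 - rho**2) * rho ** np.abs(idx[:, None] - idx[None, :])

op_norm = np.linalg.norm(Sigma_E, ord=2)
bound = sigma**2 * (1 + abs(rho)) / ((1 - rho**2) * (1 - abs(rho)))

print(op_norm <= bound)
```

For $\rho > 0$ the bound simplifies to $\sigma^2/(1-\rho)^2$, which is the supremum of the AR(1) spectral density, so it is tight as $T \to \infty$.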
3.2. The general case. Let us now come back to the general case. An application of Theorem 3.2 to the simplified model shows that for any $\lambda \in\, ]0,1[$ and $s \in \mathbb{R}_+$,
$$\|\hat{\bar{M}}_{\mathcal{S}} - \bar{M}\|_F^2 \le \frac{1+\lambda}{1-\lambda}\, \min_{A \in \mathcal{S}} \|A - \bar{M}\|_F^2 + \frac{4c}{\lambda(1-\lambda)}\, \|\Sigma_{\bar{E}}\|_{\mathrm{op}}\, k\,(d + \tau + s)\, R_{\bar{E}}^4 \qquad (5)$$
with probability larger than $1 - 2e^{-s}$.
In order to obtain the desired bound on $\|\hat{M}_{\mathcal{S}} - M\|_F^2$, we must now understand the behaviour of $\Sigma_{\bar{E}} = \Sigma_{E\Lambda^+}$ and $R_{\bar{E}}$.

Lemma 3.4. For any matrix $C \in \mathcal{M}_{T,\tau}(\mathbb{C})$,
$$\|\Sigma_{EC}\|_{\mathrm{op}} \le \|\Sigma_E\|_{\mathrm{op}}\, \|C^*C\|_{\mathrm{op}}.$$
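A quick numerical sanity check of Lemma 3.4 (NumPy, random illustrative instances): the second moment matrix of a row of $EC$ is $C^*\Sigma_E C$, and its operator norm is bounded as claimed.

```python
import numpy as np

# Random instance: Sigma_E an arbitrary covariance matrix, C an arbitrary
# (real) T x tau matrix; then ||C^T Sigma_E C||_op <= ||Sigma_E||_op ||C^T C||_op.
rng = np.random.default_rng(3)
T, tau = 12, 5

A = rng.normal(size=(T, T))
Sigma_E = A @ A.T                      # symmetric positive semi-definite
C = rng.normal(size=(T, tau))

Sigma_EC = C.T @ Sigma_E @ C           # second moment matrix of E_{1,.} C
lhs = np.linalg.norm(Sigma_EC, 2)
rhs = np.linalg.norm(Sigma_E, 2) * np.linalg.norm(C.T @ C, 2)

print(lhs <= rhs)
```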
The situation regarding $R_{\bar{E}}$ is different: we are not aware of a general simple upper bound on $R_{\bar{E}} = R_{E\Lambda^+}$ in terms of $R_E$ and $\Lambda^+$. Still, there are two cases where we actually have $R_{\bar{E}} = R_E$. Indeed, in the Gaussian case, $R_{\bar{E}} = R_E = C$; see (4) above. For non-Gaussian noise, we have the following result.

Lemma 3.5. Assume that there is $c(\tau, T) > 0$ such that $\Lambda\Lambda^* = c(\tau, T) I_\tau$. If $\Sigma_E = \sigma^2 I_T$ with $\sigma > 0$, then $R_{\bar{E}} = R_E$.
Note that the assumption on $\Lambda$ is fulfilled by the examples covered in Subsections 3.3 and 3.4.

The previous discussion legitimates the following assumption.

Assumption 3.6. $R_{\bar{E}} \le R_E$.
Finally, note that
$$\|\hat{M}_{\mathcal{S}} - M\|_F^2 = \|(\hat{\bar{M}}_{\mathcal{S}} - \bar{M})\Lambda\|_F^2 \le \|\hat{\bar{M}}_{\mathcal{S}} - \bar{M}\|_F^2\, \|\Lambda\Lambda^*\|_{\mathrm{op}}$$
and, in the same way,
$$\|(A\Lambda - M)\Lambda^+\|_F^2 \le \|A\Lambda - M\|_F^2\, \|(\Lambda\Lambda^*)^{-1}\|_{\mathrm{op}}.$$
By Inequality (5) together with Lemmas 3.4 and 3.5, we obtain the following result.

Corollary 3.7. Fix $\lambda \in\, ]0,1[$ and $s \in \mathbb{R}_+$. Under Assumption 3.1 and Assumption 3.6,
$$\frac{\|\hat{M}_{\mathcal{S}} - M\|_F^2}{dT} \le \|\Lambda\Lambda^*\|_{\mathrm{op}} \|(\Lambda\Lambda^*)^{-1}\|_{\mathrm{op}} \left(\frac{1+\lambda}{1-\lambda}\, \min_{A \in \mathcal{S}} \frac{\|A\Lambda - M\|_F^2}{dT} + \frac{4c\, \|\Sigma_E\|_{\mathrm{op}}\, k\,(d + \tau + s)\, R_E^4}{\lambda(1-\lambda)\, dT}\right) \qquad (6)$$
with probability larger than $1 - 2e^{-s}$.
We will now apply this corollary to various examples.
3.3. Application: periodic time series. In the case of $\tau$-periodic time series, remind that we assumed for simplicity that there is an integer $p$ such that $\tau p = T$, and we defined
$$\Lambda = (I_\tau \mid \cdots \mid I_\tau) \in \mathcal{M}_{\tau,T}(\mathbb{R}).$$
Then
$$\Lambda\Lambda^* = \frac{T}{\tau} I_\tau \implies \|\Lambda\Lambda^*\|_{\mathrm{op}} = \frac{T}{\tau} \text{ and } \|(\Lambda\Lambda^*)^{-1}\|_{\mathrm{op}} = \frac{\tau}{T}.$$
Therefore, by Corollary 3.7, for every $\lambda \in\, ]0,1[$ and $s \in \mathbb{R}_+$, under Assumptions 3.1 and 3.6,
$$\frac{\|\hat{M}_{\mathcal{S}} - M\|_F^2}{dT} \le \frac{1+\lambda}{1-\lambda}\, \min_{A \in \mathcal{S}} \frac{\|A\Lambda - M\|_F^2}{dT} + \frac{4c\, \|\Sigma_E\|_{\mathrm{op}}\, k\,(d + \tau + s)\, R_E^4}{\lambda(1-\lambda)\, dT}$$
with probability larger than $1 - 2e^{-s}$. Now, define
$$\mathcal{S} = \{A \in \mathcal{M}_{d,T}(\mathbb{R}) : \operatorname{rank}(A) \le k \text{ and, for every } t,\ A_{\cdot,t+\tau} = A_{\cdot,t}\}$$
and assume that $M \in \mathcal{S}$. Then, with large probability,
$$\frac{\|\hat{M}_{\mathcal{S}} - M\|_F^2}{dT} = O\left(\frac{k(d+\tau)}{dT}\right),$$
which is indeed an improvement with respect to the rate obtained without taking the periodicity into account, that is $O(k(d+T)/(dT))$.
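The key simplification in this application is that the periodic dictionary is perfectly conditioned. The sketch below (NumPy, illustrative sizes) checks that $\Lambda\Lambda^* = (T/\tau) I_\tau$, that the pseudo-inverse reduces to $\Lambda^+ = (\tau/T)\Lambda^*$, and that the conditioning factor in Corollary 3.7 equals one.

```python
import numpy as np

# Periodic dictionary Lambda = [I_tau | ... | I_tau], illustrative sizes.
tau, p = 5, 6
T = p * tau
Lambda = np.hstack([np.eye(tau)] * p)

G = Lambda @ Lambda.T                              # = (T / tau) * I_tau
Lambda_pinv = np.linalg.pinv(Lambda)               # = (tau / T) * Lambda^T

same_gram = np.allclose(G, (T / tau) * np.eye(tau))
same_pinv = np.allclose(Lambda_pinv, (tau / T) * Lambda.T)
cond_factor = np.linalg.norm(G, 2) * np.linalg.norm(np.linalg.inv(G), 2)

print(same_gram, same_pinv, np.isclose(cond_factor, 1.0))
```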
3.4. Application: time series with smooth trend. Assume we are given a dictionary of functions $(e_n)_n$ such that, for every $f \in \mathcal{W}(\beta, L)$ and every $N \ge 1$,
$$\left\|f - \sum_{|n| \le N} \langle f, e_n\rangle e_n\right\|^2 \le C(\beta, L)\, N^{-2\beta}. \qquad (7)$$

Definition 3.10. We define $\mathcal{S}(k, \beta, L) \subset \mathcal{M}_{d,T}(\mathbb{R})$ as the set of matrices $M$ such that $M = UW$, $U \in \mathcal{M}_{d,k}(\mathbb{R})$, $W \in \mathcal{M}_{k,T}(\mathbb{R})$ and
(1) for any $i \in \{1, \dots, d\}$, $\|U_{i,\cdot}\|_2 \le 1$;
(2) for any $\ell \in \{1, \dots, k\}$ and $t \in \{1, \dots, T\}$, $W_{\ell,t} = f_\ell(t/T)$ for some $f_\ell \in \mathcal{W}(\beta, L)$.

Denote $V_{N,W} = (\langle f_\ell, e_n\rangle)_{\ell \le k,\, |n| \le N}$. Then (7) implies
$$\frac{\|M - U V_{N,W} \Lambda_N\|_F^2}{dT} \le C(\beta, L)\, N^{-2\beta}.$$
Plugging this into (6) gives
$$\frac{\|\hat{M}_{\mathcal{S}(k,\beta,L)} - M\|_F^2}{dT} \le \frac{2(1+\lambda)}{1-\lambda}\, C(\beta, L)\, N^{-2\beta} + \frac{4c\, \|\Sigma_E\|_{\mathrm{op}}\, k\,(d + 2N + 1 + s)\, R_E^4}{\lambda(1-\lambda)\, dT}.$$
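The bias term $C(\beta, L) N^{-2\beta}$ comes from truncating the basis expansion. The sketch below (NumPy, with a hypothetical smooth trend chosen only for illustration) projects a sampled series $W_{\ell,t} = f(t/T)$ on its first Fourier frequencies and shows how fast the squared approximation error shrinks with $N$.

```python
import numpy as np

# Truncated Fourier approximation of a smooth, 1-periodic trend.
T = 512
t = np.arange(T) / T
f = np.exp(np.cos(2 * np.pi * t))        # hypothetical smooth trend

F = np.fft.rfft(f)                       # Fourier coefficients of the sampled trend

def trunc_error(N):
    """Mean squared error after keeping frequencies |n| <= N."""
    Fc = F.copy()
    Fc[N + 1:] = 0
    approx = np.fft.irfft(Fc, n=T)
    return np.mean((f - approx) ** 2)

errs = [trunc_error(N) for N in (2, 4, 8)]
print(errs[0] > errs[1] > errs[2])
```

For this analytic trend the coefficients decay faster than any polynomial, so the error drops very quickly; a function with Sobolev smoothness $\beta$ would decay at the rate $N^{-2\beta}$ of (7).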
If $\beta$ is known, an adequate optimization with respect to $N$ gives the following result.

Corollary 3.11. Assume that $M \in \mathcal{S}(k, \beta, L)$. Under Assumptions 3.1 and 3.6, the choice $N = \lceil (dT/k)^{1/(2\beta+1)} \rceil$ ensures
$$\frac{\|\hat{M}_{\mathcal{S}(k,\beta,L)} - M\|_F^2}{dT} \le C\left(\frac{k}{T} + \left(\frac{k}{dT}\right)^{\frac{2\beta}{2\beta+1}}\right)$$
with probability larger than $1 - 2e^{-s}$, where $C > 0$ is some constant depending on $L$, $\beta$, $\|\Sigma_E\|_{\mathrm{op}}$, $\lambda$, $c$ and $R_E$.

That is, the rate of convergence is in $O(k/T + (k/(dT))^{2\beta/(2\beta+1)})$. However, in practice, $\beta$ is not known, nor is the rank $k$. This problem is tackled in the next section.
4. MODEL SELECTION
Assume that we have many possible matrices $\Lambda_\tau$, for $\tau \in \mathcal{T} \subset \{1, \dots, T\}$, and, for each $\tau$, many possible $\mathcal{S}_{\tau,k}$ for different possible ranks $k \in \mathcal{K} \subset \{1, \dots, d \wedge T\}$. Consider $s \in \mathbb{R}_+$ and the penalized estimator $\hat{M}_s = \hat{M}_{\hat{\tau}_s, \hat{k}_s}$, where $\hat{M}_{\tau,k}$ denotes the estimator of Section 3.2 built on $\Lambda_\tau$ and $\mathcal{S}_{\tau,k}$, with
$$(\hat{\tau}_s, \hat{k}_s) \in \arg\min_{(\tau,k) \in \mathcal{T} \times \mathcal{K}} \left\{ \|\hat{M}_{\tau,k} - X\|_F^2 + \operatorname{pen}(\tau, k) \right\}$$
where
$$\operatorname{pen}(\tau, k) = 4c\, \|\Sigma_E\|_{\mathrm{op}}\, R_E^4\, k\,(d + \tau + s).$$

Theorem 4.1. Under Assumptions 3.1 and 3.6, for every $\lambda \in\, ]0,1[$,
$$\frac{\|\hat{M}_s - M\|_F^2}{dT} \le \min_{(\tau,k) \in \mathcal{T} \times \mathcal{K}} \left\{ \frac{1+\lambda}{1-\lambda}\, \min_{A \in \mathcal{S}_{\tau,k}} \frac{\|A\Lambda_\tau - M\|_F^2}{dT} + \frac{8c\, \|\Sigma_E\|_{\mathrm{op}}\, (1+\lambda)\, k\,(d + \tau + s)\, R_E^4}{\lambda(1-\lambda)^2\, dT} \right\}$$
with probability larger than $1 - 2e^{-s}$.
Remark 4.2. The reader might feel uncomfortable with the fact that the model selection procedure leads to $\hat{k}_s$ and $\hat{\tau}_s$ that depend on the prescribed confidence level $s$. Note that if $k = k_0$ is known, that is $\mathcal{K} = \{k_0\}$, then it is clear from the definition that $\hat{\tau}_s$ actually does not depend on $s$.

As an application, assume that $M \in \mathcal{S}(k, \beta, L)$ where $k$ is known, but $\beta$ is unknown. Then the model selection procedure is feasible, as it does not depend on $\beta$, and it achieves exactly the same rate as $\hat{M}_{\mathcal{S}(k,\beta,L)}$ in Corollary 3.11.
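The selection step can be sketched in a few lines. The code below (NumPy, illustrative sizes; the penalty constant is an arbitrary illustrative choice, not the calibrated one of Theorem 4.1) fits the rank-$k$ estimator for each candidate pair $(\tau, k)$ and compares the penalized criteria.

```python
import numpy as np

# Schematic model selection over candidate periods tau and ranks k.
rng = np.random.default_rng(2)
d, T = 8, 60
true_tau, true_k = 6, 2

def make_lambda(tau):
    return np.hstack([np.eye(tau)] * (T // tau))

M = rng.normal(size=(d, true_k)) @ rng.normal(size=(true_k, true_tau)) @ make_lambda(true_tau)
X = M + 0.1 * rng.normal(size=(d, T))

def fit(tau, k):
    """Rank-k ERM estimator through the periodic dictionary Lambda_tau."""
    Lam = make_lambda(tau)
    Xbar = X @ np.linalg.pinv(Lam)
    U, s, Vt = np.linalg.svd(Xbar, full_matrices=False)
    return (U[:, :k] * s[:k] @ Vt[:k, :]) @ Lam

candidates = [(tau, k) for tau in (3, 6, 10) for k in (1, 2, 4)]
pen = lambda tau, k: 0.05 * k * (d + tau)          # illustrative penalty shape
crit = {(tau, k): np.linalg.norm(fit(tau, k) - X) ** 2 + pen(tau, k)
        for tau, k in candidates}
tau_hat, k_hat = min(crit, key=crit.get)
print(tau_hat, k_hat)
```

A wrong period leaves a large misfit in the residual, so the criterion heavily favors the true period, while the penalty discourages unnecessarily large ranks.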
REFERENCES

[1] P. Alquier. Bayesian methods for low-rank matrix estimation: short survey and theoretical study. In International Conference on Algorithmic Learning Theory, pages 309-323. Springer, 2013.
[2] P. Alquier, V. Cottet, and G. Lecué. Estimation bounds and sharp oracle inequalities of regularized procedures with Lipschitz loss functions. arXiv preprint, to appear in the Annals of Statistics, 2017.
[3] P. Alquier and P. Doukhan. Sparsity considerations for dependent variables. Electronic Journal of Statistics, 5:750-774, 2011.
[4] P. Alquier and B. Guedj. An oracle inequality for quasi-Bayesian nonnegative matrix factorization. Mathematical Methods of Statistics, 26(1):55-67, 2017.
[5] L. Bauwens and M. Lubrano. Identification restrictions and posterior densities in cointegrated Gaussian VAR systems. In T. M. Fomby and R. Carter Hill, editors, Advances in Econometrics, vol. 11(B). JAI Press, Greenwich, 1993.
[6] S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
[7] T. Cai, D. Kim, Y. Wang, M. Yuan, and H. H. Zhou. Optimal large-scale quantum state tomography with Pauli measurements. The Annals of Statistics, 44(2):682-712, 2016.
[8] T. Cai and A. Zhang. ROP: Matrix recovery via rank-one projections. The Annals of Statistics, 43(1):102-138, 2015.
[9] E. J. Candès and Y. Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925-936, 2010.
[10] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717-772, 2009.
[11] E. J. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053-2080, 2010.
[12] L. Carel and P. Alquier. Non-negative matrix factorization as a pre-processing tool for travelers temporal profiles clustering. In Proceedings of the 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pages 417-422, 2017.
[13] D. Chafaï, O. Guédon, G. Lecué, and A. Pajor. Interactions between compressed sensing, random matrices and high dimensional geometry. Société Mathématique de France, 2012.
[14] ...jan, G. Severini, A. Turolla, and P. Bonato. Decomposing time-series data by a non-negative matrix factorization algorithm with temporally constrained coefficients. In 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 3496-3499. IEEE, 2015.
[15] S. Chrétien and B. Guedj. Revisiting clustering as matrix factorisation on the Stiefel manifold. arXiv preprint arXiv:1903.04479, 2019.
[16] A. S. Dalalyan. Exponential weights in multivariate regression and a low-rankness favoring prior. arXiv preprint arXiv:1806.09405, 2018.
[17] A. S. Dalalyan, E. Grappin, and Q. Paris. On the exponentially weighted aggregate with the Laplace prior. The Annals of Statistics, 46(5):2452-2478, 2018.
[18] Y. De Castro, Y. Goude, G. Hébrail, and J. Mei. Recovering multiple nonnegative time series from a few temporal aggregates. In ICML 2017 - 34th International Conference on Machine Learning, pages 1-9, 2017.
[19] J. Dedecker, P. Doukhan, G. Lang, J. R. León, S. Louhichi, and C. Prieur. Weak dependence. In Weak Dependence: With Examples and Applications, pages 9-20. Springer, 2007.
[20] R. F. Engle and C. W. J. Granger. Co-integration and error correction: representation, estimation, and testing. Econometrica: Journal of the Econometric Society, pages 251-276, 1987.
[21] G. I. Allen, L. Grosenick, and J. Taylor. A generalized least-square matrix decomposition. Journal of the American Statistical Association, 109(505):145-159, 2014.
[22] J. Geweke. Bayesian reduced rank regression in econometrics. Journal of Econometrics, 75:121-146, 1996.
[23] D. Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on Information Theory, 57(3):1548-1566, 2011.
[24] S. Gultekin and J. Paisley. Online forecasting matrix factorization. arXiv preprint arXiv:1712.08734, 2017.
[25] M. Guta, T. Kypraios, and I. Dryden. Rank-based model selection for multiple ions quantum tomography. New Journal of Physics, 14(10):105002, 2012.
[26] F. Husson, J. Josse, B. Narasimhan, and G. Robin. Imputation of mixed data with multilevel singular value decomposition. arXiv preprint arXiv:1804.11087, 2018.
[27] A. Izenman. Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis, 5(2):248-264, 1975.
[28] F. Kleibergen and H. K. van Dijk. On the shape of the likelihood/posterior in cointegration models. Econometric Theory, 10:514-551, 1994.
[29] F. Kleibergen and H. K. van Dijk. Bayesian simultaneous equation analysis using reduced rank structures. Econometric Theory, 14:701-743, 1998.
[30] O. Klopp, K. Lounici, and A. B. Tsybakov. Robust matrix completion. Probability Theory and Related Fields, 169(1-2):523-564, 2017.
[31] O. Klopp, Y. Lu, A. B. Tsybakov, and H. H. Zhou. Structured matrix estimation and completion. arXiv preprint arXiv:1701.02090, 2017.
[32] V. Koltchinskii, K. Lounici, and A. B. Tsybakov. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. The Annals of Statistics, 39(5):2302-2329, 2011.
[33] G. Koop and D. Korobilis. Bayesian multivariate time series methods for empirical macroeconomics. Foundations and Trends in Econometrics, 3(4):267-358, 2010.
[34] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30-37, 2009.
[35] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788-791, 1999.
[36] A. Lumbreras, L. Filstroff, and C. Févotte. Bayesian mean-parameterized nonnegative binary matrix factorization. arXiv preprint arXiv:1812.06866, 2018.
[37] T. D. Luu, J. Fadili, and C. Chesneau. Sharp oracle inequalities for low-complexity priors. arXiv preprint arXiv:1702.03166, 2017.
[38] T. T. Mai and P. Alquier. A Bayesian approach for noisy matrix completion: Optimal rate under general sampling distribution. Electronic Journal of Statistics, 9(1):823-841, 2015.
[39] T. T. Mai and P. Alquier. Pseudo-Bayesian quantum tomography with rank-adaptation. Journal of Statistical Planning and Inference, 184:62-76, 2017.
[40] J. Mei, Y. De Castro, Y. Goude, J.-M. Azaïs, and G. Hébrail. Nonnegative matrix factorization with side information for time series recovery and prediction. IEEE Transactions on Knowledge and Data Engineering, 2018.
[41] K. Moridomi, K. Hatano, and E. Takimoto. Tighter generalization bounds for matrix completion via factorization into constrained matrices. IEICE Transactions on Information and Systems, 101(8):1997-2004, 2018.
[42] A. Ozerov and C. Févotte. Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Transactions on Audio, Speech, and Language Processing, 18(3):550-563, 2010.
[43] J. Paisley, D. Blei, and M. I. Jordan. Bayesian nonnegative matrix factorization with stochastic variational inference. In Handbook of Mixed Membership Models and Their Applications, chapter 11. Chapman and Hall/CRC, 2015.
[44] A. Saha and V. Sindhwani. Learning evolving and emerging topics in social media: a dynamic NMF approach with temporal regularization. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, pages 693-702. ACM, 2012.
[45] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons. Document clustering using nonnegative matrix factorization. Information Processing & Management, 42(2):373-386, 2006.
[46] T. Suzuki. Convergence rate of Bayesian tensor estimator and its minimax optimality. In International Conference on Machine Learning, 2015.
[47] E. Tonnelier, N. Baskiotis, V. Guigue, and P. Gallinari. Anomaly detection in smart card logs and distant evaluation with Twitter: a robust framework. Neurocomputing, 298:109-121, 2018.
[48] J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389-434, 2012.
[49] A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2009.
[50] C. Vernade and O. Cappé. Learning from missing data using selection bias in movie recommendation. In 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 1-9. IEEE, 2015.
[51] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.
[52] D. Xia and V. Koltchinskii. Estimation of low rank density matrices: bounds in Schatten norms and other distances. Electronic Journal of Statistics, 10(2):2717-2745, 2016.
[53] H.-F. Yu, N. Rao, and I. S. Dhillon. Temporal regularized matrix factorization for high-dimensional time series prediction. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 847-855. Curran Associates, Inc., 2016.
5. PROOFS

5.1. Additional notations. Let us first introduce a few additional notations. First, for the sake of shortness, we introduce the estimation risk $R$ and the empirical risk $r$. These notations also make clear the fact that our estimator can be seen as an empirical risk minimizer:
$$R(A) = \|A - M\|_F^2 \quad \text{and} \quad r(A) = \|A - X\|_F^2, \qquad A \in \mathcal{M}_{d,T}(\mathbb{R}).$$