    Principal Component Analysis for Time Series

    Isabel Silva

    Departamento de Engenharia Civil, Faculdade de Engenharia da Universidade do Porto
    Centro de Investigação e Desenvolvimento em Matemática e Aplicações (CIDMA), Universidade de Aveiro

    Seminário do Grupo de Probabilidades e Estatística
    21 April 2010


    Outline

    Motivation

    Principal Component Analysis for time series

    Classic Principal Component Analysis

    Weighted Principal Component Analysis

    Dynamic Principal Component Analysis

    Singular Spectrum Analysis / Multi-Channel Singular Spectrum Analysis

    Illustration

    Final remarks


    Motivation

    Multidimensional time series and space-time series

    Number of observations (T) > Number of series (n)

    Dimensionality reduction

    Principal Component Analysis (PCA)

    T original variables (observation times) → linear transformation → M uncorrelated variables: the Principal Components (PC)

    $M \ll T$ principal components retain most of the variation present in the dataset [Jolliffe, 2002]


    Classic Principal Component Analysis

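    n measurements on T VARIABLES: $\{Y_1, Y_2, \ldots, Y_T\}$, $Y_j \in \mathbb{R}^n$, $j = 1, \ldots, T$

    n time series, each one with T OBSERVATIONS: $\{y_1, y_2, \ldots, y_n\}$, $y_i \in \mathbb{R}^T$, $i = 1, \ldots, n$

    $$x_{ij} = y_{ij} - \bar{Y}_j = y_{ij} - \frac{1}{n}\sum_{i=1}^{n} y_{ij}, \quad i = 1, \ldots, n;\ j = 1, \ldots, T$$

    $$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} X_1 & X_2 & \cdots & X_T \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1T} \\ x_{21} & x_{22} & \cdots & x_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nT} \end{bmatrix}$$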


    Classic Principal Component Analysis

    Sample variance-covariance matrix (T×T) of X: $S = \frac{1}{n} X^{\top} X$

    Diagonalizing S gives eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_T > 0$ and eigenvectors $\alpha_j$ with $\|\alpha_j\| = 1$, $j = 1, \ldots, T$

    jth Principal Component:

    $$Z_j = X\alpha_j = \alpha_{j1} X_1 + \alpha_{j2} X_2 + \cdots + \alpha_{jT} X_T, \quad j = 1, \ldots, T$$

    $$\mathrm{Var}(Z_j) = \lambda_j, \quad j = 1, \ldots, T$$

    Proportion of variance due to $Z_j$: $\dfrac{\lambda_j}{\lambda_1 + \cdots + \lambda_T}$, $j = 1, \ldots, T$

    Variables with different scales → initial data standardization:

    $$u_{ij} = \frac{1}{\sqrt{s_{jj}}}\,(y_{ij} - \bar{Y}_j)$$

    PCA then uses the Pearson's correlation matrix of the original variables
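The classic PCA steps above (centring, forming S, diagonalizing, projecting) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `classic_pca`, the `standardize` flag and the simulated data `y_demo` are assumptions introduced here, not part of the talk.

```python
import numpy as np

def classic_pca(y, standardize=False):
    """Classic PCA of an (n, T) array: one series per row, one observation time per column."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    x = y - y.mean(axis=0)                 # x_ij = y_ij - mean of column j
    if standardize:
        x = x / x.std(axis=0)              # divide by sqrt(s_jj); PCA on the correlation matrix
    s = x.T @ x / n                        # sample covariance matrix S (T x T)
    eigvals, eigvecs = np.linalg.eigh(s)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # reorder so that lambda_1 >= ... >= lambda_T
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    z = x @ eigvecs                        # column j holds the jth principal component Z_j
    proportion = eigvals / eigvals.sum()   # proportion of variance due to each Z_j
    return eigvals, eigvecs, z, proportion

# Hypothetical data: 50 random-walk series observed at 4 time points.
rng = np.random.default_rng(0)
y_demo = rng.normal(size=(50, 4)).cumsum(axis=1)
eigvals, eigvecs, z, proportion = classic_pca(y_demo)
```

With the 1/n normalisation used for S, the sample variance of each column of `z` equals the corresponding eigenvalue, which mirrors Var(Z_j) = λ_j.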


    Weighted Principal Component Analysis (WPCA) [Pinto da Costa, Silva and Silva, 2009]

    $$u_{ij} = \omega_j\,(y_{ij} - \bar{Y}_j), \quad i = 1, \ldots, n;\ j = 1, \ldots, T$$

    Weights: $\omega_j$, such that $\omega_j \ge 0$ and $\sum_{j=1}^{T} \omega_j = 1$

    → Weighted matrix of covariances of the data
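A minimal sketch of the weighting step, assuming the weight multiplies each centred column as the formula above is written and that PCA is then run on the covariance matrix of the weighted data. The function name `weighted_pca`, the linearly increasing weights and the simulated data are hypothetical choices for illustration only.

```python
import numpy as np

def weighted_pca(y, weights):
    """PCA on weighted, centred data; y is (n, T), weights has length T."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    n = y.shape[0]
    u = (y - y.mean(axis=0)) * w           # u_ij = w_j * (y_ij - column mean), by broadcasting
    s_w = u.T @ u / n                      # weighted covariance matrix (assumed 1/n scaling)
    eigvals, eigvecs = np.linalg.eigh(s_w)
    order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, u @ eigvecs

# Hypothetical example: later observation times receive larger weights.
rng = np.random.default_rng(1)
y_demo = rng.normal(size=(40, 5)).cumsum(axis=1)
w_demo = np.arange(1, 6) / np.arange(1, 6).sum()
eigvals_w, eigvecs_w, scores_w = weighted_pca(y_demo, w_demo)
```

Under this reading, uniform weights only rescale the classic covariance matrix, so the principal directions coincide with those of classic PCA.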


    Dynamic Principal Component Analysis (DPCA) [Brillinger, 2001]

    PCA for stationary time series in the frequency domain

    DPCA approximates a p-dimensional vector-valued time series $X_t$ by a set of k uncorrelated time series $Y_t$, which is the best approximation of $X_t$ in the mean squared error (m.s.e.) sense.

    PCA at each frequency → uncorrelated principal component series → inferential procedures

    $$\sum_{k=-\infty}^{\infty} |\gamma(k)| < \infty$$


    DPCA [Shumway and Stoffer, 2000]

    $X = [x_{ij}]$ ($i = 1, \ldots, n$, $j = 1, \ldots, T$): matrix with n (zero-mean) stationary time series

    $f(\omega)$: sample (T×T) spectral density matrix of X; a complex-valued, non-negative definite and Hermitian matrix

    Let $(\lambda_1(\omega), e_1(\omega)), \ldots, (\lambda_T(\omega), e_T(\omega))$ be the (eigenvalue, eigenvector) pairs of $f(\omega)$, with

    $$\lambda_1(\omega) \ge \cdots \ge \lambda_T(\omega) \ge 0, \qquad \|e_j(\omega)\| = 1, \quad j = 1, \ldots, T$$

    jth principal component series at frequency $\omega$:

    $$y_{tj}(\omega) = e_j(\omega)^{*} X, \qquad \mathrm{Var}(y_{tj}(\omega)) = \lambda_j(\omega), \quad j = 1, \ldots, T$$
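The frequency-domain idea (estimate a spectral density matrix, then diagonalize it at each frequency) can be sketched as follows. Several choices here are assumptions of the sketch rather than anything stated in the talk: the spectral matrix is indexed by the series (one row and column per series), it is estimated with a simple Daniell-smoothed periodogram, and the names `spectral_pca` and `half_width` are hypothetical.

```python
import numpy as np

def spectral_pca(x, half_width=2):
    """Eigen-decomposition of a smoothed-periodogram spectral matrix at each frequency.

    x: (p, n) array holding p series of length n (the mean of each series is removed).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)
    p, n = x.shape
    d = np.fft.rfft(x, axis=1) / np.sqrt(n)            # DFT of each series
    # Raw periodogram matrices I(w_k) = d(w_k) d(w_k)^*, stacked as (n_freq, p, p).
    per = np.einsum("ik,jk->kij", d, d.conj())
    # Daniell smoothing: moving average over neighbouring frequencies.
    kernel = np.ones(2 * half_width + 1) / (2 * half_width + 1)
    f_hat = np.empty_like(per)
    for i in range(p):
        for j in range(p):
            f_hat[:, i, j] = np.convolve(per[:, i, j], kernel, mode="same")
    # eigh handles the whole stack of Hermitian matrices; reverse to get descending order.
    eigvals, eigvecs = np.linalg.eigh(f_hat)
    freqs = np.fft.rfftfreq(n)                         # frequencies in cycles per observation
    return freqs, f_hat, eigvals[:, ::-1], eigvecs[:, :, ::-1]

# Hypothetical data: three noisy series, two of which share a cycle at frequency 0.125.
rng = np.random.default_rng(2)
t = np.arange(256)
common = np.sin(2 * np.pi * 0.125 * t)
x_demo = np.vstack([
    common + 0.5 * rng.normal(size=t.size),
    0.5 * common + 0.5 * rng.normal(size=t.size),
    0.5 * rng.normal(size=t.size),
])
freqs, f_hat, eigvals, eigvecs = spectral_pca(x_demo)
```

In this toy example the largest eigenvalue peaks near the shared frequency, which is the kind of common oscillation that the first principal component series is meant to capture.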


    Singular Spectrum Analysis (SSA) [Golyandina, Nekrutkin and Zhigljavsky, 2001]

    Carry out a PCA on a suitably chosen lagged version of the original time series

    Decompose the original series into a small number of independent and interpretable components, which can be regarded as trend, oscillatory components and structureless noise

    No stationarity assumptions on the time series are needed

    Basic SSA

    Decomposition stage
        Embedding
        Singular Value Decomposition (SVD)

    Reconstruction stage
        Grouping
        Diagonal averaging


    Embedding

    Time series: $y = \{y_0, y_1, \ldots, y_{n-1}\}$; L: window length ($1 < L < n$)

    Trajectory matrix (K×L, $K = n - L + 1$):

    $$X = \begin{bmatrix} X_1 & X_2 & X_3 & \cdots & X_L \end{bmatrix} = \begin{bmatrix} y_0 & y_1 & y_2 & \cdots & y_{L-1} \\ y_1 & y_2 & y_3 & \cdots & y_L \\ y_2 & y_3 & y_4 & \cdots & y_{L+1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ y_{K-1} & y_K & y_{K+1} & \cdots & y_{n-1} \end{bmatrix}$$

    SVD

    $S = X^{\top} X$, with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_L$ and eigenvectors $U_1, U_2, \ldots, U_L$

    $d = \mathrm{rank}(X) = \max\{i : \lambda_i > 0\} \le L$, $\quad V_i = X U_i / \sqrt{\lambda_i}$, $i = 1, \ldots, d$

    $$X = X_1 + X_2 + \cdots + X_d, \qquad X_i = \sqrt{\lambda_i}\, V_i U_i^{\top}$$

    $(\sqrt{\lambda_i}, U_i, V_i)$: eigentriples
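The decomposition stage (embedding followed by SVD) can be sketched with NumPy as below. The helper names `trajectory_matrix` and `elementary_matrices` are hypothetical, and the demonstration uses the eight-value series from the Illustration section with L = 4. Note that `np.linalg.svd` returns the singular values, that is, the square roots of the eigenvalues of S.

```python
import numpy as np

def trajectory_matrix(y, L):
    """K x L trajectory matrix with K = n - L + 1; row i is (y_i, ..., y_{i+L-1})."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if not 1 < L < n:
        raise ValueError("window length must satisfy 1 < L < n")
    K = n - L + 1
    return np.array([y[i:i + L] for i in range(K)])

def elementary_matrices(X):
    """Singular values sqrt(lambda_i) and rank-one matrices X_i = sqrt(lambda_i) V_i U_i^T."""
    V, sing, Ut = np.linalg.svd(X, full_matrices=False)  # X = V diag(sing) Ut
    d = int(np.sum(sing > 1e-10 * sing[0]))              # numerical rank of X
    comps = [sing[i] * np.outer(V[:, i], Ut[i]) for i in range(d)]
    return sing[:d], comps

y_demo = [501, 488, 504, 578, 545, 632, 728, 725]
X_demo = trajectory_matrix(y_demo, L=4)
sing_demo, comps_demo = elementary_matrices(X_demo)
```

Summing all elementary matrices recovers the trajectory matrix, which is the identity X = X_1 + ... + X_d.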


    Grouping

    M: number of PC

    Partition of $\{1, \ldots, d\}$ into M disjoint subsets $I_1, \ldots, I_M$, where $I_k = \{i_{k1}, \ldots, i_{kp}\}$

    Construct the corresponding resultant matrix $X_{I_k} = X_{i_{k1}} + \cdots + X_{i_{kp}}$

    $$X = X_{I_1} + \cdots + X_{I_M}$$

    The contribution of the component $X_{I_k}$: $\dfrac{\sum_{i \in I_k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}$

    The grouping depends on the objective of the study

    Inspection of the singular values ($\sqrt{\lambda_i}$) and vectors ($U_i, V_i$)

    Use supplementary information for the parameter choice [Hassani, 2007]: periodicity of the dataset, periodogram analysis, pairwise scatterplots of singular vectors, ...
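A small sketch of the grouping step and of the contribution ratio, applied to the 5×4 trajectory matrix from the Illustration section. The function name `group_components` and the 0-based index lists are conventions of this sketch, not of the talk.

```python
import numpy as np

def group_components(X, groups):
    """Resultant matrices X_{I_k} and their contributions for 0-based index groups."""
    V, sing, Ut = np.linalg.svd(X, full_matrices=False)
    lam = sing**2                                   # eigenvalues of X^T X
    grouped, share = [], []
    for idx in groups:
        idx = list(idx)
        # Sum of the rank-one matrices sqrt(lambda_i) V_i U_i^T over the group.
        grouped.append((V[:, idx] * sing[idx]) @ Ut[idx, :])
        share.append(lam[idx].sum() / lam.sum())    # contribution of the group
    return grouped, share

# Trajectory matrix of the eight-value example (L = 4, K = 5).
X_demo = np.array([
    [501, 488, 504, 578],
    [488, 504, 578, 545],
    [504, 578, 545, 632],
    [578, 545, 632, 728],
    [545, 632, 728, 725],
], dtype=float)
grouped_demo, share_demo = group_components(X_demo, groups=[[0], [1, 2, 3]])
```

Here the first group (the leading eigentriple alone) accounts for about 99.75% of the total, in line with the figure quoted in the worked example.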


    Diagonal Averaging

    Transform $X_{I_k} = \left(x_{ij}^{(k)}\right)_{i,j=1}^{L,K}$, $k = 1, \ldots, M$, into a new series $\tilde{X}_{I_k} = \{\tilde{y}_0^{(k)}, \ldots, \tilde{y}_{n-1}^{(k)}\}$

    $\tilde{y}_t^{(k)}$ is obtained by averaging $x_{ij}^{(k)}$ over all $i, j$ such that $i + j = t + 2$, $t = 0, \ldots, n-1$

    $L^{*} = \min\{L, K\}$; $K^{*} = \max\{L, K\}$; $x_{ij}^{*(k)} = x_{ij}^{(k)}$ if $L < K$; $x_{ij}^{*(k)} = x_{ji}^{(k)}$ if $L \ge K$

    $$\tilde{y}_t^{(k)} = \begin{cases} \dfrac{1}{t+1} \displaystyle\sum_{p=1}^{t+1} x_{p,\,t-p+2}^{*(k)}, & 0 \le t < L^{*} - 1 \\[2ex] \dfrac{1}{L^{*}} \displaystyle\sum_{p=1}^{L^{*}} x_{p,\,t-p+2}^{*(k)}, & L^{*} - 1 \le t < K^{*} \\[2ex] \dfrac{1}{n-t} \displaystyle\sum_{p=t-K^{*}+2}^{n-K^{*}+1} x_{p,\,t-p+2}^{*(k)}, & K^{*} \le t < n \end{cases}$$

    $$y = \tilde{X}_{I_1} + \cdots + \tilde{X}_{I_M}, \qquad y_t = \sum_{k=1}^{M} \tilde{y}_t^{(k)}, \quad t = 0, \ldots, n-1$$
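Diagonal averaging amounts to averaging each anti-diagonal of the matrix, which the following sketch does directly with 0-based indices (so entries with i + j = t are averaged). The function name `diagonal_average` is hypothetical, and the input is the rank-one matrix X_1 printed, to one decimal, in the Illustration section.

```python
import numpy as np

def diagonal_average(Xk):
    """Average the anti-diagonals of a matrix to obtain a series of length rows + cols - 1."""
    Xk = np.asarray(Xk, dtype=float)
    rows, cols = Xk.shape
    n = rows + cols - 1
    totals = np.zeros(n)
    counts = np.zeros(n)
    for i in range(rows):
        for j in range(cols):
            totals[i + j] += Xk[i, j]   # entries with i + j = t belong to time t
            counts[i + j] += 1
    return totals / counts

# X_1 from the worked example, as printed on the slide (rounded values).
X1_demo = np.array([
    [466.1, 490.7, 535.5, 574.7],
    [475.8, 501.0, 546.7, 586.7],
    [508.8, 535.7, 584.6, 627.3],
    [560.8, 590.5, 644.4, 691.5],
    [594.2, 625.6, 682.8, 732.7],
])
y_rec_demo = diagonal_average(X1_demo)
```

Because the loop treats rows and columns symmetrically, it covers all three cases of the piecewise formula and both orientations (L < K and L ≥ K). Applied to the rounded matrix, it reproduces the reconstructed series of the worked example to within about 0.4.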


    Multichannel SSA [Golyandina and Stepanov, 2005]

    Extension of SSA to p time series of length n: $\{y_1, \ldots, y_p\}$ where $y_i = \{y_{i,0}, y_{i,1}, \ldots, y_{i,n-1}\}$, $i = 1, \ldots, p$

    Apply SSA to a large trajectory matrix (K×Lp):

    $$X = \begin{bmatrix} y_{1,0} & \cdots & y_{1,L-1} & y_{2,0} & \cdots & y_{2,L-1} & \cdots & y_{p,0} & \cdots & y_{p,L-1} \\ y_{1,1} & \cdots & y_{1,L} & y_{2,1} & \cdots & y_{2,L} & \cdots & y_{p,1} & \cdots & y_{p,L} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\ y_{1,K-1} & \cdots & y_{1,n-1} & y_{2,K-1} & \cdots & y_{2,n-1} & \cdots & y_{p,K-1} & \cdots & y_{p,n-1} \end{bmatrix}$$
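The large trajectory matrix is just the single-series trajectory matrices placed side by side, as this sketch shows. The function names and the two toy series are hypothetical and serve only to make the block layout visible.

```python
import numpy as np

def trajectory_matrix(y, L):
    """K x L trajectory matrix of one series, with K = n - L + 1."""
    y = np.asarray(y, dtype=float)
    K = y.size - L + 1
    return np.array([y[i:i + L] for i in range(K)])

def mssa_trajectory_matrix(series, L):
    """K x (L p) matrix formed by stacking the p trajectory matrices horizontally."""
    series = np.asarray(series, dtype=float)      # shape (p, n): one series per row
    return np.hstack([trajectory_matrix(s, L) for s in series])

# Two hypothetical series of length 8, window length L = 4.
series_demo = np.array([np.arange(8), 10 + np.arange(8)])
X_mssa = mssa_trajectory_matrix(series_demo, L=4)
```

The usual SVD, grouping and diagonal averaging steps can then be applied to this matrix, with diagonal averaging carried out separately within each K×L block.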


    Illustration

    Practical problems

    Choice of the dimension L: $L \le n/2$, or depending on the periodicity of the data

    Selection of M and the way of grouping the indices

    Rodrigues and de Carvalho (2008): L and M must be chosen carefully, since they can compromise the results of the analysis

    Software: SSA - Matlab Tools for SSA (Eric Breitenberger) and ssa.m (Francisco Alonso)

    Dataset: Monthly average number of occupied hotel rooms, from 1963 to 1976 (Source: Time Series Data Library, http://robjhyndman.com/TSDL//)

    [Figure: monthly number of occupied rooms, January 1963 to December 1976; x-axis: month, y-axis: number of occupied rooms]


    Example ($L = 4$, $K = 8 - 4 + 1 = 5$, $M = 1$)

    $$y = \begin{bmatrix} 501 & 488 & 504 & 578 & 545 & 632 & 728 & 725 \end{bmatrix}$$

    $$X = \begin{bmatrix} 501 & 488 & 504 & 578 \\ 488 & 504 & 578 & 545 \\ 504 & 578 & 545 & 632 \\ 578 & 545 & 632 & 728 \\ 545 & 632 & 728 & 725 \end{bmatrix}, \qquad X_1 = X_{I_1} = \begin{bmatrix} 466.1 & 490.7 & 535.5 & 574.7 \\ 475.8 & 501.0 & 546.7 & 586.7 \\ 508.8 & 535.7 & 584.6 & 627.3 \\ 560.8 & 590.5 & 644.4 & 691.5 \\ 594.2 & 625.6 & 682.8 & 732.7 \end{bmatrix}$$

    The contribution of the component $X_{I_1}$: 99.75%

    $$\tilde{X}_{I_1} = \begin{bmatrix} 466.1 & 483.6 & 515.1 & 554.5 & 589.0 & 632.5 & 687.1 & 732.7 \end{bmatrix}$$

    [Figure: top panel, original series y and reconstructed series y_reconstructed for the eight-value example; bottom panel, residual = y − y_reconstructed]

    Principal Components of the monthly number of occupied rooms (L = 12)

    [Figure: 12 panels, one per principal component, each plotted against the observation index (0 to 150)]

    Normalized singular values of the monthly number of occupied rooms

    If n, L and K are sufficiently large, each harmonic produces two eigentriples with close singular values

    [Figure: normalized singular values against the index i, for i up to 12; a second panel shows the values on a reduced vertical scale (0 to 3)]

    The contribution of the components: $X_{I_1}$: 97.96%, $X_{I_{2,3}}$: 1.42%, $X_{I_{4,5}}$: 0.32%

    [Figure: three panels, each comparing the original series y with a reconstruction: from PC 1 (y_rec_PC1), from PCs 2 and 3 (y_rec_PC_2_3), and from PCs 4 and 5 (y_rec_PC_4_5)]

    The contribution of the component $X_{I_{1,\ldots,5}}$: 99.70%

    [Figure: top panel, original series y and its reconstruction from PCs 1 to 5 (y_rec_PC_1_to_5); bottom panel, residuals]

    Choice of L

    Contribution (%) of the first three PCs, by window length L

      L     PC1     PC2     PC3    % var. PC1-PC3
     12    97.96    0.71    0.71        99.38
     24    97.96    0.71    0.71        99.38
     36    97.95    0.72    0.71        99.38
     80    97.92    0.74    0.72        99.38
      6    98.55    0.86    0.33        99.73

    Principal Components of the monthly number of occupied rooms (L = 6)

    [Figure: 6 panels, one per principal component, each plotted against the observation index (0 to 150)]