--- title: "A-short-introduction-to-tensorMiss" output: rmarkdown::html_vignette bibliography: ref.bib vignette: > %\VignetteIndexEntry{A-short-introduction-to-tensorMiss} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r, warning=F, message=FALSE, eval=TRUE} library(tensorMiss) ``` ## 1 Quick start: unfolding, refolding, and k-mode matrix product We go through some basic functions related to tensor in 'tensorMiss' in this vignette. First, we start with constructing a tensor, which is by using the base class 'array' in R. The following tensor is an order-$3$ tensor with dimensions $3\times 4\times 2$. ```{r} example <- array(1:24, dim=c(3,4,2)) ``` As we can see, subsetting and truncating in multi-dimensional arrays are trivial, and we display them below by an example to inject missingness inside the tensor. ```{r} example[3,1,1] <- NA print(example) ``` We now quickly go through the basic unfolding and refolding functions as an example. ```{r} example.1 <- unfold(example, 1) print(example.1) ``` Without doubt, the refolding of an unfolding matrix returns back to the original tensor, given the correct dimension. ```{r} refold(example.1, 1, dim(example))==example ``` Lastly, k-mode matrix product is performed in the following. See @Kolda_Bader for more details on tensor data. ```{r} ttm(example, matrix(1:4,nrow=2), 3) ``` ## 2 Missing value imputation for tensor factor models A factor-based imputation method is proposed by @Cen_Lam on tensor time series. The 'tensor_gen' function initialises a zero-mean mode-$K$ tensor time series with factor structure, so that at each time $t\in\{1,2,\dots,T\}$, $$ \mathcal{Y}_t = \mathcal{F}_t \times_1 \mathbf{A}_1 \times_2 \mathbf{A}_2 \times_3 \cdots \times_K \mathbf{A}_K + \mathcal{E}_t , $$ where $\mathcal{Y}_t\in\mathbb{R}^{d_1\times d_2\times\cdots\times d_K}$ is the generated order-$K$ tensor data, $\mathcal{F}_t\in\mathbb{R}^{r_1\times r_2\times\cdots\times r_K}$ is the (possibly weak) core factor series, each $\mathbf{A}_k\in\mathbb{R}^{d_k\times r_k}$ for $k\in\{1,\dots,K\}$ is the mode-$k$ factor loading matrix, and $\mathcal{E}_t$ is the error series with the same dimension as $\mathcal{Y}_t$. Weak cross-sectional and serial correlations are allowed in the fibres of the error series. See @Cen_Lam for the details.\ \ The purpose of the imputation given a tensor time series with missingness is to estimate/impute the missing entries. First, the data can be initialised by the 'tensor_gen' function. For reproducibility, a seed parameter is required, which is 2023 by default. ```{r} K <- 3 #order 3 at each time TT <- 20 #time length d <- c(30,30,30) #spatial dimensions r <- c(2,2,2) #rank of core tensor re <- c(2,2,2) #rank of common component in error eta <- list(c(0,0), c(0,0), c(0,0)) #strong factors coef_f <- c(0.7, 0.3, -0.4, 0.2, -0.1) #AR(5) coefficients for core factor coef_fe <- c(-0.7, -0.3, -0.4, 0.2, 0.1) #AR(5) coefficients for common component in error coef_e <- c(0.8, 0.4, -0.4, 0.2, -0.1) #AR(5) coefficients for idiosyncratic part in error data_test <- tensor_gen(K,TT,d,r,re,eta, coef_f, coef_fe, coef_e) ``` Missing entries are represented by 'NA's in the data. For example, each entry is randomly missing with probability 0.3 using the 'miss_gen' function. More missing patterns are available with the function. ```{r} data_miss <- miss_gen(data_test$X) ``` With the 'miss_factor_est' function, the factor structure can be estimated in one go. The number of factors could be either provided or not provided, in the latter case the function estimates the number of factors using the eigenvalue-ratio-based estimator. For the details of estimation, see @Cen_Lam.\ \ As an example, the factor loading error measured by column space distance are computed using the 'fle' function. ```{r} est_result <- miss_factor_est(data_miss, r) fle(est_result$A[[1]], data_test$A[[1]]) ``` Lastly, we can gauge the imputation performance using relative MSE (rMSE) or even the introduced q-MSE. Setting q as the length of the input vector would essentially output the relative MSE. Examples of rMSE and q-MSE with q as 100 are demonstrated. ```{r} qMSE(c(data_test$C), c(est_result$imputation), length(c(data_test$C))) #rMSE qMSE(c(data_test$C), c(est_result$imputation), 100) #q-MSE ``` ## 3 Asymptotic normality of estimated loading matrix rows Under certain conditions [@Cen_Lam], the residue between the row of the estimated loading matrix and its corresponding true row under some rotation can be shown to be asymptotically normal. A consistent covariance matrix estimator can be computed by the 'sigmaD' function. For instance, to compute the residue on the first row of the estimated mode-2 loading matrix, the covariance matrix estimator is obtained below. The rotation matrix is also computed afterwards. ```{r} # computing the covariance matrix estimator r2 <- r[2] A2 <- data_test$A[[2]] beta <- floor(0.2*(TT^0.25 * (d[2])^0.25)) D2 <- diag(x=(svd(est_result$covMatrix[[2]])$d)[1:r2], nrow=r2, ncol=r2) # HAC_cov: HAC-type covariance matrix estimator HAC_cov <- sigmaD(2, D2, est_result$A[[2]], est_result$imputation, data_miss, 1, beta) # computing the rotation matrix Amk <- data_test$A[[3]] %x% data_test$A[[1]] R_ast <- 0 for (t in 1:TT){ R_ast <- R_ast + unfold(data_test$Ft[t,,,],2) %*% t(Amk) %*% Amk %*% t(unfold(data_test$Ft[t,,,],2)) } R_ast <- A2 %*% R_ast %*% t(A2) R_ast <- R_ast/TT Z2 <- diag(x = diag(t(A2) %*% A2), nrow=r2, ncol=r2) Q2 <- A2 %*% diag(x=diag(solve(Z2))^0.5, nrow=r2, ncol=r2) # H2: rotation matrix H2 <- solve(D2) %*% t(est_result$A[[2]]) %*% R_ast %*% Q2 %*% solve(t(Q2)%*% Q2) ``` Eventually, the standardised residue is shown below and should follow a standard normal distribution when dimensions are increased. ```{r} HAC_cov.eigen <- eigen(HAC_cov) HAC_cov.sqrt <- HAC_cov.eigen$vectors %*% diag(sqrt(HAC_cov.eigen$values)) %*% solve(HAC_cov.eigen$vectors) A2_1 <- (solve(HAC_cov.sqrt) %*% D2) %*% (matrix(est_result$A[[2]], nrow=d[2], ncol=r2)[1,] - (H2 %*% Q2[1,])) A2_1 ``` ## References