In this vignette we go through some basic tensor functions in ‘tensorMiss’. First, we construct a tensor, using the base R class ‘array’. The following tensor is an order-3 tensor with dimensions 3 × 4 × 2.
Subsetting and assignment for multi-dimensional arrays work just as in base R; we demonstrate them below by injecting missingness into the tensor.
example <- array(1:24, dim = c(3, 4, 2))
example[3,1,1] <- NA
print(example)
#> , , 1
#>
#> [,1] [,2] [,3] [,4]
#> [1,] 1 4 7 10
#> [2,] 2 5 8 11
#> [3,] NA 6 9 12
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4]
#> [1,] 13 16 19 22
#> [2,] 14 17 20 23
#> [3,] 15 18 21 24
We now quickly go through the basic unfolding and refolding functions, using the mode-1 unfolding as an example.
example.1 <- unfold(example, 1)
print(example.1)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1 4 7 10 13 16 19 22
#> [2,] 2 5 8 11 14 17 20 23
#> [3,] NA 6 9 12 15 18 21 24
Naturally, refolding an unfolded matrix returns the original tensor, given the correct mode and dimensions (the NA entry yields NA in the element-wise comparison):
refold(example.1, 1, dim(example))==example
#> , , 1
#>
#> [,1] [,2] [,3] [,4]
#> [1,] TRUE TRUE TRUE TRUE
#> [2,] TRUE TRUE TRUE TRUE
#> [3,] NA TRUE TRUE TRUE
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4]
#> [1,] TRUE TRUE TRUE TRUE
#> [2,] TRUE TRUE TRUE TRUE
#> [3,] TRUE TRUE TRUE TRUE
Lastly, the k-mode matrix product is also supported. See Kolda and Bader (2009) for more details on tensor data.
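As a sketch of the identity behind the mode-k product, it can be realised with the ‘unfold’ and ‘refold’ functions shown above: unfold along mode k, left-multiply by the matrix, then refold with the updated dimensions. The matrix A below is purely illustrative.

```r
library(tensorMiss)

example <- array(1:24, dim = c(3, 4, 2))  # fresh copy without the NA
A <- matrix(1, nrow = 2, ncol = 3)        # maps mode 1 from dimension 3 to 2

# Y x_1 A: unfold along mode 1, multiply, refold with the new mode-1 dimension
res <- refold(A %*% unfold(example, 1), 1, c(2, 4, 2))
dim(res)      # 2 4 2
res[1, 1, 1]  # 1 + 2 + 3 = 6, since A sums each mode-1 fibre
```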
A factor-based imputation method for tensor time series is proposed by Cen and Lam (2024). The ‘tensor_gen’ function initialises a zero-mean order-K tensor time series with factor structure, so that at each time t ∈ {1, 2, …, T},

𝒴_t = ℱ_t ×_1 A_1 ×_2 A_2 ⋯ ×_K A_K + ℰ_t,

where 𝒴_t ∈ ℝ^(d_1 × d_2 × ⋯ × d_K) is the generated order-K tensor observation, ℱ_t ∈ ℝ^(r_1 × r_2 × ⋯ × r_K) is the (possibly weak) core factor series, each A_k ∈ ℝ^(d_k × r_k) for k ∈ {1, …, K} is the mode-k factor loading matrix, and ℰ_t is the error series with the same dimensions as 𝒴_t. Weak cross-sectional and serial correlations are allowed in the fibres of the error series; see Cen and Lam (2024) for the details.
Given a tensor time series with missingness, the goal of imputation is to estimate the missing entries. First, the data can be initialised by the ‘tensor_gen’ function. For reproducibility, a seed parameter is available, which is 2023 by default.
K <- 3 #order 3 at each time
TT <- 20 #time length
d <- c(30,30,30) #spatial dimensions
r <- c(2,2,2) #rank of core tensor
re <- c(2,2,2) #rank of common component in error
eta <- list(c(0,0), c(0,0), c(0,0)) #strong factors
coef_f <- c(0.7, 0.3, -0.4, 0.2, -0.1) #AR(5) coefficients for core factor
coef_fe <- c(-0.7, -0.3, -0.4, 0.2, 0.1) #AR(5) coefficients for common component in error
coef_e <- c(0.8, 0.4, -0.4, 0.2, -0.1) #AR(5) coefficients for idiosyncratic part in error
data_test <- tensor_gen(K,TT,d,r,re,eta, coef_f, coef_fe, coef_e)
Missing entries are represented by ‘NA’s in the data. For example, the ‘miss_gen’ function makes each entry randomly missing with probability 0.3; further missing patterns are available through the function.
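The exact arguments of ‘miss_gen’ are given in the package documentation; purely as an illustration of the missing mechanism, random missingness at rate 0.3 can be injected into a toy array with base R alone:

```r
set.seed(2023)
# toy order-3 series over 20 time points (stands in for the generated data)
X <- array(rnorm(24000), dim = c(20, 10, 10, 12))
# each entry goes missing independently with probability 0.3
X[runif(length(X)) < 0.3] <- NA
mean(is.na(X))  # close to 0.3
```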
With the ‘miss_factor_est’ function, the factor structure can be estimated in one go. The number of factors can either be provided or left unspecified; in the latter case, the function estimates it using an eigenvalue-ratio-based estimator. For details of the estimation, see Cen and Lam (2024).
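The eigenvalue-ratio idea can be sketched in a few lines. The function below is a generic version, not the package's exact estimator (the names `eigen_ratio` and `r_max` are ours): the estimated number of factors is the index maximising the ratio of consecutive eigenvalues.

```r
# generic eigenvalue-ratio estimator: pick the index i maximising
# lambda_i / lambda_{i+1} over a bounded search range
eigen_ratio <- function(lambda, r_max = floor(length(lambda) / 2)) {
  which.max(lambda[1:r_max] / lambda[2:(r_max + 1)])
}

lambda <- c(10, 8, 0.5, 0.4, 0.3, 0.2)  # two dominant eigenvalues
eigen_ratio(lambda)  # 2
```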
As an example, the factor loading error, measured by the column-space distance, is computed using the ‘fle’ function.
est_result <- miss_factor_est(data_miss, r)
fle(est_result$A[[1]], data_test$A[[1]])
#> [1] 0.01637416
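One common way to define such a column-space distance (not necessarily the exact definition used by ‘fle’) is via the difference of the projection matrices onto the two column spaces; it is zero when the two matrices span the same space.

```r
# distance between the column spaces of A and B via projection matrices
# (a generic sketch; 'fle' may use a different normalisation)
col_space_dist <- function(A, B) {
  QA <- qr.Q(qr(A))  # orthonormal basis of col(A)
  QB <- qr.Q(qr(B))  # orthonormal basis of col(B)
  norm(QA %*% t(QA) - QB %*% t(QB), type = "F")
}

A <- cbind(c(1, 0, 0), c(0, 1, 0))
B <- cbind(c(2, 0, 0), c(1, 1, 0))  # same column space as A
col_space_dist(A, B)  # 0 (up to numerical error)
```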
Lastly, we can gauge the imputation performance using the relative MSE (rMSE), or the q-MSE introduced in Cen and Lam (2024). Setting q to the length of the input vector essentially recovers the relative MSE. Examples of rMSE and of q-MSE with q = 100 are demonstrated.
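As a sketch of the relative MSE itself (one common definition, normalising the squared error by the squared magnitude of the truth; the package's functions should be preferred in practice):

```r
# relative MSE: squared error normalised by the squared magnitude of the truth
rMSE <- function(est, truth) sum((est - truth)^2) / sum(truth^2)

truth <- c(1, 2, 3, 4)
est <- c(1.1, 1.9, 3.2, 3.8)
rMSE(est, truth)  # 0.1 / 30, about 0.0033
```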
Under certain conditions (Cen and Lam 2024), the residual between a row of the estimated loading matrix and its corresponding true row, under some rotation, can be shown to be asymptotically normal. A consistent covariance matrix estimator can be computed by the ‘sigmaD’ function. For instance, to study the residual for the first row of the estimated mode-2 loading matrix, the covariance matrix estimator is obtained below, and the rotation matrix is computed afterwards.
# computing the covariance matrix estimator
r2 <- r[2]
A2 <- data_test$A[[2]]
beta <- floor(0.2*(TT^0.25 * (d[2])^0.25))
D2 <- diag(x=(svd(est_result$covMatrix[[2]])$d)[1:r2], nrow=r2, ncol=r2)
# HAC_cov: HAC-type covariance matrix estimator
HAC_cov <- sigmaD(2, D2, est_result$A[[2]], est_result$imputation, data_miss, 1, beta)
# computing the rotation matrix
Amk <- data_test$A[[3]] %x% data_test$A[[1]]
R_ast <- 0
for (t in 1:TT){
  R_ast <- R_ast + unfold(data_test$Ft[t,,,],2) %*% t(Amk) %*% Amk %*% t(unfold(data_test$Ft[t,,,],2))
}
R_ast <- A2 %*% R_ast %*% t(A2)
R_ast <- R_ast/TT
Z2 <- diag(x = diag(t(A2) %*% A2), nrow=r2, ncol=r2)
Q2 <- A2 %*% diag(x=diag(solve(Z2))^0.5, nrow=r2, ncol=r2)
# H2: rotation matrix
H2 <- solve(D2) %*% t(est_result$A[[2]]) %*% R_ast %*% Q2 %*% solve(t(Q2)%*% Q2)
Finally, the standardised residual is shown below; it should be approximately standard normal as the dimensions increase.
HAC_cov.eigen <- eigen(HAC_cov)
HAC_cov.sqrt <- HAC_cov.eigen$vectors %*% diag(sqrt(HAC_cov.eigen$values)) %*% solve(HAC_cov.eigen$vectors)
A2_1 <- (solve(HAC_cov.sqrt) %*% D2) %*% (matrix(est_result$A[[2]], nrow=d[2], ncol=r2)[1,] - (H2 %*% Q2[1,]))
A2_1
#> [,1]
#> [1,] -0.5738481
#> [2,] -4.6558420