A-short-introduction-to-KOFM

library(KOFM)

1. A testing problem in tensor factor models

A tensor time series is a series of tensor-valued (Kolda and Bader 2009) observations at each timestamp. The focus of ‘KOFM’ is to test if a given tensor time series 𝒴t for t = 1, 2, …, T has a Kronecker product structure, i.e., $$ H_0: \{\mathcal{Y}_t\} \text{ has a Kronecker product structure};\\ H_1: \{\mathcal{Y}_t\} \text{ has no Kronecker product structure along a set of indices } \mathcal{A}. $$

As a simple non-trivial example, consider an order-2 tensor time series Yt for t = 1, 2, …, T, i.e., a matrix time series, with the set 𝒜 = {1, 2}. The above hypothesis test is equivalent to $$ H_0: \{\mathbf{Y}_t\} \text{ follows a matrix factor model such that } \mathbf{Y}_t= \mathbf{A}_1 \mathbf{F}_t \mathbf{A}_2 + \mathbf{E}_t ; \\ H_1: \text{Only the vectorised data } \{\textbf{vec}(\mathbf{Y}_t)\} \text{ has a factor structure such that } \textbf{vec}(\mathbf{Y}_t) = \mathbf{A}_V \mathbf{f}_t + \mathbf{e}_t . $$

Under both H0 and H1, the reshaped data follows a Tucker-decomposition tensor factor model. Hence we quickly demonstrate the tensor reshape operator, with the ‘ten_reshape’ and ‘ten_reshape_back’ functions, after which we will showcase how the test works using the ‘testKron_A’ and ‘testKron_auto’ functions.

2. Tensor reshape

Let us initialise a simple order-3 tensor as follows,

simple_tensor <- array(1:24, dim=c(3,4,2))

which can be reshaped along its modes 1 and 2, achieved by ‘ten_reshape’. As we want to reshape the entire tensor, rather than a tensor time series with the first mode being the time mode, we set ‘time.mode=FALSE’ below; notice the order of mode indices in ‘AA’ does not matter, as the indices are in ascending order by convention.

cat('Reshape along modes 1 and 2:\n')
#> Reshape along modes 1 and 2:
ten_reshape(ten=simple_tensor, AA=c(1,2), time.mode=FALSE)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#> [1,]    1    2    3    4    5    6    7    8    9    10    11    12
#> [2,]   13   14   15   16   17   18   19   20   21    22    23    24
cat('\nThe same results:\n')
#> 
#> The same results:
ten_reshape(ten=simple_tensor, AA=c(2,1), time.mode=FALSE)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#> [1,]    1    2    3    4    5    6    7    8    9    10    11    12
#> [2,]   13   14   15   16   17   18   19   20   21    22    23    24

Certainly we can also reshape our ‘simple_tensor’ along modes 2 and 3, or even along all modes 1, 2 and 3.

cat('Reshape along modes 2 and 3:\n')
#> Reshape along modes 2 and 3:
simple_tensor_23 <- ten_reshape(ten=simple_tensor, AA=c(2,3), time.mode=FALSE)
simple_tensor_23
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,]    1    4    7   10   13   16   19   22
#> [2,]    2    5    8   11   14   17   20   23
#> [3,]    3    6    9   12   15   18   21   24
cat('\nReshape along modes 1, 2 and 3:\n')
#> 
#> Reshape along modes 1, 2 and 3:
ten_reshape(ten=simple_tensor, AA=c(1,2,3), time.mode=FALSE)
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

If we read ‘simple_tensor’ as a time series with its mode-1 acting as the time mode, the series is an order-2 time series with t = 1, 2, 3. In this case, we may only reshape along modes 1 and 2, obtaining a vector time series

cat('With the time series "simple_tensor", reshape along modes 1 and 2:\n')
#> With the time series "simple_tensor", reshape along modes 1 and 2:
ten_reshape(ten=simple_tensor, AA=c(1,2), time.mode=TRUE)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,]    1    4    7   10   13   16   19   22
#> [2,]    2    5    8   11   14   17   20   23
#> [3,]    3    6    9   12   15   18   21   24


Given a reshaped tensor (time series), we may reshape back and recover the original tensor (time series) given its original dimensions and the set of modes along which it reshapes. This is implemented by the ‘ten_reshape_back’ function. For example, using ‘simple_tensor_23’ which is the tensor resulted from reshaping our ‘simple_reshape’ along modes 2 and 3, we have

cat('Recover "simple_tensor" by reshaping back "simple_tensor_23":\n')
#> Recover "simple_tensor" by reshaping back "simple_tensor_23":
simple_tensor_recover <- ten_reshape_back(ten=simple_tensor_23, AA=c(2,3), original.dim=dim(simple_tensor), time.mode=FALSE)
simple_tensor_recover
#> , , 1
#> 
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    4    7   10
#> [2,]    2    5    8   11
#> [3,]    3    6    9   12
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3] [,4]
#> [1,]   13   16   19   22
#> [2,]   14   17   20   23
#> [3,]   15   18   21   24
cat(paste0('\nTo see that "simple_tensor" is indeed recovered, the sum of their squared entry differences is: ', sum((simple_tensor_recover - simple_tensor)^2), '.'))
#> 
#> To see that "simple_tensor" is indeed recovered, the sum of their squared entry differences is: 0.

3. An example (under the null) to perform the test

To demonstrate the test, let us generate an example data using the ‘tensorMiss’ package. For details of the data generation mechanism, see Cen and Lam (2024). For reproducibility, a seed parameter is required which we set as 2025. We generate an order-3 tensor time series 𝒴t ∈ ℝ10 × 10 × 10 for t = 1, 2, …, 100.

Under H0, {𝒴t} has a Kronecker product structure and hence follows a Tucker-decomposition tensor factor model. The time series is ‘data_H0’ as follows,

K <- 3 #order-3 tensor observed at each time
TT <- 100 #time length
d <- c(10,10,10) #spatial dimensions
r <- c(2,2,2) #rank of core tensor
re <- c(2,2,2) #rank of common component in error
eta <- list(c(0,0), c(0,0), c(0,0)) #strong factors
coef_f <- c(0.3) #AR(1) coefficients for core factor
coef_fe <- c(-0.3) #AR(1) coefficients for common component in error
coef_e <- c(0.5) #AR(1) coefficients for idiosyncratic part in error
data_H0 <- tensorMiss::tensor_gen(K, TT, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$X

With the underlying Kronecker product structure, the null should not be rejected for any set of modes. For instance, we perform the test using ‘testKron_A’ below, given the set of mode indices 𝒜 = {1, 2}. In particular, the parameter ‘r’ takes the true number of factors which can be estimated in practice. Since (r1, r2, r3) = (2, 2, 2), the rank for the reshaped data (where the last mode is the reshaped mode) is (rreshape, 1, rV) = (2, 4). Setting ‘r.exact=FALSE’ enables the test to attempt all possible rank combinations of rV, which is (1, 4), (2, 2) and (4, 1). (Otherwise only the test using the parameter ‘r’ is performed.)

cat('Test "data_H0" along modes 1 and 2:\n')
#> Test "data_H0" along modes 1 and 2:
testKron_A(ten=data_H0, AA=c(1,2), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE)
#> $level
#>      [,1] [,2]
#> [1,] 0.04 0.12
#> [2,] 0.01 0.06
#> [3,] 0.01 0.08
#> 
#> $decision
#>        [,1]   [,2]
#> [1,] 0.0100 0.0495
#> [2,] 0.0095 0.0400
#> [3,] 0.0100 0.0400
#> 
#> $rank
#>      [,1] [,2] [,3]
#> [1,]    2    1    4
#> [2,]    2    2    2
#> [3,]    2    4    1
The resulted list contains three matrices. The matrix ‘rank’ is the rank combinations attempted, which is ordered by all modes not in 𝒜 followed by those modes in 𝒜. Rows of the ‘level’ and ‘decision’ matrices correspond to the rank combinations in ‘rank’, while columns of the two matrices correspond to the results under different levels of ‘alpha’ in order. For the other two matrices,
‘level’
Each result of ‘level’ reports an example test size; under H0, the minimum of each column should be close to the ‘alpha’ level corresponding to the column.
‘decision’
Each result of ‘decision’ reports the decision statistic (using a ‘q’-quantile rule); under H0, the minimum of each column should be upper bounded by the ‘alpha’ level corresponding to the column.

Since it is heuristic to determine if each ‘level’ column is close enough to the corresponding ‘alpha’, the test decision relies on the ‘decision’ matrix.

The test can also be performed using the mode indices {1, 3}, {2, 3} or {1, 2, 3}. We present the minimum of each ‘decision’ column and expect them to be no larger than 0.01 and 0.05, respectively.

cat('Test "data_H0" along modes 1 and 3:\n')
#> Test "data_H0" along modes 1 and 3:
apply(testKron_A(ten=data_H0, AA=c(1,3), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE)$decision, 2, min)
#> [1] 0.01 0.04
cat('\nTest "data_H0" along modes 2 and 3:\n')
#> 
#> Test "data_H0" along modes 2 and 3:
apply(testKron_A(ten=data_H0, AA=c(2,3), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE)$decision, 2, min)
#> [1] 0.01 0.04
cat('\nTest "data_H0" along modes 1, 2 and 3:\n')
#> 
#> Test "data_H0" along modes 1, 2 and 3:
apply(testKron_A(ten=data_H0, AA=c(1,2,3), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE)$decision, 2, min)
#> [1] 0.01 0.06

With more modes 1, 2 and 3 to test, the result is less satisfying, which can be improved if more observations are available, as shown below.

TT_large <- 300 #larger time length
data_H0_large <- tensorMiss::tensor_gen(K, TT_large, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$X
cat('Test "data_H0_large" along modes 1, 2 and 3:\n')
#> Test "data_H0_large" along modes 1, 2 and 3:
apply(testKron_A(ten=data_H0_large, AA=c(1,2,3), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE)$decision, 2, min)
#> [1] 0.01000000 0.04983333

For an order-K tensor time series with K ≥ 3, it is not so direct to specify the modes to test. When we have no prior information, ‘testKron_auto’ provides two options: ‘all=TRUE’ tests for all possible set of mode indices, and ‘all=FALSE’ runs an algorithm testing if each mode is in the set 𝒜 along which the data has no Kronecker product structure. Moreover, the number of factors will be estimated based on eigenvalue ratio if ‘r=0’. Similar to ‘testKron_A’, the resulted list contains three parts:
‘decision’
A data frame indicating if each mode is in 𝒜 according to each ‘alpha’ level if ‘all’ is FALSE, otherwise indicates if each test for a set of mode indices is rejected.
‘level’
A data frame reporting an example test size according to each ‘alpha’ level.
‘rank’
A vector reporting the specified/estimated rank for the parameter ‘ten’ used in the test.

For demonstration, we present the ‘decision’ data frames and ‘rank’:

H0_test_all <- testKron_auto(ten=data_H0, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=TRUE)
H0_test_algo <- testKron_auto(ten=data_H0, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=FALSE)
cat(paste0('Testing "data_H0" for all possible set of modes. The estimated rank is: ', H0_test_all$rank[1], ',', H0_test_all$rank[2], ',', H0_test_all$rank[3], '; the decision data frame is:\n'))
#> Testing "data_H0" for all possible set of modes. The estimated rank is: 2,2,2; the decision data frame is:
print.data.frame(H0_test_all$decision)
#>   Set of Modes to be tested 0.01 0.05
#> 1                     {1,2}    0    0
#> 2                     {1,3}    0    0
#> 3                     {2,3}    0    0
#> 4                   {1,2,3}    0    1
cat(paste0('\nTesting "data_H0" with the algorithm to identify each mode. The estimated rank is: ', H0_test_algo$rank[1], ',', H0_test_algo$rank[2], ',', H0_test_algo$rank[3], '; the decision data frame is:\n'))
#> 
#> Testing "data_H0" with the algorithm to identify each mode. The estimated rank is: 2,2,2; the decision data frame is:
print.data.frame(H0_test_algo$decision)
#>   Mode to be identified 0.01 0.05
#> 1                     1    0    1
#> 2                     2    0    1
#> 3                     3    0    1

Again, the test can be improved with more observations:

H0_test_all_large <- testKron_auto(ten=data_H0_large, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=TRUE)
H0_test_algo_large <- testKron_auto(ten=data_H0_large, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=FALSE)
cat(paste0('Testing "data_H0_large" for all possible set of modes. The estimated rank is: ', H0_test_all_large$rank[1], ',', H0_test_all_large$rank[2], ',', H0_test_all_large$rank[3], '; the decision data frame is:\n'))
#> Testing "data_H0_large" for all possible set of modes. The estimated rank is: 2,2,2; the decision data frame is:
print.data.frame(H0_test_all_large$decision)
#>   Set of Modes to be tested 0.01 0.05
#> 1                     {1,2}    0    0
#> 2                     {1,3}    0    0
#> 3                     {2,3}    0    0
#> 4                   {1,2,3}    0    0
cat(paste0('\nTesting "data_H0_large" with the algorithm to identify each mode. The estimated rank is: ', H0_test_algo_large$rank[1], ',', H0_test_algo_large$rank[2], ',', H0_test_algo_large$rank[3], '; the decision data frame is:\n'))
#> 
#> Testing "data_H0_large" with the algorithm to identify each mode. The estimated rank is: 2,2,2; the decision data frame is:
print.data.frame(H0_test_algo_large$decision)
#>   Mode to be identified 0.01 0.05
#> 1                     1    0    0
#> 2                     2    0    0
#> 3                     3    0    0

4. Another example (under the alternative)

Lastly, we demonstrate the performance of the test if the alternative is true. Suppose an observed order-3 tensor time series {𝒴t} has no Kronecker product structure along {2, 3}, i.e., {𝒴t} does not follow a Tucker-decomposition factor model in general while the series reshaped along {2, 3} does. In this case, we generate the reshaped data and only the data reshaped back is observed; notice that the noise is still naturally order-3 tensor-valued which is weakly dependent across time and mode fibers.

noise_H1 <- tensorMiss::tensor_gen(K, TT, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$X - tensorMiss::tensor_gen(K, TT, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$C #noise series
reshape_H1 <- tensorMiss::tensor_gen(K-1, TT, c(d[-c(2,3)],prod(d[c(2,3)])), c(r[-c(2,3)],prod(r[c(2,3)])), c(re[-c(2,3)],prod(re[c(2,3)])), eta[1:(K-1)], coef_f, coef_fe, coef_e, seed=2025) #common component for the reshaped data
data_H1 <- ten_reshape_back(reshape_H1$C , c(2,3), c(TT, d)) + noise_H1


Using ‘testKron_auto’, we present the ‘decision’ data frames and ‘rank’:

H1_test_all <- testKron_auto(ten=data_H1, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=TRUE)
H1_test_algo <- testKron_auto(ten=data_H1, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=FALSE)
cat(paste0('Testing "data_H1" for all possible set of modes. The estimated rank is: ', H1_test_all$rank[1], ',', H1_test_all$rank[2], ',', H1_test_all$rank[3], '; the decision data frame is:\n'))
#> Testing "data_H1" for all possible set of modes. The estimated rank is: 2,4,3; the decision data frame is:
print.data.frame(H1_test_all$decision)
#>   Set of Modes to be tested 0.01 0.05
#> 1                     {1,2}    0    0
#> 2                     {1,3}    0    0
#> 3                     {2,3}    1    1
#> 4                   {1,2,3}    1    1
cat(paste0('\nTesting "data_H1" with the algorithm to identify each mode. The estimated rank is: ', H1_test_algo$rank[1], ',', H1_test_algo$rank[2], ',', H1_test_algo$rank[3], '; the decision data frame is:\n'))
#> 
#> Testing "data_H1" with the algorithm to identify each mode. The estimated rank is: 2,4,3; the decision data frame is:
print.data.frame(H1_test_algo$decision)
#>   Mode to be identified 0.01 0.05
#> 1                     1    1    1
#> 2                     2    1    1
#> 3                     3    1    1

It worths to note the estimated ranks for modes 2 and 3, which are in fact invalid ranks due to the lost of Kronecker product structure along those modes. On the second result above, the algorithm incorrectly identifies mode-1, which is salvaged by more observations:

noise_H1_large <- tensorMiss::tensor_gen(K, TT_large, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$X - tensorMiss::tensor_gen(K, TT_large, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$C #noise series
reshape_H1_large <- tensorMiss::tensor_gen(K-1, TT_large, c(d[-c(2,3)],prod(d[c(2,3)])), c(r[-c(2,3)],prod(r[c(2,3)])), c(re[-c(2,3)],prod(re[c(2,3)])), eta[1:(K-1)], coef_f, coef_fe, coef_e, seed=2025) #common component for the reshaped data
data_H1_large <- ten_reshape_back(reshape_H1_large$C , c(2,3), c(TT_large, d)) + noise_H1_large
H1_test_algo_large <- testKron_auto(ten=data_H1_large, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=FALSE)
cat(paste0('Testing "data_H1_large" with the algorithm to identify each mode. The estimated rank is: ', H1_test_algo_large$rank[1], ',', H1_test_algo_large$rank[2], ',', H1_test_algo_large$rank[3], '; the decision data frame is:\n'))
#> Testing "data_H1_large" with the algorithm to identify each mode. The estimated rank is: 2,1,3; the decision data frame is:
print.data.frame(H1_test_algo_large$decision)
#>   Mode to be identified 0.01 0.05
#> 1                     1    0    0
#> 2                     2    1    1
#> 3                     3    1    1

References

Cen, Zetai, and Clifford Lam. 2024. “Tensor Time Series Imputation Through Tensor Factor Modelling.”
Kolda, Tamara G., and Brett W. Bader. 2009. “Tensor Decompositions and Applications.” SIAM Review 51 (3): 455–500.