Computational and Statistical Guarantees for Tensor-on-Tensor Regression With Tensor Train Decomposition


IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2025-07-31. DOI: 10.1109/TPAMI.2025.3593840

Zhen Qin; Zhihui Zhu

Volume 47, Issue 11, pp. 10577-10587. Journal Article. https://ieeexplore.ieee.org/document/11106186/


Abstract

Recently, a tensor-on-tensor (ToT) regression model has been proposed to generalize tensor recovery, encompassing scenarios such as scalar-on-tensor regression and tensor-on-vector regression. However, the exponential growth in tensor complexity poses challenges for storage and computation in ToT regression. To overcome this hurdle, tensor decompositions have been introduced, with the tensor train (TT)-based ToT model proving efficient in practice due to reduced memory requirements, enhanced computational efficiency, and decreased sampling complexity. Despite these practical benefits, a gap remains between theoretical analysis and real-world performance. In this paper, we study the theoretical and algorithmic aspects of the TT-based ToT regression model. Assuming the regression operator satisfies the restricted isometry property (RIP), we conduct an error analysis for the solution to a constrained least-squares optimization problem. This analysis includes an upper error bound and a minimax lower bound, revealing that such error bounds depend polynomially on the order $N+M$. To efficiently find solutions meeting such error bounds, we propose two optimization algorithms: the iterative hard thresholding (IHT) algorithm, which employs gradient descent with the TT-singular value decomposition (TT-SVD), and a factorization approach using the Riemannian gradient descent (RGD) algorithm. When RIP is satisfied, spectral initialization provides a proper starting point, and we establish a linear convergence rate for both IHT and RGD. Notably, IHT optimizes the entire tensor in each iteration while maintaining the TT structure through TT-SVD, which poses a challenge for storage memory in practice. In contrast, RGD optimizes the factors in the so-called left-orthogonal TT format, which enforces orthonormality among most of the factors, over the Stiefel manifold, thereby reducing the storage complexity relative to IHT. However, this reduction in storage memory comes at a cost: the recovery guarantee of RGD is worse than that of IHT, while the error bounds of both algorithms depend polynomially on $N+M$. Experiments support our theoretical findings.
