{"title":"Hardware Architecture and Implementation of Clustered Tensor Approximation for Multi-Dimensional Visual Data","authors":"C. Yang, Yang-Ming Yeh, Yi-Chang Lu","doi":"10.1109/VLSI-DAT49148.2020.9196449","DOIUrl":null,"url":null,"abstract":"Tensor approximation has been proven to be an efficient and flexible dimensionality reduction method. However, for applications which require rapid image rendering, the computational cost of data reconstruction after applying tensor approximation is still high. As a result, several modified tensor approximation algorithms supporting fast reconstruction have been proposed, where clustered tensor approximation (CTA) is one of those which are often mentioned. In this paper, we design a hardware accelerator for CTA. The processor can handle a tensor of size $12\\mathrm{S}\\times 12\\mathrm{S}\\times 12\\mathrm{S}\\times 12\\mathrm{S}$. With parallel processing techniques, the performance of the processor can achieve a $ 9.41\\times $ speed-up when compared to the software.","PeriodicalId":235460,"journal":{"name":"2020 International Symposium on VLSI Design, Automation and Test (VLSI-DAT)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Symposium on VLSI Design, Automation and Test (VLSI-DAT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/VLSI-DAT49148.2020.9196449","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Tensor approximation has proven to be an efficient and flexible dimensionality reduction method. However, for applications that require rapid image rendering, the computational cost of reconstructing data after tensor approximation is still high. As a result, several modified tensor approximation algorithms that support fast reconstruction have been proposed, among which clustered tensor approximation (CTA) is frequently cited. In this paper, we design a hardware accelerator for CTA. The processor can handle a tensor of size $128 \times 128 \times 128 \times 128$. With parallel processing techniques, the processor achieves a $9.41\times$ speed-up over the software implementation.
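To make the reconstruction cost mentioned in the abstract concrete, the following is a minimal sketch of element-wise reconstruction from a generic Tucker-style decomposition (a core tensor plus one factor matrix per mode). It is not the authors' CTA algorithm or their hardware datapath. The function names, the dictionary-based core layout, and the example ranks are all assumptions made for illustration.

```python
from itertools import product


def reconstruct_element(core, factors, index):
    """Reconstruct one tensor entry from a Tucker-style decomposition.

    core    : dict mapping a rank-index tuple (r1, ..., rN) to a core value
              (hypothetical layout chosen for readability)
    factors : list of N factor matrices; factors[n][i][r] is row i, column r
    index   : tuple (i1, ..., iN) of the entry to reconstruct
    """
    ranks = [len(U[0]) for U in factors]
    total = 0.0
    # Sum over every core entry: one product term per rank combination.
    for r in product(*(range(R) for R in ranks)):
        term = core[r]
        for U, i, rn in zip(factors, index, r):
            term *= U[i][rn]
        total += term
    return total


def core_terms(ranks):
    """Number of core entries summed per reconstructed element."""
    count = 1
    for R in ranks:
        count *= R
    return count


# Rank-1 toy example for a 4th-order tensor with 2 entries per mode.
factors = [[[1.0], [2.0]] for _ in range(4)]
core = {(0, 0, 0, 0): 3.0}
value = reconstruct_element(core, factors, (1, 1, 0, 1))

# Illustrative ranks only: halving each of four ranks cuts the
# per-element work by a factor of 16.
full_cost = core_terms([8, 8, 8, 8])
reduced_cost = core_terms([4, 4, 4, 4])
```

The per-element work grows with the product of the ranks, which is why reconstruction stays expensive for rendering workloads. Approaches that aim at fast reconstruction generally try to shrink the number of core terms that must be summed per element; the rank values above are hypothetical and are not taken from the paper.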