Hardware Architecture and Implementation of Clustered Tensor Approximation for Multi-Dimensional Visual Data
C. Yang, Yang-Ming Yeh, Yi-Chang Lu
2020 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), 2020-08-01
DOI: 10.1109/VLSI-DAT49148.2020.9196449 (https://doi.org/10.1109/VLSI-DAT49148.2020.9196449)
Tensor approximation has been proven to be an efficient and flexible dimensionality reduction method. However, for applications that require rapid image rendering, the computational cost of reconstructing data after tensor approximation is still high. As a result, several modified tensor approximation algorithms supporting fast reconstruction have been proposed, among which clustered tensor approximation (CTA) is frequently cited. In this paper, we design a hardware accelerator for CTA. The processor can handle a tensor of size $12\mathrm{S}\times 12\mathrm{S}\times 12\mathrm{S}\times 12\mathrm{S}$. Using parallel processing techniques, the processor achieves a $9.41\times$ speed-up over the software implementation.
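To make the reconstruction cost mentioned in the abstract concrete, the following is a minimal NumPy sketch of reconstructing a fourth-order tensor from a generic Tucker-style approximation (a core tensor plus one factor matrix per mode). It is not the CTA algorithm or the accelerator's datapath described in the paper; the function names `mode_product` and `tucker_reconstruct`, the tensor sizes, and the ranks are all assumptions chosen for illustration.

```python
import numpy as np

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (mode-n product)."""
    # tensordot contracts the matrix columns with the chosen tensor axis and
    # places the resulting axis first; moveaxis puts it back at `mode`.
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def tucker_reconstruct(core, factors):
    """Rebuild the full tensor from a core tensor and one factor matrix per mode."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_product(out, factor, mode)
    return out

# Illustrative sizes only (hypothetical, not taken from the paper).
rng = np.random.default_rng(0)
dims, ranks = (8, 8, 8, 8), (3, 3, 3, 3)
core = rng.standard_normal(ranks)
factors = [rng.standard_normal((d, r)) for d, r in zip(dims, ranks)]

full = tucker_reconstruct(core, factors)
print(full.shape)  # (8, 8, 8, 8)
```

In this generic form, every reconstructed element is a sum over all entries of the core tensor, so the per-element cost grows with the product of the ranks. Keeping that product small is one way to make reconstruction cheaper.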