Authors: Zhi-Yi Lin, Jia-Lin Chen, Liang-Gee Chen
Published in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1115-1119, 2018-04-22
DOI: https://doi.org/10.1109/ICASSP.2018.8461988
A 203 FPS VLSI Architecture of Improved Dense Trajectories for Real-Time Human Action Recognition
This paper introduces an architecture with high throughput, low on-chip memory, and efficient data access for Improved Dense Trajectories (iDT) as video representations for real-time action recognition. The iDT feature can capture long-term motion cues better than any existing deep feature, which makes it crucial in state-of-the-art action recognition systems. Our architecture design has three major features: low-bandwidth frame-wise feature extraction, a low on-chip memory architecture for point tracking, and a two-stage trajectory pruning architecture for low bandwidth. Using TSMC 40 nm technology, our chip area is 3.1 mm², and the size of on-chip memory is 40.8 kB. The chip supports video at a resolution of 320×240 with a throughput of 203 fps at 215 MHz, which is an 81.2× speedup compared with a CPU. At the same operating frequency, it can also provide feature extraction for six windows of size 320×240 in higher-resolution videos with a throughput of 34 fps.
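
The throughput figures in the abstract can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation, not part of the paper: the function name `throughput_budget` and its result keys are hypothetical, and it assumes the six-window mode simply time-shares the same pipeline, which the abstract does not state explicitly.

```python
def throughput_budget(clock_hz, fps, width, height, speedup, windows):
    """Derive average budgets from the headline numbers in the abstract."""
    cycles_per_frame = clock_hz / fps            # clock cycles available per frame
    cycles_per_pixel = cycles_per_frame / (width * height)  # average, not a pipeline claim
    cpu_fps = fps / speedup                      # CPU baseline implied by the speedup
    # Assumption: processing N windows per frame divides the frame rate by N.
    multi_window_fps = fps / windows
    return {
        "cycles_per_frame": cycles_per_frame,
        "cycles_per_pixel": cycles_per_pixel,
        "cpu_fps": cpu_fps,
        "multi_window_fps": multi_window_fps,
    }


# Figures quoted in the abstract: 215 MHz, 203 fps, 320x240, 81.2x, six windows.
budget = throughput_budget(
    clock_hz=215e6, fps=203, width=320, height=240, speedup=81.2, windows=6
)

for name, value in budget.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, the implied CPU baseline is about 2.5 fps, the average budget is roughly 14 clock cycles per pixel, and dividing 203 fps across six windows gives about 33.8 fps, which is consistent with the reported 34 fps.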