Jaekang Shin, Myeonggu Kang, Yunki Han, Junyoung Park, and Lee-Sup Kim
{"title":"AToM:用于视觉转换器高效加速的自适应令牌合并","authors":"Jaekang Shin;Myeonggu Kang;Yunki Han;Junyoung Park;Lee-Sup Kim","doi":"10.1109/TC.2025.3540638","DOIUrl":null,"url":null,"abstract":"Recently, Vision Transformers (ViTs) have set a new standard in computer vision (CV), showing unparalleled image processing performance. However, their substantial computational requirements hinder practical deployment, especially on resource-limited devices common in CV applications. Token merging has emerged as a solution, condensing tokens with similar features to cut computational and memory demands. Yet, existing applications on ViTs often miss the mark in token compression, with rigid merging strategies and a lack of in-depth analysis of ViT merging characteristics. To overcome these issues, this paper introduces Adaptive Token Merging (AToM), a comprehensive algorithm-architecture co-design for accelerating ViTs. The AToM algorithm employs an image-adaptive, fine-grained merging strategy, significantly boosting computational efficiency. We also optimize the merging and unmerging processes to minimize overhead, employing techniques like First-Come-First-Merge mapping and Linear Distance Calculation. On the hardware side, the AToM architecture is tailor-made to exploit the AToM algorithm's benefits, with specialized engines for efficient merge and unmerge operations. Our pipeline architecture ensures end-to-end ViT processing, minimizing latency and memory overhead from the AToM algorithm. Across various hardware platforms including CPU, EdgeGPU, and GPU, AToM achieves average end-to-end speedups of 10.9<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, 7.7<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, and 5.4<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, alongside energy savings of 24.9<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, 1.8<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, and 16.7<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>. Moreover, AToM offers 1.2<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula> 1.9<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula> higher effective throughput compared to existing transformer accelerators.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 5","pages":"1620-1633"},"PeriodicalIF":3.6000,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AToM: Adaptive Token Merging for Efficient Acceleration of Vision Transformer\",\"authors\":\"Jaekang Shin;Myeonggu Kang;Yunki Han;Junyoung Park;Lee-Sup Kim\",\"doi\":\"10.1109/TC.2025.3540638\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, Vision Transformers (ViTs) have set a new standard in computer vision (CV), showing unparalleled image processing performance. However, their substantial computational requirements hinder practical deployment, especially on resource-limited devices common in CV applications. Token merging has emerged as a solution, condensing tokens with similar features to cut computational and memory demands. Yet, existing applications on ViTs often miss the mark in token compression, with rigid merging strategies and a lack of in-depth analysis of ViT merging characteristics. 
To overcome these issues, this paper introduces Adaptive Token Merging (AToM), a comprehensive algorithm-architecture co-design for accelerating ViTs. The AToM algorithm employs an image-adaptive, fine-grained merging strategy, significantly boosting computational efficiency. We also optimize the merging and unmerging processes to minimize overhead, employing techniques like First-Come-First-Merge mapping and Linear Distance Calculation. On the hardware side, the AToM architecture is tailor-made to exploit the AToM algorithm's benefits, with specialized engines for efficient merge and unmerge operations. Our pipeline architecture ensures end-to-end ViT processing, minimizing latency and memory overhead from the AToM algorithm. Across various hardware platforms including CPU, EdgeGPU, and GPU, AToM achieves average end-to-end speedups of 10.9<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, 7.7<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, and 5.4<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, alongside energy savings of 24.9<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, 1.8<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, and 16.7<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>. Moreover, AToM offers 1.2<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula> 1.9<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula> higher effective throughput compared to existing transformer accelerators.\",\"PeriodicalId\":13087,\"journal\":{\"name\":\"IEEE Transactions on Computers\",\"volume\":\"74 5\",\"pages\":\"1620-1633\"},\"PeriodicalIF\":3.6000,\"publicationDate\":\"2025-02-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computers\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10880106/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computers","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10880106/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
AToM: Adaptive Token Merging for Efficient Acceleration of Vision Transformer
Recently, Vision Transformers (ViTs) have set a new standard in computer vision (CV), showing unparalleled image processing performance. However, their substantial computational requirements hinder practical deployment, especially on the resource-limited devices common in CV applications. Token merging has emerged as a solution, condensing tokens with similar features to cut computational and memory demands. Yet, existing applications on ViTs often fall short in token compression, relying on rigid merging strategies and lacking in-depth analysis of ViT merging characteristics. To overcome these issues, this paper introduces Adaptive Token Merging (AToM), a comprehensive algorithm-architecture co-design for accelerating ViTs. The AToM algorithm employs an image-adaptive, fine-grained merging strategy, significantly boosting computational efficiency. We also optimize the merging and unmerging processes to minimize overhead, employing techniques such as First-Come-First-Merge mapping and Linear Distance Calculation. On the hardware side, the AToM architecture is tailor-made to exploit the AToM algorithm's benefits, with specialized engines for efficient merge and unmerge operations. Our pipeline architecture ensures end-to-end ViT processing, minimizing the latency and memory overhead introduced by the AToM algorithm. Across various hardware platforms including CPU, EdgeGPU, and GPU, AToM achieves average end-to-end speedups of 10.9×, 7.7×, and 5.4×, alongside energy savings of 24.9×, 1.8×, and 16.7×. Moreover, AToM offers 1.2×–1.9× higher effective throughput compared to existing transformer accelerators.
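To make the core idea of token merging concrete, below is a minimal NumPy sketch of similarity-based merging and unmerging: the most similar token pairs are averaged together, and a saved index map lets the reduced outputs be broadcast back to the original token positions. The greedy pairing, the function names, and the fixed merge count are illustrative assumptions for this sketch only; they do not reproduce AToM's image-adaptive, fine-grained strategy, its First-Come-First-Merge mapping, or its Linear Distance Calculation.

```python
import numpy as np

def merge_tokens(tokens: np.ndarray, num_merges: int):
    """Greedily average the most similar token pairs (cosine similarity).

    tokens: (N, D) array of token embeddings.
    num_merges: number of merge steps; each step removes one token.
    Returns the reduced tokens and an index map from original token
    positions to merged-group ids, needed later for unmerging.
    """
    groups = [[i] for i in range(len(tokens))]   # original indices in each group
    sums = tokens.astype(np.float32)             # running feature sum per group

    for _ in range(num_merges):
        means = np.stack([sums[i] / len(g) for i, g in enumerate(groups)])
        normed = means / (np.linalg.norm(means, axis=1, keepdims=True) + 1e-8)
        sim = normed @ normed.T
        np.fill_diagonal(sim, -np.inf)           # ignore self-similarity
        a, b = np.unravel_index(np.argmax(sim), sim.shape)
        a, b = min(a, b), max(a, b)
        sums[a] += sums[b]                       # fold group b into group a
        groups[a].extend(groups[b])
        del groups[b]
        sums = np.delete(sums, b, axis=0)

    merged = np.stack([sums[i] / len(g) for i, g in enumerate(groups)])
    index_map = np.empty(len(tokens), dtype=np.int64)
    for gid, members in enumerate(groups):
        index_map[members] = gid
    return merged, index_map

def unmerge_tokens(merged: np.ndarray, index_map: np.ndarray) -> np.ndarray:
    """Broadcast merged features back to the original token positions."""
    return merged[index_map]

# Toy usage: 8 tokens reduced by 3 merges, then expanded back.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16)).astype(np.float32)
merged, index_map = merge_tokens(x, num_merges=3)
restored = unmerge_tokens(merged, index_map)
print(merged.shape, restored.shape)              # (5, 16) (8, 16)
```

The design point this illustrates is why merging pays off: self-attention cost grows roughly quadratically with the token count, so reducing N tokens to N' cuts that cost by about (N/N')², while the saved index map keeps the reduction reversible for tasks that need per-token outputs.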
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.