Jaekang Shin, Myeonggu Kang, Yunki Han, Junyoung Park, and Lee-Sup Kim
{"title":"AToM:用于视觉转换器高效加速的自适应令牌合并","authors":"Jaekang Shin;Myeonggu Kang;Yunki Han;Junyoung Park;Lee-Sup Kim","doi":"10.1109/TC.2025.3540638","DOIUrl":null,"url":null,"abstract":"Recently, Vision Transformers (ViTs) have set a new standard in computer vision (CV), showing unparalleled image processing performance. However, their substantial computational requirements hinder practical deployment, especially on resource-limited devices common in CV applications. Token merging has emerged as a solution, condensing tokens with similar features to cut computational and memory demands. Yet, existing applications on ViTs often miss the mark in token compression, with rigid merging strategies and a lack of in-depth analysis of ViT merging characteristics. To overcome these issues, this paper introduces Adaptive Token Merging (AToM), a comprehensive algorithm-architecture co-design for accelerating ViTs. The AToM algorithm employs an image-adaptive, fine-grained merging strategy, significantly boosting computational efficiency. We also optimize the merging and unmerging processes to minimize overhead, employing techniques like First-Come-First-Merge mapping and Linear Distance Calculation. On the hardware side, the AToM architecture is tailor-made to exploit the AToM algorithm's benefits, with specialized engines for efficient merge and unmerge operations. Our pipeline architecture ensures end-to-end ViT processing, minimizing latency and memory overhead from the AToM algorithm. Across various hardware platforms including CPU, EdgeGPU, and GPU, AToM achieves average end-to-end speedups of 10.9<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, 7.7<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, and 5.4<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, alongside energy savings of 24.9<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, 1.8<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, and 16.7<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>. Moreover, AToM offers 1.2<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula> 1.9<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula> higher effective throughput compared to existing transformer accelerators.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 5","pages":"1620-1633"},"PeriodicalIF":3.6000,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AToM: Adaptive Token Merging for Efficient Acceleration of Vision Transformer\",\"authors\":\"Jaekang Shin;Myeonggu Kang;Yunki Han;Junyoung Park;Lee-Sup Kim\",\"doi\":\"10.1109/TC.2025.3540638\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, Vision Transformers (ViTs) have set a new standard in computer vision (CV), showing unparalleled image processing performance. However, their substantial computational requirements hinder practical deployment, especially on resource-limited devices common in CV applications. Token merging has emerged as a solution, condensing tokens with similar features to cut computational and memory demands. Yet, existing applications on ViTs often miss the mark in token compression, with rigid merging strategies and a lack of in-depth analysis of ViT merging characteristics. 
To overcome these issues, this paper introduces Adaptive Token Merging (AToM), a comprehensive algorithm-architecture co-design for accelerating ViTs. The AToM algorithm employs an image-adaptive, fine-grained merging strategy, significantly boosting computational efficiency. We also optimize the merging and unmerging processes to minimize overhead, employing techniques like First-Come-First-Merge mapping and Linear Distance Calculation. On the hardware side, the AToM architecture is tailor-made to exploit the AToM algorithm's benefits, with specialized engines for efficient merge and unmerge operations. Our pipeline architecture ensures end-to-end ViT processing, minimizing latency and memory overhead from the AToM algorithm. Across various hardware platforms including CPU, EdgeGPU, and GPU, AToM achieves average end-to-end speedups of 10.9<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, 7.7<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, and 5.4<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, alongside energy savings of 24.9<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, 1.8<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, and 16.7<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>. Moreover, AToM offers 1.2<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula> 1.9<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula> higher effective throughput compared to existing transformer accelerators.\",\"PeriodicalId\":13087,\"journal\":{\"name\":\"IEEE Transactions on Computers\",\"volume\":\"74 5\",\"pages\":\"1620-1633\"},\"PeriodicalIF\":3.6000,\"publicationDate\":\"2025-02-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computers\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10880106/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computers","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10880106/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
AToM: Adaptive Token Merging for Efficient Acceleration of Vision Transformer
Recently, Vision Transformers (ViTs) have set a new standard in computer vision (CV), showing unparalleled image processing performance. However, their substantial computational requirements hinder practical deployment, especially on the resource-limited devices common in CV applications. Token merging has emerged as a solution, condensing tokens with similar features to cut computational and memory demands. Yet, existing applications on ViTs often fall short in token compression, relying on rigid merging strategies and lacking in-depth analysis of ViT merging characteristics. To overcome these issues, this paper introduces Adaptive Token Merging (AToM), a comprehensive algorithm-architecture co-design for accelerating ViTs. The AToM algorithm employs an image-adaptive, fine-grained merging strategy, significantly boosting computational efficiency. We also optimize the merging and unmerging processes to minimize overhead, employing techniques such as First-Come-First-Merge mapping and Linear Distance Calculation. On the hardware side, the AToM architecture is tailor-made to exploit the AToM algorithm's benefits, with specialized engines for efficient merge and unmerge operations. Our pipeline architecture ensures end-to-end ViT processing, minimizing the latency and memory overhead introduced by the AToM algorithm. Across various hardware platforms including CPU, EdgeGPU, and GPU, AToM achieves average end-to-end speedups of 10.9×, 7.7×, and 5.4×, alongside energy savings of 24.9×, 1.8×, and 16.7×. Moreover, AToM offers 1.2×–1.9× higher effective throughput compared to existing transformer accelerators.
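To make the core idea of token merging concrete, below is a minimal NumPy sketch of similarity-based merging and unmerging: the most similar token pairs are averaged together, and a saved index map lets the reduced outputs be broadcast back to the original token positions. The greedy pairing, the function names, and the fixed merge count are illustrative assumptions for this sketch only; they do not reproduce AToM's image-adaptive, fine-grained strategy, its First-Come-First-Merge mapping, or its Linear Distance Calculation.

```python
import numpy as np

def merge_tokens(tokens: np.ndarray, num_merges: int):
    """Greedily average the most similar token pairs (cosine similarity).

    tokens: (N, D) array of token embeddings.
    num_merges: number of merge steps; each step removes one token.
    Returns the reduced tokens and an index map from original token
    positions to merged-group ids, needed later for unmerging.
    """
    groups = [[i] for i in range(len(tokens))]   # original indices in each group
    sums = tokens.astype(np.float32)             # running feature sum per group

    for _ in range(num_merges):
        means = np.stack([sums[i] / len(g) for i, g in enumerate(groups)])
        normed = means / (np.linalg.norm(means, axis=1, keepdims=True) + 1e-8)
        sim = normed @ normed.T
        np.fill_diagonal(sim, -np.inf)           # ignore self-similarity
        a, b = np.unravel_index(np.argmax(sim), sim.shape)
        a, b = min(a, b), max(a, b)
        sums[a] += sums[b]                       # fold group b into group a
        groups[a].extend(groups[b])
        del groups[b]
        sums = np.delete(sums, b, axis=0)

    merged = np.stack([sums[i] / len(g) for i, g in enumerate(groups)])
    index_map = np.empty(len(tokens), dtype=np.int64)
    for gid, members in enumerate(groups):
        index_map[members] = gid
    return merged, index_map

def unmerge_tokens(merged: np.ndarray, index_map: np.ndarray) -> np.ndarray:
    """Broadcast merged features back to the original token positions."""
    return merged[index_map]

# Toy usage: 8 tokens reduced by 3 merges, then expanded back.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16)).astype(np.float32)
merged, index_map = merge_tokens(x, num_merges=3)
restored = unmerge_tokens(merged, index_map)
print(merged.shape, restored.shape)              # (5, 16) (8, 16)
```

The design point this illustrates is why merging pays off: self-attention cost grows roughly quadratically with the token count, so reducing N tokens to N' cuts that cost by about (N/N')², while the saved index map keeps the reduction reversible for tasks that need per-token outputs.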
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.