{"title":"CoaT: Compiler-Assisted Two-Stage Offloading Approach for Data-Intensive Applications Under NMP Framework","authors":"Satanu Maity;Mayank Goel;Manojit Ghose","doi":"10.1109/TETC.2024.3495218","DOIUrl":null,"url":null,"abstract":"As we head toward a data-centric era, conventional computing systems become inadequate to meet the evolving demands of the applications. As a result, the near-memory processing (NMP) computing paradigm emerges as a potential alternative framework where regions of an application are offloaded for execution near the memory. Although some interesting research works have been proposed in recent times, none of them have considered placing processing cores jointly on the primary memories and cache memory. Further, they did not consider the data locality offered by the last level cache (LLC) and the estimated execution time of an application region together while designing the offloading strategy. This paper presents a novel hybrid NMP computation framework comprising a traditional multicore processor, NMP-enabled 3D memories and NMP-enabled LLC. The application source code is processed through a compilation framework to identify potential offloadable regions. The paper further proposes a two-stage offloading strategy, <italic>CoaT</i>, which determines the execution location of the application regions based on the region’s overall execution time and the data locality offered by the LLC. A comprehensive series of experiments conducted using well-established simulators for large data-intensive applications, provides strong evidence of the efficacy of our approach. The results demonstrate significant reductions in execution time (averaging 60% with a maximum reduction of 64%), un-core energy consumption (averaging 34% with a maximum reduction of 44%), and off-chip data block transfer count (averaging 61% with a maximum reduction of 80%) compared to the state-of-the-art policies. 
The proposed policy achieves a speedup of 2.6x (on average) and 3.1x (maximum) w.r.t. the conventional execution.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 3","pages":"753-767"},"PeriodicalIF":5.4000,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Emerging Topics in Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10755004/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
As we head toward a data-centric era, conventional computing systems are becoming inadequate for the evolving demands of applications. As a result, the near-memory processing (NMP) computing paradigm has emerged as a potential alternative framework in which regions of an application are offloaded for execution near the memory. Although some interesting research works have been proposed in recent times, none of them has considered placing processing cores jointly on the primary memories and the cache memory. Further, they did not consider the data locality offered by the last-level cache (LLC) together with the estimated execution time of an application region while designing the offloading strategy. This paper presents a novel hybrid NMP computation framework comprising a traditional multicore processor, NMP-enabled 3D memories, and an NMP-enabled LLC. The application source code is processed through a compilation framework to identify potential offloadable regions. The paper further proposes a two-stage offloading strategy, CoaT, which determines the execution location of each application region based on the region's overall execution time and the data locality offered by the LLC. A comprehensive series of experiments conducted using well-established simulators on large data-intensive applications provides strong evidence of the efficacy of our approach. The results demonstrate significant reductions in execution time (averaging 60%, with a maximum reduction of 64%), uncore energy consumption (averaging 34%, with a maximum reduction of 44%), and off-chip data block transfer count (averaging 61%, with a maximum reduction of 80%) compared to state-of-the-art policies. The proposed policy achieves a speedup of 2.6x on average (3.1x maximum) over conventional execution.
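The two-stage decision the abstract describes — first compare a region's estimated execution time on the host versus near memory, then use LLC data locality to choose between the NMP-enabled LLC and the NMP-enabled 3D memories — can be sketched as follows. This is a minimal illustrative sketch only: the region attributes, the threshold, and the decision structure are assumptions for exposition, not the paper's actual CoaT algorithm.

```python
# Hypothetical sketch of a two-stage offloading decision in the spirit of CoaT.
# All names, thresholds, and the decision structure are illustrative
# assumptions, not the paper's actual algorithm or cost model.
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    est_host_time: float  # estimated execution time on the host cores (ms)
    est_nmp_time: float   # estimated execution time near memory (ms)
    llc_hit_rate: float   # fraction of the region's accesses served by the LLC


def place_region(region: Region, llc_locality_threshold: float = 0.5) -> str:
    """Return where to execute the region: 'host', 'nmp-llc', or 'nmp-3d'."""
    # Stage 1: compare estimated execution times; keep the region on the
    # host cores if offloading is not predicted to pay off.
    if region.est_nmp_time >= region.est_host_time:
        return "host"
    # Stage 2: use LLC data locality to pick between the NMP-enabled LLC
    # and the NMP-enabled 3D memory stacks.
    if region.llc_hit_rate >= llc_locality_threshold:
        return "nmp-llc"
    return "nmp-3d"


if __name__ == "__main__":
    regions = [
        Region("stencil", est_host_time=12.0, est_nmp_time=5.0, llc_hit_rate=0.8),
        Region("scatter", est_host_time=9.0, est_nmp_time=4.0, llc_hit_rate=0.1),
        Region("control", est_host_time=2.0, est_nmp_time=3.0, llc_hit_rate=0.9),
    ]
    for r in regions:
        print(f"{r.name} -> {place_region(r)}")
```

A region that profits from NMP and exhibits high LLC locality lands on the NMP-enabled LLC, one with poor locality goes to the 3D memory stacks, and a region that would slow down near memory stays on the host.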
About the Journal
IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for green, synthetic and organic computing structures and systems, advanced analytics, social/occupational computing, location-based/client computer systems, morphic computer design, electronic game systems, and health-care IT.