Short-Circuiting Memory Traffic in Handheld Platforms

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 166-177. Pub Date: 2014-12-13. DOI: 10.1109/MICRO.2014.60

Praveen Yedlapalli, N. Nachiappan, N. Soundararajan, A. Sivasubramaniam, M. Kandemir, C. Das


Citations: 18

Abstract

Handheld devices are ubiquitous in today's world. With their advent, we also see a tremendous increase in device-user interactivity and real-time data processing needs. Media (audio/video/camera) and gaming use-cases are gaining substantial user attention and are defining product successes. The combination of increasing demand from these use-cases and having to run them at low power (from a battery) means that architects have to carefully study the applications and optimize the hardware and software stack together to gain significant optimizations. In this work, we study workloads from these domains and identify the memory subsystem (system agent) to be a critical bottleneck to performance scaling. We characterize the lifetime of the "frame-based" data used in these workloads through the system and show that, by communicating at frame granularity, we miss significant performance optimization opportunities, caused by large IP-to-IP data reuse distances. By carefully breaking these frames into sub-frames, while maintaining correctness, we demonstrate substantial gains with limited hardware requirements. Specifically, we evaluate two techniques, flow-buffering and IP-IP short-circuiting, and show that these techniques bring both power-performance benefits and enhanced user experience.
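The reuse-distance argument in the abstract can be illustrated with a toy model. The sketch below is not the paper's simulator or method: the function name `dram_traffic_bytes`, the buffer size, the frame size, and the rule that a chunk either fits entirely in an on-chip buffer or makes a full round trip through DRAM are all assumptions chosen for illustration. It only shows why handing data from one IP to the next in sub-frame chunks can keep the handoff on chip when a whole frame cannot.

```python
def dram_traffic_bytes(frame_bytes: int, chunk_bytes: int, buffer_bytes: int) -> int:
    """DRAM bytes moved to hand one frame from a producer IP to a consumer IP.

    Toy model (an assumption, not the paper's evaluation setup): the producer
    emits the frame in chunks of `chunk_bytes`. A chunk that fits in the
    on-chip buffer is consumed directly; a larger chunk is written to DRAM by
    the producer and read back by the consumer.
    """
    if frame_bytes <= 0 or chunk_bytes <= 0 or buffer_bytes < 0:
        raise ValueError("sizes must be positive (buffer may be zero)")

    traffic = 0
    remaining = frame_bytes
    while remaining > 0:
        chunk = min(chunk_bytes, remaining)
        if chunk > buffer_bytes:
            traffic += 2 * chunk  # one DRAM write plus one DRAM read
        remaining -= chunk
    return traffic


# Hypothetical numbers: a 1920x1080 frame at 4 bytes per pixel, 64 KiB buffer.
FRAME = 1920 * 1080 * 4
BUFFER = 64 * 1024

frame_granularity = dram_traffic_bytes(FRAME, FRAME, BUFFER)
subframe_granularity = dram_traffic_bytes(FRAME, 32 * 1024, BUFFER)
print(frame_granularity, subframe_granularity)
```

In this model the reuse distance between producer and consumer is roughly one chunk, so shrinking the chunk below the buffer capacity removes the DRAM round trip. Real platforms also have to preserve ordering and correctness across sub-frames, which the sketch does not model.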
