Measuring memory access latency for software objects in a NUMA system-on-chip architecture

Daniela Genius
2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC), published 2013-07-10
DOI: 10.1109/ReCoSoC.2013.6581525 (https://doi.org/10.1109/ReCoSoC.2013.6581525)
Citations: 7

Abstract

We consider streaming applications modeled as a set of tasks communicating via channels. These channels are mapped to the on-chip memory of a multi-processor system-on-chip (MPSoC) with non-uniform memory access (NUMA). In complex applications such as advanced packet processing and video streaming, often only part of the data transits through the channels. Tasks also communicate via shared memory, and synchronization mechanisms like locks and barriers might be required. The effects of I/O on interconnect traffic also have to be taken into account, all of which increases traffic to and from memory. Our clustered MPSoC architecture is modeled with SoCLib. SoCLib's design space exploration tool proposes, among others, communication channels and shared memory for inter-task communication. Each consists of one or several software objects which are mapped to on-chip memory. The difficulty when measuring latency is to find out which (co-)processor issued a request for a particular software object. We intervene early in the design process by monitoring the transfers on the interconnection network caused by accesses to these software objects. We identify the software objects by name and trace the corresponding memory accesses. In spite of the cycle-accurate, bit-accurate level of simulation, our method has little overhead and avoids distorting the performance results.
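The idea of identifying software objects by name and attributing each traced memory access to the (co-)processor that issued it can be illustrated with a small sketch. This is not the paper's implementation and does not use the SoCLib API. The names `SoftwareObject`, `Transfer`, `ObjectMap`, and `latency_by_object`, the trace record fields, and all addresses and cycle counts are hypothetical, and the sketch assumes that objects occupy non-overlapping address ranges.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareObject:
    """A named software object mapped to on-chip memory (hypothetical record)."""
    name: str  # e.g. a channel buffer or a lock
    base: int  # first byte address of the object
    size: int  # size in bytes


@dataclass(frozen=True)
class Transfer:
    """One traced interconnect transfer (hypothetical trace format)."""
    initiator: str   # (co-)processor that issued the request
    address: int     # target memory address
    t_request: int   # cycle at which the request was issued
    t_response: int  # cycle at which the response arrived


class ObjectMap:
    """Resolves an address to the name of the software object containing it."""

    def __init__(self, objects):
        # Assumes address ranges do not overlap.
        self._objects = sorted(objects, key=lambda o: o.base)
        self._bases = [o.base for o in self._objects]

    def lookup(self, address):
        # Find the last object whose base is <= address, then check its bounds.
        i = bisect_right(self._bases, address) - 1
        if i >= 0:
            obj = self._objects[i]
            if address < obj.base + obj.size:
                return obj.name
        return None


def latency_by_object(transfers, object_map):
    """Mean latency in cycles per (initiator, object name) pair."""
    samples = defaultdict(list)
    for t in transfers:
        name = object_map.lookup(t.address)
        if name is not None:  # ignore accesses outside any known object
            samples[(t.initiator, name)].append(t.t_response - t.t_request)
    return {key: sum(v) / len(v) for key, v in samples.items()}


if __name__ == "__main__":
    # Made-up objects and trace entries, for illustration only.
    omap = ObjectMap([
        SoftwareObject("channel0.buffer", 0x1000, 0x100),
        SoftwareObject("lock0", 0x2000, 4),
    ])
    trace = [
        Transfer("cpu0", 0x1010, 100, 112),
        Transfer("cpu0", 0x10F0, 200, 220),
        Transfer("cpu1", 0x2000, 150, 190),
        Transfer("cpu1", 0x3000, 10, 20),  # not inside any named object
    ]
    for key, mean in sorted(latency_by_object(trace, omap).items()):
        print(key, mean)
```

Keying the statistics on the pair (initiator, object name) rather than on raw addresses reflects the difficulty the abstract names: the measurement is only useful once each access is tied both to a software object and to the processor that requested it.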