Proceedings of the 3rd International Workshop on Many-core Embedded Systems: Latest Articles

Improved Route Selection Approaches using Q-learning framework for 2D NoCs

Proceedings of the 3rd International Workshop on Many-core Embedded Systems Pub Date : 2015-06-13 DOI: 10.1145/2768177.2768180

Niyati Gupta, Manoj Kumar, Ashish Sharma, M. Gaur, V. Laxmi, M. Daneshtalab, M. Ebrahimi

Citations: 10

Parallel Programming Model for the Epiphany Many-Core Coprocessor Using Threaded MPI

Proceedings of the 3rd International Workshop on Many-core Embedded Systems Pub Date : 2015-06-13 DOI: 10.1145/2768177.2768183

J. Ross, D. Richie, S. Park, D. Shires

Citations: 25

FOLCS: A Lightweight Implementation of a Cycle-accurate NoC Simulator on FPGAs

Proceedings of the 3rd International Workshop on Many-core Embedded Systems Pub Date : 2015-06-13 DOI: 10.1145/2768177.2768182

Takahiro Naruko, K. Hiraki

Citations: 7

On the Feasibility of Advanced Cache Indexing for High-Performance and Energy-Efficient GPGPU Computing

Proceedings of the 3rd International Workshop on Many-core Embedded Systems Pub Date : 2015-06-13 DOI: 10.1145/2768177.2768179

Kyu Yeun Kim, Seunghoe Kim, Woongki Baek

Abstract: To achieve higher performance and energy efficiency, GPGPU architectures have recently begun to employ hardware caches. Adding hardware caches to GPGPUs, however, does not automatically guarantee improved performance and energy efficiency due to the thrashing in small hardware caches shared by thousands of threads. While prior work has proposed warp scheduling and cache bypassing techniques to address this issue, relatively little work has been done in the context of advanced cache indexing. To bridge this gap, this work investigates the feasibility of advanced cache indexing for high-performance and energy-efficient GPGPU computing. We first discuss the design and implementation of static and adaptive cache indexing schemes for GPGPUs. We then quantify the effectiveness of the advanced indexing schemes using GPGPU benchmarks. Our quantitative evaluation demonstrates that the advanced cache indexing schemes are promising in that they significantly outperform the conventional cache indexing scheme. In addition, for a subset of cache-sensitive benchmarks, the adaptive indexing scheme substantially outperforms the static indexing scheme by effectively identifying and utilizing high-quality indexing bits based on runtime information. Finally, our evaluation shows that the effectiveness of advanced cache indexing is sensitive to different warp schedulers, motivating further research on coordinated cache indexing and warp scheduling techniques.

Citations: 1

Proceedings of the 3rd International Workshop on Many-core Embedded Systems

Proceedings of the 3rd International Workshop on Many-core Embedded Systems Pub Date : 2015-06-13 DOI: 10.1145/2768177

M. Ebrahimi, D. Goehringer

Citations: 0

Hardware Scheduler Performance on the Plural Many-Core Architecture

Proceedings of the 3rd International Workshop on Many-core Embedded Systems Pub Date : 2015-06-13 DOI: 10.1145/2768177.2768184

Itai Avron, R. Ginosar

Citations: 2

Investigating the Viability of Maximum Flexibility Selection Function in Bufferless 2D Meshes

Proceedings of the 3rd International Workshop on Many-core Embedded Systems Pub Date : 2015-06-13 DOI: 10.1145/2768177.2768185

M. A. A. ElMohsen, H. M. El-Boghdadi

Abstract: Bufferless NoCs have emerged as a solution to reduce power and area by eliminating buffers used for routing. Such networks handle contention using packet dropping or deflection. In this paper, we study the effect of the MaxFlex selection function on 2D bufferless meshes for both a fixed and a variable step size. For a fixed step size, we perform an analytical study of the effect of using MaxFlex with different step sizes on the performance of 2D bufferless meshes. The analysis indicates that, as the step size increases, the traffic in the central part of the network bisection relaxes. Simulation results show that both average packet latency and average deflection count decrease as the step size increases. Additionally, over different mesh sizes, the results show that the network performs best if the step size equals 60-80% of the mesh dimension. Then, we consider using a variable step size in which a packet is routed using a step size dependent on the Manhattan distance, d, between the source and destination. Simulation results show that, using MaxFlex, a step size of 60% of the distance d improves packet latency over a fixed step size, the straight line selection function, and the random productive port selection function by around 29%, 97%, and 99%, respectively.

Citations: 0

A Design Methodology for Performance Maintenance of 3D Network-on-Chip with Multiplexed Through-Silicon Vias

Proceedings of the 3rd International Workshop on Many-core Embedded Systems Pub Date : 2015-06-13 DOI: 10.1145/2768177.2768178

Mostafa Said, Farhad Mehdipour, K. Murakami, M. El-Sayed

Abstract: 3D integration is an emerging technology that overcomes 2D integration process limitations. The use of short Through-Silicon Vias (TSVs) introduces a significant reduction in routing area, power consumption, and delay. However, several challenges in 3D integration technology still need to be addressed. The literature shows that reducing the TSV count has a considerable effect in improving yield. The TSV multiplexing technique called TSVBOX was introduced in [1] to reduce the TSV count without affecting the direct benefits of TSVs. The TSVBOX introduces some delay to the signals to be multiplexed. In this paper, we analyse the TSVBOX timing requirements and deduce a design methodology for TSVBOX-based 3D Network-on-Chip (NoC) to overcome the TSVBOX speed degradation. Performance comparisons under different traffic patterns are conducted to verify our solution. We show that TSVBOX-based 3D NoC performance is highly dependent on the NoC traffic pattern and that, in most simulation scenarios we tried, it shows almost the same performance as the conventional 3D NoC.

Citations: 3

SoPHy: A Software Platform for Hybrid Resource Management of Homogeneous Many-core Accelerators

Proceedings of the 3rd International Workshop on Many-core Embedded Systems Pub Date : 2015-06-13 DOI: 10.1145/2768177.2768181

Taeyoung Kim, Jintaek Kang, Sungchan Kim, S. Ha

Citations: 0