Histoire & mesure: Latest Articles

Hardware Threading Techniques for Multi-Threaded MPSoCs
Histoire & mesure Pub Date : 2014-06-15 DOI: 10.1145/2613908.2613917
D. Watson, A. Ahmadinia, G. Morison, T. Buggy
Abstract: Adapting software applications to embedded Multiprocessor Systems-on-Chip (MPSoCs) typically follows multithreaded design flows. To take advantage of the hardware customisations possible with MPSoCs, HardWare Threads (HWTs) can be used to increase application concurrency and throughput by forking between software and hardware execution. This paper describes how an application can be tailored to use HWTs. Using an application's Task Flow Graph and Kahn Process Networks to model software interactions with HWTs, two scheduling techniques for HWT interaction with software are presented and analysed. The scheduling techniques are evaluated based on system performance and resource consumption with a popular image processing algorithm, where performance increases of up to 3.6x were measured compared to standard implementations.
Pages: 56-59
Citations: 1
Exploring Spiking Neural Network on Coarse-Grain Reconfigurable Architectures
Histoire & mesure Pub Date : 2014-06-15 DOI: 10.1145/2613908.2613916
Hassan Anwar, Syed M. A. H. Jafri, Sergei Dytckov, M. Daneshtalab, M. Ebrahimi, A. Hemani, J. Plosila, G. Beltrame, H. Tenhunen
Abstract: Today, reconfigurable architectures are becoming increasingly popular as candidate platforms for neural networks. Existing work that maps neural networks onto reconfigurable architectures addresses only FPGAs or networks-on-chip, without any reference to Coarse-Grain Reconfigurable Architectures (CGRAs). In this paper we investigate the overheads imposed by implementing spiking neural networks on a CGRA. Experimental results (using point-to-point connectivity) reveal that up to 1000 neurons can be connected, with an average response time of 4.4 ms.
Pages: 64-67
Citations: 4
Extending dataflow programs with throughput properties
Histoire & mesure Pub Date : 2013-06-24 DOI: 10.1145/2489068.2489077
Manuel Selva, L. Morel, K. Marquet, S. Frénot
Abstract: In the context of multi-core processors and the trend toward many-core, dataflow programming can be used as a solution to the parallelization problem. By decoupling computation from communication, this paradigm naturally exposes parallelism in several ways. In this work we propose language extensions for expressing throughput properties over dataflow programs, together with a run-time mechanism for observing the events needed to compute the effective throughput. We show that such mechanisms have a limited impact on the application's overall performance. We also review existing run-time adaptation mechanisms that may be used in a dataflow context to satisfy throughput requirements.
Pages: 54-57
Citations: 3
Directory based cache coherence verification logic in CMPs cache system
Histoire & mesure Pub Date : 2013-06-24 DOI: 10.1145/2489068.2489073
M. Dalui, K. Gupta, B. Sikdar
Abstract: This work reports a high-speed protocol verification logic for Chip Multiprocessors (CMPs) realizing a directory-based cache coherence system. A special class of cellular automata (CA), referred to as single length cycle 2-attractor CA (TACA), is introduced to identify inconsistencies in the cache line states of the processors' private caches. The introduction of CA segmentation logic improves the efficiency of the design by reducing the number of computation steps of the verification logic by a factor equal to the number of segments. Cache coherence verification for a system with a limited directory is also addressed. The TACA keeps track of the coherence status of the CMP's cache system and memorizes any inconsistent recording made during the processors' references. Theory has been developed to enable a quick decision on cache coherency.
Pages: 33-40
Citations: 1
Performance analysis of multi-threaded multi-core CPUs
Histoire & mesure Pub Date : 2013-06-24 DOI: 10.1145/2489068.2489076
Vijayalakshmi Saravanan, Kaushik S, S. Krishna, P. Iit, Guwahati India, D. Kothari
Abstract: Processors are constantly changing and becoming more advanced. They incorporate new concepts and ideas into the architecture with each evolution. One such concept is multi-threading. It aims at increasing the processor's performance by reducing its idle time. It is the ability of the processor to execute multiple threads simultaneously on the different cores it contains. Multi-threading concepts have also been incorporated in embedded systems, which employ either a single-core or a multi-core architecture. The aim of this study is to evaluate how effectively multi-threading improves processor utilization on multiple cores by taking both single- and dual-core processors and evaluating the performance of each by comparing the number of instructions executed per second. The results of this study give an edge to multi-threading on a single-core processor over a dual-core processor when performance aspects are considered. Our analysis helps us to design the processor architecture in such a way that we utilize both multi-threading and multi-core architecture to achieve maximum performance. The Simultaneous Multi-threading (SMT) performance improvements are encouraging when compared with dual-core processors.
Pages: 49-53
Citations: 1
Co-tuning of a hybrid electronic-optical network for reducing energy consumption in embedded CMPs
Histoire & mesure Pub Date : 2013-06-24 DOI: 10.1145/2489068.2489070
S. Bartolini, P. Grani
Abstract: Nanophotonics is a promising solution for on-chip interconnection due to its intrinsic low-latency and especially low-power features, which are particularly desirable in future chip multiprocessors (CMPs) for rich client devices. In this paper we address the co-design of the parameters of a hybrid on-chip network featuring a traditional 2D mesh and a simple photonic helper ring, aimed at improving performance and reducing energy consumption. As the considered simple optical interconnection cannot sustain all the CMP traffic without saturating the available bandwidth, and thus inducing performance and energy degradations, we identify the subset of coherency messages that are most worth accelerating through the low-energy optical path. We investigate the management/arbitration strategies for the physically shared photonic path, as they are crucial for reaching an effective exploitation of optical bandwidth according to their overhead and the parallelism achieved in message transmission. Our results on multithreaded benchmarks highlight that a careful selection of the most latency-critical messages to be routed on the photonic path, along with a Multiple-Writers-Single-Reader access scheme, allows execution time and energy improvements of up to 19% and 5%, respectively, for the 8-core setup and up to 16% and 13% for the 16-core configuration. Furthermore, we show that the most aggressive ring access schemes allow the adoption of a four times slower electronic NoC that trades the achieved average speedup margin to obtain 70% overall energy savings, which is extremely important in energy-constrained devices.
Pages: 9-16
Citations: 4
Proposing a new task model towards many-core architecture
Histoire & mesure Pub Date : 2013-06-24 DOI: 10.1145/2489068.2489075
A. Shimada, Balazs Gerofi, A. Hori, Y. Ishikawa
Abstract: Many-core processors are gathering attention in the area of embedded systems due to their power-performance ratios. To utilize the cores of a many-core processor in parallel, programmers build multi-task applications that use the task models provided by operating systems. However, the conventional task models cause some scalability problems when multi-task applications are executed on many-core processors. In this paper, a new task model named Partitioned Virtual Address Space (PVAS), which solves these problems, is proposed. PVAS enhances inter-task communication in multi-task applications and averts serialization of concurrent virtual memory operations. Preliminary evaluations using micro benchmarks showed that PVAS has the potential to improve the performance of multi-task applications that run on many-core processors.
Pages: 45-48
Citations: 6
Transparent and energy-efficient speculation on NUMA architectures for embedded MPSoCs
Histoire & mesure Pub Date : 2013-06-24 DOI: 10.1145/2489068.2489078
Dimitra Papagiannopoulou, R. I. Bahar, T. Moreshet, M. Herlihy, A. Marongiu, L. Benini
Abstract: High-end embedded systems such as smart phones, game consoles, GPS-enabled automotive systems, and home entertainment centers are becoming ubiquitous. Like their general-purpose counterparts, and for many of the same energy-related reasons, embedded systems are turning to multicore architectures. Moreover, as the demand for more compute-intensive capabilities for embedded systems increases, these multicore architectures will evolve into many-core systems for improved performance or performance/area/Watt. These systems are often organized as cluster-based Non-Uniform Memory Access (NUMA) architectures that provide the programmer with a shared-memory abstraction, with the cost of sharing memory (in terms of performance, energy, and complexity) varying substantially depending on the locations of the communicating processes. This paper investigates one of the principal challenges presented by these emerging NUMA architectures for embedded systems: providing efficient, energy-effective and convenient mechanisms for synchronization and communication. In this paper, we propose an initial solution based on hardware support for speculative synchronization.
Pages: 58-61
Citations: 2
A code generation method for system-level synthesis on ASIC, FPGA and manycore CGRA
Histoire & mesure Pub Date : 2013-06-24 DOI: 10.1145/2489068.2489072
Shuo Li, Jamshaid Sarwar Malik, Shaoteng Liu, A. Hemani
Abstract: This paper presents a code generation method that translates an intermediate Register-Transfer Level (RTL) model of a system into its corresponding VHDL code for ASICs and FPGAs and MATLAB functions for manycore CGRAs. The intermediate representation consists of Function Implementations (FIMPs) and the glue logic. FIMPs are VHDL design units for the ASIC and FPGA implementation styles and MATLAB function templates for the CGRA implementation style, while the glue logic is a compact data structure storing Global Interconnect and Control (GLIC) information. The automatically generated implementation code increases resource usage by 1.5% on average while reducing total design effort by two orders of magnitude.
Pages: 25-32
Citations: 4
Improving the programmability of STHORM-based heterogeneous systems with offload-enabled OpenMP
Histoire & mesure Pub Date : 2013-06-24 DOI: 10.1145/2489068.2489069
A. Marongiu, Alessandro Capotondi, Giuseppe Tagliavini, L. Benini
Abstract: Heterogeneous architectures based on one fast-clocked, moderately multicore "host" processor plus a many-core accelerator represent one promising way to satisfy the ever-increasing GOps/W requirements of embedded systems-on-chip. However, heterogeneous computing comes at the cost of increased programming complexity, requiring a major rewrite of applications in a low-level programming style (e.g., OpenCL). In this paper we present a programming model, compiler and runtime system for a prototype board from STMicroelectronics featuring an ARM9 host and a STHORM many-core accelerator. The programming model is based on OpenMP, with additional directives to efficiently program the accelerator from a single host program. The proposed multi-ISA compilation toolchain hides the whole process of outlining an accelerator program, compiling and loading it to the STHORM platform, and implementing data sharing between the host and the accelerator. Our experimental results show that we achieve performance very close to that of hand-optimized OpenCL code, at a significantly lower programming complexity.
Pages: 1-8
Citations: 21