2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS): Latest Publications

Workload-dependent relative fault sensitivity and error contribution factor of GPU onchip memory structures
Ronak Shah, Minsu Choi, B. Jang
Abstract: The GPU (Graphics Processing Unit) is emerging as an efficient and scalable accelerator for data-parallel workloads on platforms ranging from tablet PCs to HPC (High Performance Computing) mainframes. Unlike traditional 3D graphics rendering, general-purpose compute applications demand stringent assurance of reliability. Therefore, single error tolerance schemes such as SECDED (Single Error Correcting Double Error Detecting) codes are being rapidly introduced to high-end GPUs targeting high-performance general-purpose computing. However, the relative fault sensitivity and error contribution of critical on-chip memory structures such as the active mask stack (AMS), register file (REG) and local memory (MEM) are yet to be studied. Also, the implications of single error tolerance for various GPGPU (General Purpose computing on GPU) workloads have not been quantitatively analyzed to reveal its relative cost/fault-tolerance efficiency. To address this issue, this work explores a novel Monte Carlo simulation framework to enumerate and analyze well-converged fault injection data. Instead of estimating the AVF (Architectural Vulnerability Factor) of each structure individually, we inject faults into the whole memory (AMS, REG and MEM combined) in a structure-oblivious fashion. We then categorize and analyze each structure's relative fault sensitivity and error contribution factor. Finally, we study the implications of single error tolerance for the memory structures by considering eight different possible ECC profiles. Results show that the relative fault sensitivity and error contribution of REG are the highest among the considered memory structures; therefore, ECC (Error Correction Code) protection of REG is the most critical and cost-effective.
DOI: https://doi.org/10.1109/SAMOS.2013.6621134
Published: 2013-10-08
Citations: 5
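
The structure-oblivious injection described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' framework: the structure sizes, the per-fault error probabilities, and the two ratios computed at the end are all assumptions made for the example (the abstract does not define "relative fault sensitivity" or "error contribution factor" precisely). In a real campaign the outcome of each injection would come from running the workload rather than from a fixed probability.

```python
import random

# Hypothetical structure sizes in bits and hypothetical per-fault error
# probabilities, chosen only to make the sketch runnable.
SIZES = {"AMS": 1_000, "REG": 30_000, "MEM": 60_000}
P_ERROR = {"AMS": 0.30, "REG": 0.20, "MEM": 0.05}


def locate(bit, sizes):
    """Map a global bit index to the structure that contains it."""
    for name, size in sizes.items():
        if bit < size:
            return name
        bit -= size
    raise ValueError("bit index out of range")


def run_campaign(trials, sizes=SIZES, p_error=P_ERROR, seed=0):
    """Inject `trials` single-bit faults uniformly over the combined memory."""
    rng = random.Random(seed)
    total_bits = sum(sizes.values())
    faults = {name: 0 for name in sizes}
    errors = {name: 0 for name in sizes}
    for _ in range(trials):
        # Structure-oblivious: the target bit is drawn without regard to
        # which structure it belongs to; classification happens afterwards.
        name = locate(rng.randrange(total_bits), sizes)
        faults[name] += 1
        if rng.random() < p_error[name]:
            errors[name] += 1
    total_errors = sum(errors.values())
    # Assumed reading: sensitivity = errors per injected fault in a structure.
    sensitivity = {n: errors[n] / faults[n] if faults[n] else 0.0 for n in sizes}
    # Assumed reading: contribution = a structure's share of all errors.
    contribution = {n: errors[n] / total_errors if total_errors else 0.0 for n in sizes}
    return faults, errors, sensitivity, contribution
```

Calling `run_campaign(20000)` returns the per-structure fault counts, error counts, and the two ratios; classifying faults after the fact is what lets one campaign cover all three structures at once.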
NoC links energy reduction through link voltage scaling
Andrea Mineo, M. Palesi, G. Ascia, V. Catania
Abstract: The power dissipated by the links of a network-on-chip (NoC) accounts for a significant fraction of the overall power dissipated by the on-chip communication fabric. This fraction becomes more significant as technology scales down. This paper presents a technique that reduces the energy consumption of the NoC by reducing the link voltage swing. The basic idea is to vary the link voltage swing at run time according to the reliability requirements of each communication. Specifically, the voltage swing of a link is reduced when it has to transmit the flits of a packet belonging to a communication that tolerates a higher-than-usual bit error rate. Experiments carried out on both synthetic and real traffic patterns show the effectiveness of the proposed technique, which saves more than 20% of energy, depending on the reliability requirements of the communications.
DOI: https://doi.org/10.1109/SAMOS.2013.6621113
Published: 2013-07-15
Citations: 3
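
The per-packet swing selection described above might look something like the following sketch. Everything numeric here is hypothetical: the swing levels, the bit error rate attached to each level, and the assumption that link energy scales with the square of the swing are illustrative choices, not values or models taken from the paper.

```python
# Hypothetical (swing in volts, resulting bit error rate) pairs.
# Real values depend on the technology node and the link design.
SWING_LEVELS = [(0.4, 1e-3), (0.6, 1e-6), (0.8, 1e-9), (1.0, 1e-12)]
NOMINAL_SWING = 1.0


def select_swing(tolerated_ber, levels=SWING_LEVELS):
    """Return the lowest swing whose BER does not exceed the tolerated BER."""
    for swing, ber in sorted(levels):
        if ber <= tolerated_ber:
            return swing
    # No level is reliable enough: fall back to the highest available swing.
    return max(swing for swing, _ in levels)


def relative_energy(swing, nominal=NOMINAL_SWING):
    """Energy relative to nominal, ASSUMING energy scales with swing squared."""
    return (swing / nominal) ** 2
```

Under these assumed numbers, a packet that tolerates a bit error rate of 1e-6 would be sent at the 0.6 V level, while a packet with no relaxed requirement stays at the nominal swing.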
Compiler-aided methodology for low overhead on-line testing
G. Nazarian, R. M. Seepers, C. Strydis, G. Gaydadjiev
Abstract: Reliability is emerging as an important design criterion in modern systems due to increasing transient fault rates. Hardware fault-tolerance techniques, commonly used to address this, introduce high design costs. As an alternative, software Signature-Monitoring (SM) schemes based on compiler assertions are an efficient method for control-flow-error detection. Existing SM techniques do not consider application-specific information, which causes unnecessary overhead. In this paper, compile-time Control-Flow-Graph (CFG) topology analysis is used to place the best-suited assertions at optimal locations in the assembly code to reduce overhead. Our evaluation with representative workloads shows an increase in fault coverage with overheads close to those of Assertion-based Control-Flow Correction (ACFC), the method with the lowest overhead. Compared to ACFC, our technique improves (on average) fault coverage by 17%, performance overhead by 5% and power consumption by 3% with equal code-size overhead.
DOI: https://doi.org/10.1109/SAMOS.2013.6621126
Published: 2013-07-15
Citations: 2
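
To make the idea of assertion-based control-flow-error detection concrete, here is a deliberately simple sketch. It is a generic check, not the paper's scheme and not ACFC: an assertion at the entry of every basic block verifies that the transition from the previously executed block is an edge of the CFG. The names `make_monitor` and `ControlFlowError` are hypothetical, and the sketch checks every block, whereas the paper's contribution lies in choosing where assertions are placed.

```python
class ControlFlowError(Exception):
    """Raised when execution takes a transition that is not in the CFG."""


def make_monitor(cfg_edges, entry):
    """Return an `enter(block)` function that validates each block transition.

    cfg_edges is a set of (source, destination) basic-block pairs known at
    compile time; entry is the block where execution starts.
    """
    state = {"current": entry}

    def enter(block):
        # The assertion a compiler would insert at the top of `block`.
        if (state["current"], block) not in cfg_edges:
            raise ControlFlowError(
                f"illegal transition {state['current']} -> {block}"
            )
        state["current"] = block

    return enter
```

With edges `{("A", "B"), ("B", "C")}` and entry `"A"`, calling `enter("B")` then `enter("C")` passes, while a jump straight from `"A"` to `"C"` raises `ControlFlowError`.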
What cloud computing can teach us about embedded many-core programming?
A. Vajda
Abstract: While seemingly worlds apart, cloud computing is confronted with many of the same issues as embedded systems: power consumption, energy efficiency, and optimal usage of resources such as processing cores and memory. This talk will explore how solutions and programming paradigms emerging in the cloud computing space can be used in the embedded space, and vice versa.
DOI: https://doi.org/10.1109/SAMOS.2013.6621097
Published: 2013-07-15
Citations: 0
Abstraction of polychronous dataflow specifications into mode-automata
J. Ouy, M. Kracht, S. Shukla
Abstract: For reactive embedded software designed by composing existing reactive components, ensuring correctness is not straightforward. The possibility of deadlock across components, mismatches in temporal behavior at the connected interface signals, and similar problems could lead to non-reactivity or subtle bugs. Behavioral interface theories have been proposed for checking the compatibility of components when reactive modules are composed. Depending on the model of computation, various intermediate notions of behavioral interfaces may be defined. In the case of polychronous components, the clock relations and the data dependencies at the interfaces are usually used for checking compatibility. However, if the behavior of a component is time variant, these abstractions are insufficient to establish the correctness of the composition. To capture time-varying behavior, we propose adding an automaton-based abstraction built on predicate abstraction. This paper describes the extraction of the abstraction, along with proofs of equivalence and a description of a practical implementation of the technique.
DOI: https://doi.org/10.1109/SAMOS.2013.6621103
Published: 2013-07-15
Citations: 0
Parallelizing general histogram application for CUDA architectures
Ugljesa Milic, Isaac Gelado, Nikola Puzovic, Alex Ramírez, M. Tomasevic
Abstract: Histogramming is a tool commonly used in data analysis. Although its serial version is simple to implement, providing an efficient and scalable way to parallelize it can be challenging. This holds especially for platforms that contain one or several massively parallel devices such as CUDA-capable GPUs, due to issues with domain decomposition, use of global memory and similar concerns. In this paper we compare two approaches for implementing general-purpose histogramming on GPUs. The first algorithm is based on private copies of the bin counters stored in shared memory for each block of threads. The second one uses the Thrust library to sort the input elements and then to search for upper bounds according to bin widths. For both algorithms we analyze how the speedup over the sequential version depends on the size of the input collection, the number of bins, and the type and distribution of the input elements. We also overlap data transfers between the host CPU and the CUDA device with kernel execution. For both algorithms we analyze the pros and cons in detail. For example, the privatization strategy can be up to 2x faster than sort-search with realistic inputs, but can only support a limited number of bins. On the other hand, the sort-search strategy has about 50% higher speedup than privatization when characters are used as input, and it can support an unlimited number of bins. Finally, we perform an exploration to determine the optimal algorithm depending on the characteristics and values of the input parameters.
DOI: https://doi.org/10.1109/SAMOS.2013.6621100
Published: 2013-07-15
Citations: 13
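
The two strategies compared in the abstract above can be illustrated with a sequential Python sketch. It only mirrors the structure of the algorithms and says nothing about GPU performance. The function names, the equal-width binning with clamping, and the `block_size` parameter are assumptions made for the example. The sort-search variant sorts precomputed bin indices rather than the raw input, which is a simplification chosen so that both variants agree exactly on bin boundaries.

```python
from bisect import bisect_right


def bin_index(x, lo, width, num_bins):
    """Index of the equal-width bin containing x, clamped to the valid range."""
    return min(max(int((x - lo) // width), 0), num_bins - 1)


def histogram_privatized(data, lo, width, num_bins, block_size=4):
    """Each block of elements fills a private counter copy, merged at the end."""
    total = [0] * num_bins
    for start in range(0, len(data), block_size):
        private = [0] * num_bins  # stands in for per-block shared memory
        for x in data[start:start + block_size]:
            private[bin_index(x, lo, width, num_bins)] += 1
        for b in range(num_bins):  # stands in for the merge into global memory
            total[b] += private[b]
    return total


def histogram_sort_search(data, lo, width, num_bins):
    """Sort the keys, find each bin's upper bound, then take differences."""
    keys = sorted(bin_index(x, lo, width, num_bins) for x in data)
    # cumulative[b] = number of elements falling in bins 0..b
    cumulative = [bisect_right(keys, b) for b in range(num_bins)]
    counts, previous = [], 0
    for value in cumulative:
        counts.append(value - previous)
        previous = value
    return counts
```

The sketch also hints at the trade-off the abstract reports: the privatized version needs `num_bins` counters per block, which is what limits the bin count when those counters must fit in shared memory, whereas the sort-search version needs no per-block storage proportional to the number of bins.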
Parallel implementation of real-time semi-global matching on embedded multi-core architectures
Oliver Jakob Arndt, Daniel Becker, C. Banz, H. Blume
Abstract: Embedded real-time algorithms are often realized with dedicated hardware, which entails high production costs and leaves little programming flexibility afterwards. For instance, semi-global matching for stereo image processing, which includes complex data flows, traditionally runs on customized hardware modules. By combining the processing and memory capabilities of multiple individual cores, emerging embedded multi-core technologies address these problems. However, because of concurrency issues (e.g., data races and lock contention), parallel programming requires experienced programmers as well as technology-specific techniques (e.g., synchronization libraries) and tools (e.g., parallel profilers), which are often not available on embedded platforms. In this work, we introduce a parallel version of a semi-global matching algorithm and use this case study to demonstrate the runtime optimizations necessary to meet real-time requirements. We also show the structured steps of the applied parallelization workflow, illustrating an efficient migration strategy to multi-core platforms using runtime information (e.g., profiles and hardware counters). Finally, to evaluate the resulting performance characteristics, we compare the runtime behavior of the parallel version running on a Freescale P4080 processor with reference values taken on an Intel i7, a field-programmable logic device, an extended general-purpose processor and a GPU.
DOI: https://doi.org/10.1109/SAMOS.2013.6621106
Published: 2013-07-15
Citations: 23
Dataflow computing with Polymorphic Registers
C. Ciobanu, G. Gaydadjiev, C. Pilato, D. Sciuto
Abstract: Heterogeneous systems are becoming increasingly popular for data processing. They improve the performance of simple kernels applied to large amounts of data. However, sequential data loads may have a negative impact. Data-parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. Furthermore, through PRF customization, specific data path features are exposed to the programmer in a very convenient way. PRFs allow additional control over the register dimensions and the number of elements that can be accessed simultaneously by the computational units. This paper shows how PRFs can be integrated into dataflow computational platforms. In particular, starting from annotated source code, we present a compiler-based methodology that automatically generates the customized PRFs and the enhanced computational kernels that efficiently exploit them.
DOI: https://doi.org/10.1109/SAMOS.2013.6621140
Published: 2013-07-15
Citations: 0
SWAN-iCare: A smart wearable and autonomous negative pressure device for wound monitoring and therapy
I. Texier, P. Marcoux, P. Pham, M. Muller, P. Benhamou, M. Correvon, G. Dudnik, G. Voirin, N. Bue, J. Cristensen, M. Laurenza, G. Gazzara, Andreas Raptopoulos, A. Bartzas, D. Soudris, C. Saxby, T. Navarro, F. Francesco, P. Salvo, M. Romanelli, B. Paggi, L. Lymberopoulos
Abstract: The EU FP7 SWAN-iCare project aims to develop an integrated autonomous device for the monitoring and personalized management of chronic wounds, mainly diabetic foot ulcers and venous leg ulcers. Most foot and leg ulcers are caused by diabetes and vascular problems respectively, but a considerable number of them are also due to the co-morbid influence of many other diseases (e.g. kidney disease, congestive heart failure, high blood pressure, inflammatory bowel disease). More than 10 million people in Europe suffer from chronic wounds, a number which is expected to grow due to the aging of the population. The core of the project is the fabrication of a conceptually new wearable negative pressure device equipped with Information and Communication Technologies. Such a device will allow users to: (a) accurately monitor many wound parameters via non-invasive integrated micro-sensors, (b) identify infections early and (c) remotely provide an innovative personalized two-line therapy via non-invasive micro-actuators to supplement the negative pressure wound therapy. This paper describes the main components of the SWAN-iCare system and its potential impact in the area of wound management.
DOI: https://doi.org/10.1109/SAMOS.2013.6621116
Published: 2013-07-15
Citations: 9
Verilog-based simulation of hardware support for data-flow concurrency on multicore systems
George Matheou, P. Evripidou
Abstract: Data-Driven Multithreading (DDM) is a threaded data-flow model that schedules threads for execution based on data availability. DDM uses a Thread Scheduling Unit (TSU) to manage the threads on sequential processors. In this work we present a hardware implementation of the TSU with synthesizable code written in the Verilog HDL, and its evaluation using the ISim simulator. The evaluation results show that the TSU is able to run at a maximum frequency of 180 MHz and consumes only 5% of the Xilinx Virtex-6 FPGA resources. The initial results obtained in this work will enable us to design an FPGA-based DDM multicore chip consisting of several MicroBlaze cores driven by the TSU. Thus, we will be able to evaluate the performance of the novel threaded data-flow model and compare it directly with the sequential model on the same hardware.
DOI: https://doi.org/10.1109/SAMOS.2013.6621136
Published: 2013-07-15
Citations: 7
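
Scheduling "based on data availability", as the abstract above puts it, can be sketched in software as follows. This is an illustrative model only, not the Verilog TSU from the paper: the function name `run_data_driven`, the dictionary-based dependency description, and the use of a per-thread ready counter are assumptions made for the example.

```python
from collections import deque


def run_data_driven(consumers, bodies):
    """Run threads in an order dictated purely by data availability.

    consumers maps a thread name to the threads that consume its output;
    bodies maps every thread name to a zero-argument callable.
    Returns the order in which the threads were executed.
    """
    # Ready count = number of producers a thread is still waiting for.
    ready_count = {name: 0 for name in bodies}
    for targets in consumers.values():
        for target in targets:
            ready_count[target] += 1

    # Threads with no pending inputs can run immediately.
    ready = deque(name for name, count in ready_count.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        bodies[name]()
        order.append(name)
        # Completing a thread makes its output available to its consumers.
        for target in consumers.get(name, ()):
            ready_count[target] -= 1
            if ready_count[target] == 0:
                ready.append(target)

    if len(order) != len(bodies):
        raise ValueError("dependency cycle: some threads never became ready")
    return order
```

For a small graph in which `load_a` and `load_b` both feed `add`, and `add` feeds `store`, the two loads run first (in either order on real parallel hardware), `add` fires only once both inputs exist, and `store` follows.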