Taemin Lee, Dongki Kim, Hyunsun Park, S. Yoo, Sunggu Lee
{"title":"FPGA-based prototyping systems for emerging memory technologies","authors":"Taemin Lee, Dongki Kim, Hyunsun Park, S. Yoo, Sunggu Lee","doi":"10.1109/RSP.2014.6966901","DOIUrl":"https://doi.org/10.1109/RSP.2014.6966901","url":null,"abstract":"As DRAM approaches its scaling limit, several new memory technologies are being considered as candidates for replacing or complementing DRAM main memory. Compared to DRAM, the new memories have two major differences: non-volatility and write overhead in terms of endurance, latency and power. We built two different FPGA-based evaluation boards to evaluate hardware and software designs for new-memory-based main memory: one with a DRAM subsystem having parameterizable latency and non-volatility emulation, and the other with real chips of a new memory, namely phase-change RAM (PRAM). We ran primitive functions and SQLite-based benchmarks on Linux, verifying the workings of new functionalities, e.g., non-volatility, and evaluating the impact of new memory on software performance. In our experiments, we also demonstrated the impact of new-memory-aware software/hardware designs on program performance on a DRAM/PRAM hybrid memory.","PeriodicalId":394637,"journal":{"name":"2014 25th IEEE International Symposium on Rapid System Prototyping","volume":"64 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132360896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploration and assessment of memory architectures for densely-deployed embedded sensor networks","authors":"Azim Abdool, C. Radix, Sean Rocke","doi":"10.1109/RSP.2014.6966894","DOIUrl":"https://doi.org/10.1109/RSP.2014.6966894","url":null,"abstract":"Densely-deployed embedded sensor networks are susceptible to constraints associated with contention across a shared transport medium. To improve channel reliability, as well as average power consumption across the system, densely-deployed embedded sensor networks often leverage node-based neighbourhood data aggregation strategies. The tradeoff is that individual sensor nodes will have increased memory capacity and access requirements, where access requirements are determined by the memory transport bandwidth, the nature and frequency of the memory accesses, and the latencies associated with the memory storage mechanism. Individual sensor nodes consume power both directly, based on the number/nature of memory operations, and indirectly, through leakage current through latent circuitry. This paper considers the impact of different memory archetypes on the performance of aggregation-related algorithms on individual nodes, specifically the scalability of the number of required bus transactions and memory-related latencies with data-set size. The archetypes under consideration were: linear addressing (RAM), content-based addressing (ternary CAM), and multi-dimensional addressing (Parks'). VHDL-specified MicroBlaze-based nodes, a 32-bit data bus, and archetypical memories were implemented on a Virtex-5 development board. Operations central to aggregation algorithms (min, sum, count) were run using each type of memory on data-sets of 8 different sizes between 8 and 1024 data-points. Results suggest that appropriate selection of local-node memory architecture can offer performance benefits in densely deployed sensor networks.","PeriodicalId":394637,"journal":{"name":"2014 25th IEEE International Symposium on Rapid System Prototyping","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115375202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mostafa Rizk, A. Baghdadi, M. Jézéquel, Y. Mohanna, Y. Atat
{"title":"Design and prototyping flow of NISC-based flexible MIMO turbo-equalizer","authors":"Mostafa Rizk, A. Baghdadi, M. Jézéquel, Y. Mohanna, Y. Atat","doi":"10.1109/RSP.2014.6966687","DOIUrl":"https://doi.org/10.1109/RSP.2014.6966687","url":null,"abstract":"Flexible design implementations are increasingly explored in digital communication applications to cope with diverse configurations imposed by the emerging communication standards. On the other hand, rapid hardware prototyping is a crucial requirement in system validation and performance evaluation under various use case scenarios. Adding flexibility, and hence increasing system complexity on one hand, and shrinking design time to meet market pressure on the other hand, require a productive design approach ensuring final design quality. By eliminating the instruction set overhead, the No-Instruction-Set-Computer (NISC) approach fulfills these design requirements, offering static scheduling of the datapath and automated RTL synthesis, and allowing the designer to have direct control of hardware resources. This paper presents a case study of an NISC-based implementation of a flexible low-complexity MIMO turbo-equalizer. The complete design and prototype flow, from architecture specification to FPGA implementation, is described in detail. Using a VC707 evaluation board integrating a Xilinx Virtex-7 FPGA, the prototype of a 2×2/4×4 spatially multiplexed MIMO system achieves a throughput of 115.8/62.4 Mega symbols per second at a clock frequency of 202.67 MHz. Furthermore, the flexibility of the demonstrated prototype allows it to support all communication modes defined in the LTE, WiFi, WiMAX, and DVB-RCS wireless communication standards.","PeriodicalId":394637,"journal":{"name":"2014 25th IEEE International Symposium on Rapid System Prototyping","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115227534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight task migration in embedded multi-tiled architectures using task code replication","authors":"Ashraf El Antably, Nicolas Fournel, F. Rousseau","doi":"10.1109/RSP.2014.6966898","DOIUrl":"https://doi.org/10.1109/RSP.2014.6966898","url":null,"abstract":"With the ongoing sophistication of embedded applications, higher computational power is increasingly required. As a result, a wide transition to multi-processor systems on chip has been adopted. Our study focuses on fully distributed-memory MPSoCs implemented as multi-tiled architectures. A tile contains at least one processor and associated peripherals, with a distributed network processor which is responsible for inter-tile communications. All tiles are connected in a 3D torus network. In this paper, we present a solution for lightweight task migration on such architectures. It is based on judicious task code replication in statically chosen locations (tiles). It provides the system with the ability to remap its tasks at runtime. This work emphasizes solving all issues arising from communication inconsistency, shedding light on implementation details. Experiments show the effectiveness of the approach, and detail its performance and limitations. The solution has been implemented on a multi-tiled virtual ARM Cortex-A9-based platform with an embedded operating system.","PeriodicalId":394637,"journal":{"name":"2014 25th IEEE International Symposium on Rapid System Prototyping","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115300473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Balasubramanian, A. Dubey, W. Otte, W. Emfinger, P. Kumar, G. Karsai
{"title":"A Rapid Testing Framework for a Mobile Cloud","authors":"D. Balasubramanian, A. Dubey, W. Otte, W. Emfinger, P. Kumar, G. Karsai","doi":"10.1109/RSP.2014.6966903","DOIUrl":"https://doi.org/10.1109/RSP.2014.6966903","url":null,"abstract":"Mobile clouds such as network-connected vehicles and satellite clusters are an emerging class of systems that are extensions to traditional real-time embedded systems: they provide long-term mission platforms made up of dynamic clusters of heterogeneous hardware nodes communicating over ad hoc wireless networks. Besides the inherent complexities entailed by a distributed architecture, developing software for and testing these systems is difficult for a number of other reasons, including the mobile nature of such systems, which can require a model of the physical dynamics of the system for accurate simulation and testing. This paper describes a rapid development and testing framework for a distributed satellite system. Our solutions include a modeling language for configuring and specifying an application's interaction with the middleware layer, a physics simulator integrated with hardware in the loop to provide the system's physical dynamics, and the integration of a network traffic tool to dynamically vary the network bandwidth based on the physical dynamics.","PeriodicalId":394637,"journal":{"name":"2014 25th IEEE International Symposium on Rapid System Prototyping","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129904168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Magalhães, S. J. Filho, Oliver B. Longhi, Fabiano Hessel
{"title":"Embedded cluster-based architecture with high level support - presenting the HC-MPSoC","authors":"F. Magalhães, S. J. Filho, Oliver B. Longhi, Fabiano Hessel","doi":"10.1109/RSP.2014.6966899","DOIUrl":"https://doi.org/10.1109/RSP.2014.6966899","url":null,"abstract":"Multiprocessor Systems-on-Chip (MPSoCs) can be found in almost every market segment, and their design typically faces several restrictions such as chip area and energy consumption. State-of-the-art MPSoCs use networks-on-chip as the primary communication infrastructure, and the tendency is that NoC-based systems will still be used for a long time, thanks to greater design flexibility and high communication bandwidth and parallelism. However, such systems also have certain usage restrictions, such as the location of the tasks that compose the application. Mapping and partitioning techniques seek to solve this problem, or at least reduce it to a non-critical point, by dividing tasks across the architecture, but are not always completely successful. In this context, cluster-based architectures emerge as a viable alternative to MPSoCs. This type of system typically has a hybrid architecture, using more than one communication infrastructure, thus being able to group elements by affinity and still use high-speed communication channels, such as NoCs. The presented work introduces the HC-MPSoC, an architecture for cluster-based intrachip systems, which uses buses and a NoC jointly, forming groups of elements independently distributed throughout the platform. The extensions made to the HellfireOS in order to execute it on the hybrid architecture are also presented. All HC-MPSoC modules, as well as the HellfireOS modules and the results obtained using the platform, are presented throughout the text.","PeriodicalId":394637,"journal":{"name":"2014 25th IEEE International Symposium on Rapid System Prototyping","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116083248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
N. Hili, Christian Fabre, Sophie Dupuy-Chessa, D. Rieu
{"title":"A model-driven approach for embedded system prototyping and design","authors":"N. Hili, Christian Fabre, Sophie Dupuy-Chessa, D. Rieu","doi":"10.1109/RSP.2014.6966688","DOIUrl":"https://doi.org/10.1109/RSP.2014.6966688","url":null,"abstract":"Embedded System (ES) development complexity is increasing. This increase has several cumulative sources: some are directly related to constraints on the ES themselves (dependability, compute intensity, resource constraints) while other sources are related to the industrial context of their development (fast prototyping, early validation, parallelization of developments). Although several Model-Driven Engineering (MDE) processes have been proposed for ES development, most of them are not completely formalized. This has several drawbacks that prevent their use in prototyping, where iterations need to be short and focused. Incompletely formalized processes tend to be sidestepped in these situations, where quick results are expected to be obtained with limited effort. In this paper we propose an MDE-based process for ES development. This process precisely defines the development tasks and their impact on the models throughout development. In particular, we define iteration width and depth for the process, which allow for fine-grained and consistent planning of developments. The short and well-defined iterations characterized by the process reduce the gap between rapid prototyping, ad-hoc methods and regular development processes.","PeriodicalId":394637,"journal":{"name":"2014 25th IEEE International Symposium on Rapid System Prototyping","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116618588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multi-stage thermal management strategy for 3D multicores","authors":"Dipika Suresh, Ashutosh Kumar Singh, Akash Kumar","doi":"10.1109/RSP.2014.6966896","DOIUrl":"https://doi.org/10.1109/RSP.2014.6966896","url":null,"abstract":"3D integration technology has the potential to enhance IC performance, improve functionality and lessen wiring of ICs. However, it poses several challenges, where the key challenge is heat generation from internal active layers due to power dissipation. To mitigate this challenge, thermal-aware design has become a necessity. Towards thermal-aware design, this paper proposes a two-stage design technique. In the first stage, a temperature-power thermal model is created to calculate the power dissipated by an IC at an input temperature. The proposed model calculates the power dissipated by 2D and 3D ICs with an average error of 0.37% and 25% respectively. Power calculation helps in process variation, validation of power models and minimization of temperature gradients. In the second stage, thermal-aware mapping is performed for the ICs. For thermal-aware mapping, three mapping algorithms are proposed to account for different resource (processor) availability scenarios. Each algorithm utilizes the temperature-power thermal model (from the first design stage) to map applications to processing elements in a 3D IC. The proposed two-stage design technique performs faster temperature-to-power calculations than existing techniques. It provides a simplified approach to mapping compared to existing techniques by utilizing the power dissipated by processing elements to map applications.","PeriodicalId":394637,"journal":{"name":"2014 25th IEEE International Symposium on Rapid System Prototyping","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129300793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}