2008 Symposium on Application Specific Processors: Latest Papers

Resource Sharing in Custom Instruction Set Extensions
2008 Symposium on Application Specific Processors Pub Date : 2008-06-08 DOI: 10.1109/SASP.2008.4570779
M. Zuluaga, N. Topham
Abstract: Customised processor performance generally increases as additional custom instructions are added. However, performance is not the only metric that modern systems must take into account; die area and energy efficiency are equally important. Resource sharing during synthesis of instruction set extensions (ISEs) can significantly reduce the die area and energy consumption of a customised processor. This may increase the number of custom instructions that can be synthesized with a given area budget. Resource sharing involves combining the graph representations of two or more ISEs which contain a similar sub-graph. This coupling of multiple sub-graphs, if performed naively, can increase the latency of the extension instructions considerably. And yet, as we show in this paper, an appropriate level of resource sharing provides a significantly simpler design with only modest increases in average latency for extension instructions. Based on existing resource-sharing techniques, this study presents a new heuristic that controls the degree of resource sharing between a given set of custom instructions. Our main contributions are the introduction of a parametric method for exploring the trade-offs that can be achieved between instruction latency and implementation complexity, and the coupling of design-space exploration with fast area-delay models for the operators comprising each ISE. We present experimental evidence that our heuristic exposes a broad range of design points, allowing advantageous trade-offs between die area and latency to be found and exploited.
Citations: 29
An FPGA Design Space Exploration Tool for Matrix Inversion Architectures
2008 Symposium on Application Specific Processors Pub Date : 2008-06-08 DOI: 10.1109/SASP.2008.4570784
A. Irturk, Bridget Benson, Shahnam Mirzaei, R. Kastner
Abstract: Matrix inversion is a common function found in many algorithms used in wireless communication systems. As FPGAs become an increasingly attractive platform for wireless communication, it is important to understand the tradeoffs in designing a matrix inversion core on an FPGA. This paper describes a matrix inversion core generator tool, GUSTO, that we developed to ease the design space exploration across different matrix inversion architectures. GUSTO is the first tool of its kind to provide automatic generation of a variety of general purpose matrix inversion architectures with different parameterization options. GUSTO also provides an optimized application specific architecture with an average of 59% area decrease and 3X throughput increase over its general purpose architecture. The optimized architectures generated by GUSTO provide comparable results to published matrix inversion architecture implementations, but offer the advantage of providing the designer the ability to study the tradeoffs between architectures with different design parameters.
Citations: 33
System-Level Performance Estimation for Application-Specific MPSoC Interconnect Synthesis
2008 Symposium on Application Specific Processors Pub Date : 2008-06-08 DOI: 10.1109/SASP.2008.4570792
Po-Kuan Huang, Matin Hashemi, S. Ghiasi
Abstract: We present a framework for development of streaming applications as concurrent software modules running on multiprocessor systems-on-chip (MPSoCs). We propose an iterative design space exploration mechanism to customize the MPSoC architecture for given applications. Central to the exploration engine is our system-level performance estimation methodology, which both quickly and accurately determines the quality of candidate architectures. We implemented a number of streaming applications on candidate architectures that were emulated on an FPGA. Hardware measurements show that our system-level performance estimation method incurs only 15% error in predicting application throughput. More importantly, it always correctly guides design space exploration by achieving 100% fidelity in quality-ranking candidate architectures. Compared to behavioral simulation of compiled code, our system-level estimator runs more than 12 times faster, and requires 7 times less memory.
Citations: 18
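Note: the abstract above reports both an estimation error (15%) and a ranking fidelity (100%) without defining the latter. The sketch below illustrates one common way such a fidelity figure can be computed: the fraction of candidate pairs whose relative order is the same under the estimate and under the measurement. This definition, the function names, and the throughput numbers are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations


def ranking_fidelity(estimated, measured):
    """Fraction of candidate pairs ordered the same way by both sequences.

    Assumed definition (not from the paper): a pair (i, j) agrees when
    estimated[i] - estimated[j] has the same sign as measured[i] - measured[j].
    """
    def sign(x):
        return (x > 0) - (x < 0)

    pairs = list(combinations(range(len(measured)), 2))
    if not pairs:
        return 1.0  # zero or one candidate: nothing can be mis-ranked
    agree = sum(
        1
        for i, j in pairs
        if sign(estimated[i] - estimated[j]) == sign(measured[i] - measured[j])
    )
    return agree / len(pairs)


def max_relative_error(estimated, measured):
    """Largest per-candidate relative error of the estimate."""
    return max(abs(e - m) / m for e, m in zip(estimated, measured))


# Hypothetical throughputs for four candidate architectures.
measured = [100.0, 120.0, 150.0, 200.0]
estimated = [88.0, 105.0, 170.0, 230.0]

fidelity = ranking_fidelity(estimated, measured)
worst_error = max_relative_error(estimated, measured)
```

With these made-up numbers every estimate is off by 12% to 15%, yet the candidates are ranked in the same order, so an estimator can have noticeable absolute error and still steer exploration toward the right design.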
TRaX: A Multi-Threaded Architecture for Real-Time Ray Tracing
2008 Symposium on Application Specific Processors Pub Date : 2008-06-08 DOI: 10.1109/SASP.2008.4570794
J. Spjut, S. Boulos, D. Kopta, E. Brunvand, Spencer S. Kellis
Abstract: Ray tracing is a technique used for generating highly realistic computer graphics images. In this paper, we explore the design of a simple but extremely parallel, multi-threaded, multi-core processor architecture that performs real-time ray tracing. Our architecture, called TRaX for Threaded Ray eXecution, consists of a set of thread states that include commonly used functional units for each thread and share large functional units through a programmable interconnect to maximize utilization. The memory system takes advantage of the application's read-only access to the scene database and write-only access to the frame buffer output to provide efficient data delivery with a relatively simple structure. Preliminary results indicate that a multi-core version of the architecture running at a modest speed of 500 MHz already provides real-time ray traced images for scenes of a complexity found in video games. We also explore the architectural impact of a ray tracer that uses procedural (computed) textures rather than image-based (look-up) textures to trade computation for reduced memory bandwidth.
Citations: 26
Retargeting, Evaluating, and Generating Reconfigurable Array-Based Architectures
2008 Symposium on Application Specific Processors Pub Date : 2008-06-08 DOI: 10.1109/SASP.2008.4570783
C. Morra, João MP Cardoso, João Bispo, J. Becker
Abstract: Coarse-grained reconfigurable architectures have proven their value as programmable accelerators for general purpose processors. For early evaluation of those architectures, we need an approach able to exploit and retarget different processing elements (PEs) while maintaining the same compilation flow. Bearing in mind those aspects, this paper describes an approach able to map, evaluate and generate reconfigurable architectures based on an array of PEs. We use Rewriting Logic to map computations described by imperative programming languages to the PEs of the target architecture, a VHDL generation step to prototype the architectures being evaluated, and a clock cycle-based simulator to obtain first assessments of the performance of those architectures. In order to show the potential of our approach, we present results of 1D coarse-grained reconfigurable arrays as accelerator softcores implemented in an FPGA, and the effects of different PE structures and complexities.
Citations: 4
Custom Processor Core Construction from C Code
2008 Symposium on Application Specific Processors Pub Date : 2008-06-08 DOI: 10.1109/SASP.2008.4570778
Jelena Trajkovic, D. Gajski
Abstract: In this paper we present a method for construction of application specific processor cores from a given C code. Our approach consists of three phases. We start by quantifying the properties of the C code in terms of operation types, available parallelism and other metrics. We then create an initial data path to exploit the available parallelism. We then apply designer guided constraints to an interactive data path refinement algorithm that attempts to reduce the number of the most expensive components while meeting the constraints. Our experimental results show that our technique scales very well with the size of the C code. We demonstrate the efficiency of our technique on a wide range of applications, from standard academic benchmarks to industrial size examples like the MP3 decoder. Each processor core was constructed and refined in under a minute, allowing the designer to explore several different configurations in much less time than needed for manual design. On average, the refined cores have only 23% latency overhead, twice as many block RAMs and 36% fewer slices compared to the respective manual designs.
Citations: 16
Multi-core Architectures with Dynamically Reconfigurable Array Processors for the WiMAX Physical Layer
2008 Symposium on Application Specific Processors Pub Date : 2008-06-08 DOI: 10.1109/SASP.2008.4570795
Wei Han, Y. Yi, M. Muir, I. Nousias, T. Arslan, A. Erdogan
Abstract: Wireless internet access technologies have significant market potential, especially the WiMAX protocol, which can offer data rates of tens of Mbps. A significant demand for embedded high performance WiMAX solutions is forcing designers to seek single-chip multiprocessor or multi-core systems that offer competitive advantages in terms of all performance metrics, such as speed, power and area. Through the provision of a degree of flexibility similar to that of a DSP and performance and power consumption advantages approaching that of an ASIC, emerging dynamically reconfigurable processors are proving to be strong candidates for future high performance multi-core processor systems. This paper presents several new single-chip multi-core architectures, based on newly emerging dynamically reconfigurable processor cores, for the WiMAX physical layer. A simulation platform is proposed in order to explore and implement various multi-core solutions combining different memory architectures and task partitioning schemes. The paper describes the architectures and the simulation environment, and demonstrates that up to 4.2x speedup can be achieved by employing four dynamically reconfigurable processor cores with individual local memory units.
Citations: 8
Extensible On-Chip Peripherals
2008 Symposium on Application Specific Processors Pub Date : 2008-06-08 DOI: 10.1109/SASP.2008.4570786
Bharat Sukhwani, A. Forin, Richard Neil Pittman
Abstract: This paper describes the I/O subsystem of the eMIPS dynamically self-extensible processor. This processor, during execution, can load additional logic blocks that can perform a variety of functions from adding new instructions to the base instruction set to controlling I/O pins. A dynamically loaded logic block that acts as an I/O peripheral to software is what we term an Extensible I/O Peripheral. Additional mechanisms were added to the eMIPS design for a newly loaded Extensible On-Chip Peripheral to connect to the memory controller, to interact with system software in the discovery process, to obtain the I/O space and interrupt resources that it needs to operate correctly and finally to disconnect from it. A general purpose operating system running on eMIPS is able to verify the security level of any processor Extension before it is enabled. Because it only executes in the address space of the application that uses it, other applications are insulated against potentially malicious Extensions. We have extended the security model to Extensible On-Chip Peripherals and their software drivers. Privileged peripherals can request access to additional interface signals that are normally not available to non-privileged Extensions. These signals allow access to physical memory, interrupt lines and I/O pins. Extensible On-Chip Peripherals can interact with system software via memory-mapped I/O and can add new I/O instructions to the processor. For instance, atomic multi-register data transfers can simplify the interaction between software and interrupt routines, especially on multi-core systems.
Citations: 7
Proving Functional Correctness of Weakly Programmable IPs - A Case Study with Formal Property Checking
2008 Symposium on Application Specific Processors Pub Date : 2008-06-08 DOI: 10.1109/SASP.2008.4570785
Sacha Loitz, Markus Wedler, C. Brehm, Timo Vogt, N. Wehn, W. Kunz
Abstract: In recent years, designing systems-on-chip (SoCs) with domain specific and customizable embedded processors (ASIPs) has become standard practice. When compared with general purpose processors on the one hand and dedicated hardwired accelerators on the other hand, these processor cores provide new trade-offs between flexibility, energy and performance. Since they are intended to run only a restricted set of application-specific programs, this knowledge is often exploited to further optimize the architecture, resulting in weakly programmable IP cores. Such weakly programmable systems raise new challenges for hardware and software verification. The conventional separation of hardware and software verification based on a generic and well-defined instruction set is no longer sustainable. In this paper, we present a case study applying formal property checking to state-of-the-art designs of two weakly programmable IP blocks. A methodology is presented which is oriented at the operations of the ASIP rather than its instructions. As a by-product of our methodology for hardware verification we formalize the software restrictions exploited for optimization of the micro-architecture. We show that an automatic compliance check is feasible which certifies that the software complies with these restrictions. To the best of our knowledge, this is the first time that functional correctness of ASIP hardware and HW/SW compliance for a realistic design was completely verified using a formal methodology.
Citations: 8
Application Acceleration with the Explicitly Parallel Operations System - the EPOS Processor
2008 Symposium on Application Specific Processors Pub Date : 2008-06-08 DOI: 10.1109/SASP.2008.4570781
Alexandros Papakonstantinou, Deming Chen, Wen-mei W. Hwu
Abstract: Different approaches have been proposed over the years for automatically transforming high-level language (HLL) descriptions of applications into custom hardware implementations. Most of these approaches, however, are confined by basic block level parallelism described within the CDFGs (control-data flow graphs). In this work we propose a new high-level synthesis flow which can leverage instruction-level parallelism (ILP) beyond the boundary of the basic blocks. We extract statistical parallelism from the applications through the use of Superblocks and Hyperblocks formed by advanced front-end compilation techniques. The output of the front-end compilation is then used in our high-level synthesis in order to map the application onto a new domain-specific architecture named EPOS (explicitly parallel operations system). EPOS is a stylized micro-code driven processor equipped with novel architectural features that help take advantage of the instruction-level parallelism generated in the front-end compilation. A novel forwarding-path optimization engine is also employed during the high-level synthesis flow in order to minimize the long interconnection wires and the multiplexers in the processor. To evaluate the EPOS processor, we compare its performance with a previous domain-specific processor, NISC, on a common set of benchmarks. Experimental results show that a significant performance gain (3.45 times on average) is obtained compared to NISC.
Citations: 0