2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools: Latest Publications

Composable Dynamic Voltage and Frequency Scaling and Power Management for Dataflow Applications
K. Goossens, Dongrui She, Aleksandar Milutinovic, A. Molnos
{"title":"Composable Dynamic Voltage and Frequency Scaling and Power Management for Dataflow Applications","authors":"K. Goossens, Dongrui She, Aleksandar Milutinovic, A. Molnos","doi":"10.1109/DSD.2010.61","DOIUrl":"https://doi.org/10.1109/DSD.2010.61","url":null,"abstract":"Composability means that the behaviour of an application, including its timing, is not affected by the absence or presence of other applications. It is required to be able to design, test, and verify applications independently. In this paper we define composable dynamic voltage and frequency scaling (DVFS) hardware, and composable power management. We ensure that the functional and temporal behaviours of an application are not affected by other applications, even when they are power managed. For dataflow applications with worst-case execution times per task, our power management is also predictable, i.e. guarantees end-to-end real-time requirements, even when the application is mapped on multiple processors that are power managed independently. Our method can be used with various DVFS architectures, such as on-chip and off-chip VF regulators. Our FPGA implementation models a system with multiple tiles, each containing a processor with local memory running a real-time operating system (RTOS) and power management. Tiles are interconnected by a network on chip, and communicate using shared memories. Experiments indicate energy savings of 68% w.r.t. no power management, and 40% w.r.t. power gating only. 
We also demonstrate composability and predictability on the platform in the presence of power management.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124937127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Hardware-Based Speed Up of Face Recognition Towards Real-Time Performance
I. Sajid, Sotirios G. Ziavras, M. M. Ahmed
{"title":"Hardware-Based Speed Up of Face Recognition Towards Real-Time Performance","authors":"I. Sajid, Sotirios G. Ziavras, M. M. Ahmed","doi":"10.1109/DSD.2010.45","DOIUrl":"https://doi.org/10.1109/DSD.2010.45","url":null,"abstract":"Real-time face recognition by computer systems is required in many commercial and security applications since it is the only way to protect privacy and security. On the other hand, face recognition generates huge amounts of data in real time. Filtering out meaningful data from this raw data with high accuracy is a complex task. Most of the existing techniques primarily focus on the accuracy aspect using extensive matrix-oriented computations. Efficient realizations primarily reduce the computational space using eigenvalues. On the other hand, an eigenvalue-oriented evaluation has a minimum time complexity of O(n^3), where n is the rank of the covariance matrix, and the computation cost of covariance generation is extra. Our frequency distribution curve (FDC) technique avoids matrix decomposition and other highly computationally intensive matrix operations. FDC is formulated with a bias towards efficient hardware realization and high accuracy by using simple vector operations. FDC requires pattern vector (PV) extraction from an image within O(n^2) time. Our enhanced FDC-based architecture proposed in this paper further shifts a computationally expensive component of FDC to the offline layer of the system, thus resulting in very fast online evaluation of the input data. Furthermore, efficient online testing is pursued as well using an adaptive controller (AC) for PV classification utilizing the Euclidean vector norm length. The pipelined AC architecture adapts to the availability of resources in the target silicon device. 
Our implementation on an XC5VSX50t FPGA demonstrates a high accuracy of 99% in face recognition for 400 images in the ORL database, generally requiring less than 200 nsec per image.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126184990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
A New High-Level Methodology for Programming FPGA-Based Smart Camera
Nicolas Roudel, F. Berry, J. Sérot, L. Eck
{"title":"A New High-Level Methodology for Programming FPGA-Based Smart Camera","authors":"Nicolas Roudel, F. Berry, J. Sérot, L. Eck","doi":"10.1109/DSD.2010.68","DOIUrl":"https://doi.org/10.1109/DSD.2010.68","url":null,"abstract":"Due to the various devices composing a smart camera system, the designer has to know various languages (such as HDLs and C/C++). Most vision application designers are software programmers and do not have a good knowledge of HDLs (VHDL). This paper presents a new high-level methodology for implementing vision applications on smart camera platforms. This methodology is based on a soft-core approach to manage the whole system and a dataflow (actor-oriented) language to design the processing elements. We discuss in particular interfacing constraints.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121720070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Design of Trace-Based Split Array Caches for Embedded Applications
A. Tokarnia, Marina Tachibana
{"title":"Design of Trace-Based Split Array Caches for Embedded Applications","authors":"A. Tokarnia, Marina Tachibana","doi":"10.1109/DSD.2010.33","DOIUrl":"https://doi.org/10.1109/DSD.2010.33","url":null,"abstract":"Since many embedded systems execute a predefined set of programs, tuning system components to application programs and data is the approach chosen by many design techniques to optimize performance and power consumption. In this paper, we propose a method based on the analysis of accesses to vectors, arrays, and other complex data structures to design a size-constrained two-partition array cache. This method reorganizes the ways of set-associative array caches into partitions with different line sizes and defines array-partition mappings so as to minimize the average memory access energy-delay product. Experimental results have shown that these split array caches have lower average energy-delay product for memory accesses as compared with unified set-associative array caches of the same size. For an MPEG-2 decoder, even with no parallel accesses to cache partitions, the average memory access energy-delay product of an 8K-byte trace-based split array cache is reduced by 50% as compared to that of the unified set-associative array cache with the lowest energy-delay product. 
If 25% of the accesses occur in pairs, there is an additional reduction of 9%.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128021811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
ALOE-Based Flexible LDPC Decoder
Ismael Gómez Miguelez, Massimo Camatel, J. Bracke, V. Marojevic, A. Gelonch, F. Vacca, G. Masera
{"title":"ALOE-Based Flexible LDPC Decoder","authors":"Ismael Gómez Miguelez, Massimo Camatel, J. Bracke, V. Marojevic, A. Gelonch, F. Vacca, G. Masera","doi":"10.1109/DSD.2010.107","DOIUrl":"https://doi.org/10.1109/DSD.2010.107","url":null,"abstract":"Radio communications terminals and infrastructure tend to support an increasing range of algorithms and radio access technologies. Flexible processing platforms are therefore needed for supporting multi-standard or heterogeneous radios. Channel decoding is one of the most computationally demanding digital signal processing blocks of a radio transceiver. At the same time, it provides a high degree of implementation flexibility and facilitates dynamic parameter adjustments. This paper presents a flexible LDPC decoder implemented on an FPGA device following the ALOE middleware design paradigm. We analyse the middleware efficiency in terms of flexibility versus resource requirements. The results show a relative middleware area overhead of 32%.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133345681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Design Methodology for a High Performance Robust DVB-S2 Decoder Implementation
F. Berthelot, François Charot, Charles Wagner, C. Wolinski
{"title":"Design Methodology for a High Performance Robust DVB-S2 Decoder Implementation","authors":"F. Berthelot, François Charot, Charles Wagner, C. Wolinski","doi":"10.1109/DSD.2010.40","DOIUrl":"https://doi.org/10.1109/DSD.2010.40","url":null,"abstract":"The new Digital Video Broadcasting Satellite (DVB-S2) standard is able to provide capacity gains of about 30% over the previous standard by using a powerful Forward Error Correction (FEC) scheme based on very large LDPC code words and BCH codes. The implementation of the DVB-S2 FEC decoder is a big challenge. The designer must deal with the overall design complexity and the decoding throughput in order to obtain a high decoding performance in terms of bit error rate (BER). We present in detail a complete design flow allowing a better understanding of the algorithm in terms of complexity, performance and its hardware implementation. We focus on complexity-performance trade-offs due to message quantizations and we compare its effects on several algorithm corrections used to check nodes for DVB-S2 decoding. The simulation results show that the best compromise between complexity and performance is obtained for the FOMS algorithm approximation.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130576842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Class of Recursive Networks on a Chip for Enhancing Intercluster Parallelism
Masaru Takesue
{"title":"A Class of Recursive Networks on a Chip for Enhancing Intercluster Parallelism","authors":"Masaru Takesue","doi":"10.1109/DSD.2010.46","DOIUrl":"https://doi.org/10.1109/DSD.2010.46","url":null,"abstract":"Future VLSI technologies will allow for multiple clusters, each consisting of a number of processing nodes, to be put on a single chip. Although we may then be able to select a network topology matching an application assigned to each cluster, it may be difficult to decide the topologies of connections between the (intra)cluster networks for effective parallel processing by the cooperation of clusters. To alleviate the problem, this paper proposes a class of recursive networks, RNs, whose constituent networks can have different topologies and sizes not only in different recursive levels but also in the same level. In RN, the last-level networks define the cluster networks, and the level-i network associated with a cluster network defines the i-th intercluster network between the cluster and another cluster. The cluster and intercluster networks can be any kind of standard network, such as the mesh and bus. The paper presents a partition-based method of generating RN and its routing and layout methods.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132157979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Performance Analysis of 90nm Look Up Table (LUT) for Low Power Application
Deepak Kumar, Pankaj Kumar, M. Pattanaik
{"title":"Performance Analysis of 90nm Look Up Table (LUT) for Low Power Application","authors":"Deepak Kumar, Pankaj Kumar, M. Pattanaik","doi":"10.1109/DSD.2010.72","DOIUrl":"https://doi.org/10.1109/DSD.2010.72","url":null,"abstract":"This paper provides a detailed performance analysis of low power and high speed Look up Table (LUT) by using a circuit technique. Proper sizing of all the sleep transistors is done in the LUT to achieve an optimum power-delay relationship so that it can be used for fast growing low power applications. Also, we have implemented a benchmark circuit (8 × 10) encoder in Virtex-4, 90nm FPGA. As compared to the traditional 4-input LUT design, the proposed design saves 12.8% of average power in high speed mode and 56.7% in low power mode with a little compromise in its speed.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133606865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
System Level Synthesis for Ultra Low-Power Wireless Sensor Nodes
Muhammad Adeel Pasha, Steven Derrien, O. Sentieys
{"title":"System Level Synthesis for Ultra Low-Power Wireless Sensor Nodes","authors":"Muhammad Adeel Pasha, Steven Derrien, O. Sentieys","doi":"10.1109/DSD.2010.88","DOIUrl":"https://doi.org/10.1109/DSD.2010.88","url":null,"abstract":"Engineering a hardware platform for a Wireless Sensor Network (WSN) node is known to be a tough challenge, as the design must enforce many severe constraints, among which energy dissipation is by far the most challenging one. Today, most of the WSN node platforms are based on low cost and low-power programmable microcontrollers, even if it is acknowledged that their energy efficiency remains limited and hinders the spread of WSN to new applications. In this paper, we propose a complete system level flow for an alternative approach based on the concept of hardware micro-tasks, which relies on hardware specialization and power gating to dramatically improve the energy efficiency of the computational part of the node. Early estimates show power saving by more than one order of magnitude over MCU-based implementations.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126735253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
A Packet Classifier Using a Parallel Branching Program Machine
Hiroki Nakahara, Tsutomu Sasao, M. Matsuura
{"title":"A Packet Classifier Using a Parallel Branching Program Machine","authors":"Hiroki Nakahara, Tsutomu Sasao, M. Matsuura","doi":"10.1109/DSD.2010.18","DOIUrl":"https://doi.org/10.1109/DSD.2010.18","url":null,"abstract":"A branching program machine (BM) is a special purpose processor that uses only two kinds of instructions: branch and output instructions. Thus, the architecture for the BM is much simpler than that for a general purpose processor (MPU). Since the BM uses dedicated instructions for a special purpose application, it is faster than the MPU. This paper presents a packet classifier using a parallel branching program machine (PBM). To reduce computation time and code size, first, a set of rules for the packet classifier is partitioned into groups. Then, they are evaluated by the PBM in parallel. Also, this paper shows a method to estimate the number of necessary BMs to realize the packet classifier. The PBM32 consisting of 32 BMs has been implemented on an FPGA, and compared with Intel's Core2Duo@1.2GHz. The PBM32 is 8.1-11.1 times faster than the Core2Duo, and the PBM32 requires only 0.2-10.3 percent of the memory for the Core2Duo.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115796301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11