Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors: Latest Papers

Contention-conscious transaction ordering in embedded multiprocessors
M. Khandelia, S. Bhattacharyya
DOI: 10.1109/ASAP.2000.862398 (https://doi.org/10.1109/ASAP.2000.862398)
Published: 2000-07-10
Abstract: This paper explores the problem of efficiently ordering interprocessor communication operations in statically scheduled multiprocessors for iterative dataflow graphs. In most digital signal processing applications, the throughput of the system is significantly affected by communication costs. By explicitly modeling these costs within an effective graph-theoretic analysis framework, we show that ordered transaction schedules can significantly outperform self-timed schedules even when synchronization costs are low. However, we also show that when communication latencies are non-negligible, finding an optimal transaction order given a static schedule is an NP-complete problem, and that this intractability holds under both iterative and non-iterative execution. We develop new heuristics for finding efficient transaction orders and perform an experimental comparison to gauge their performance.
Citations: 7
Block-update parallel processing QRD-RLS algorithm for throughput improvement with low power consumption
Lijun Gao, K. Parhi
DOI: 10.1109/ASAP.2000.862393 (https://doi.org/10.1109/ASAP.2000.862393)
Published: 2000-07-10
Abstract: In this paper, a block-update parallel processing algorithm is proposed for increasing the throughput of CORDIC-based QRD-RLS filtering with low power consumption. The proposed algorithm employs single-state-update parallel processing; with it, the throughput of a block-by-block weight-update QRD-RLS filter can be increased at the cost of a linear increase in hardware resources. However, the algorithm does not change the iteration bounds or the clock frequency of the QRD-RLS filters. As a result, the functional units need not be pipelined, and power consumption increases only linearly instead of quadratically. Because the units are not pipelined and consume less power, a higher folding factor can be used in a folding transformation, and a large reduction in hardware resources can be achieved without exceeding the physical limits on pipelining level and power density. Therefore, the proposed algorithm can serve as an important stage in designing and mapping a QRD-RLS filter onto physical hardware or computing resources, and is thus better suited to both ASIC chip design and parallel computing when block-by-block weight update is applicable.
Citations: 3
Architecture of an image rendering co-processor for MPEG-4 systems
Mladen Berekovic, P. Pirsch, T. Selinger, Kai-Immo Wels, C. Miro, A. Lafage, C. Heer, G. Ghigo
DOI: 10.1109/ASAP.2000.862374 (https://doi.org/10.1109/ASAP.2000.862374)
Published: 2000-07-10
Abstract: The TANGRAM VLSI co-processor is intended as a building block for system-on-chip (SOC) designs for the versatile MPEG-4 multimedia standard. It is designed to perform the computation-intensive final step of MPEG-4 video decoding: compositing of scenes at the display. This includes warping and alpha blending of multiple full-screen video textures in real time. TANGRAM consists of a RISC control processor and multiple powerful arithmetic units that perform rendering calculations directly in hardware. This hybrid architecture enables adaptation to changes in algorithms, or software support for different video formats. Communication with a host CPU and video decoding hardware is done via the widely used PI-bus on-chip interface. TANGRAM interfaces directly with the ITU-R 601/656 digital video output. VHDL implementation and synthesis for a 0.35 μm standard-cell library give an estimated worst-case clock frequency of 100 MHz, an overall area of 52 mm², and a power dissipation of 1 W. TANGRAM has sufficient performance for rendering MPEG-4 Main Profile@Layer3 scenes (CCIR).
Citations: 8
Partitioning conditional data flow graphs for embedded system design
M. Auguin, L. Bianco, Laurent Capella, E. Gresset
DOI: 10.1109/ASAP.2000.862404 (https://doi.org/10.1109/ASAP.2000.862404)
Published: 2000-07-10
Abstract: The complexity of embedded applications increases continuously. Advances in integration provide a growing range of possibilities for implementing a system on a chip. Designers face the difficult challenge of selecting the right units to implement the application's functionality so that silicon area is minimized and the application's timing constraints are met. This paper presents an effective method for designing system architectures that operates on a conditional data flow graph, a representation well suited to signal processing applications.
Citations: 8
A Booth multiplier accepting both a redundant or a non redundant input with no additional delay
M. Daumas, D. Matula
DOI: 10.1109/ASAP.2000.862391 (https://doi.org/10.1109/ASAP.2000.862391)
Published: 2000-07-10
Abstract: Past recoders have added critical-path delay in the more frequent case where both inputs are non-redundant. Our proposed circuit does not lengthen the time of one multiplication compared to state-of-the-art encoding when both inputs are non-redundant. We have slightly modified an existing cell, by changing some connections, so that it accepts a redundant binary number in place of the non-redundant number. The recoding operators and the associated high-level quantity (the fraction range), all defined in this paper, are used to rule out some possible inputs to this newly created cell. We check that the modified cell yields the correct output for the remaining possible inputs.
Citations: 28
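The abstract assumes familiarity with Booth recoding. As background only, the following is a minimal sketch of conventional radix-4 (modified) Booth recoding of a non-redundant two's-complement multiplier, i.e. the baseline case the paper's cell must not slow down. It does not model the authors' redundant-input cell or their fraction-range operators; the function names are hypothetical.

```python
import math  # not strictly needed; kept out of the logic below


def bit(y: int, i: int) -> int:
    """Bit i of y in two's complement (bit -1 is defined as 0)."""
    # Python's >> on negative ints is an arithmetic shift, so this
    # reads the two's-complement bits of y with infinite sign extension.
    return 0 if i < 0 else (y >> i) & 1


def booth_radix4_digits(y: int, n_bits: int) -> list:
    """Radix-4 Booth recoding of an n_bits-wide two's-complement integer y.

    Returns digits d_i in {-2, -1, 0, 1, 2}, least significant first,
    such that y == sum(d_i * 4**i).
    """
    if not -(1 << (n_bits - 1)) <= y < (1 << (n_bits - 1)):
        raise ValueError("y does not fit in n_bits two's complement")
    width = n_bits + (n_bits % 2)  # sign-extend to an even width
    # Each digit looks at an overlapping triple (b_{2i+1}, b_{2i}, b_{2i-1}).
    return [-2 * bit(y, 2 * i + 1) + bit(y, 2 * i) + bit(y, 2 * i - 1)
            for i in range(width // 2)]


def booth_multiply(x: int, y: int, n_bits: int) -> int:
    """Multiply x by y by summing one partial product per Booth digit.

    Each digit selects a partial product from {0, +x, -x, +2x, -2x},
    which hardware forms with a shift and an optional negation.
    """
    return sum(d * x * 4 ** i
               for i, d in enumerate(booth_radix4_digits(y, n_bits)))
```

For example, `booth_radix4_digits(7, 4)` recodes 0111 as [-1, 2], since 7 = -1 + 2·4; halving the number of partial products this way is what makes recoder delay matter on the critical path.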
High level modeling for parallel executions of nested loop algorithms
E. Deprettere, E. Rijpkema, P. Lieverse, B. Kienhuis
DOI: 10.1109/ASAP.2000.862380 (https://doi.org/10.1109/ASAP.2000.862380)
Published: 2000-07-10
Abstract: High-level modeling and (quantitative) performance analysis of signal processing systems require high-level models of the applications (algorithms) and the implementations (architectures), a mapping of the former onto the latter, and a simulator for fast execution of the whole. Signal processing algorithms are very often nested-loop algorithms with a high degree of inherent parallelism. For such applications, this paper presents suitable application and implementation models, a method to convert a given imperative executable specification into a specification in terms of the application model, a method to map this specification onto an architecture specification in terms of the implementation model, and a method to analyze performance through simulation. The methods and tools are illustrated by means of an example.
Citations: 8
A multiplication-free parallel architecture for affine transformation
Wael Badawy, M. Bayoumi
DOI: 10.1109/ASAP.2000.862375 (https://doi.org/10.1109/ASAP.2000.862375)
Published: 2000-07-10
Abstract: This paper presents a novel low-power parallel architecture for computing the affine transformation (AT). It is based on a new multiplication-free algorithm that exploits the inherent algebraic properties of the AT. Low power is achieved at the algorithmic level by replacing multiplications with shift operations, at the architecture level by using parallel computational units, and at the circuit level by using low-power cells. The proposed architecture can be used as a computational kernel in object-based video processing and is compatible with the MPEG-4 and VRML standards. The architecture has been prototyped in 0.6 μm CMOS technology with three metal layers.
Citations: 11
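The abstract does not spell out its algorithm. To illustrate how the linearity of the affine map alone can remove per-pixel multiplications, here is a minimal sketch of a different, generic technique: incremental (forward-difference) evaluation over a pixel grid. It is not the authors' shift-based method; the function name and coefficient layout are assumptions. For x' = a·x + b·y + c and y' = d·x + e·y + f, stepping x by one adds (a, d) and stepping y by one adds (b, e).

```python
def affine_grid(a, b, c, d, e, f, width, height):
    """Map every pixel (x, y) of a width x height grid through
    x' = a*x + b*y + c,  y' = d*x + e*y + f
    using only additions inside the loops (forward differencing).

    Returns a list of rows; out[y][x] is the mapped pair (x', y').
    """
    out = []
    row_u, row_v = c, f  # mapped coordinates of (0, y) for the current row
    for _ in range(height):
        u, v = row_u, row_v
        row = []
        for _ in range(width):
            row.append((u, v))
            u += a  # x -> x + 1 adds a to x'
            v += d  # ... and d to y'
        out.append(row)
        row_u += b  # y -> y + 1 adds b to x'
        row_v += e  # ... and e to y'
    return out
```

Integer (fixed-point style) coefficients are used deliberately: with floating-point coefficients the repeated additions accumulate rounding error across a row, which is one reason hardware versions of this idea typically use fixed-point arithmetic.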
Integration of high-performance ASICs into reconfigurable systems providing additional multimedia functionality
H. Blume, Hans-Martin Blüthgen, C. Henning, Patrick Osterloh
DOI: 10.1109/ASAP.2000.862379 (https://doi.org/10.1109/ASAP.2000.862379)
Published: 2000-07-10
Abstract: The computational power required by many future multimedia applications is beyond the capabilities of today's multimedia systems. Therefore, integrating additional high-performance multimedia components is crucial. This paper presents the integration of multimedia components into computer systems using reconfigurable coprocessor boards. It works out the goals of such reconfigurable platforms, which can be adapted to several applications and include digital signal processors, control and memory devices, and dedicated multimedia ASICs. As steps toward such a platform, four ASICs for image and text processing are presented, and their integration into a computing system using a CardBus-based coprocessor board is shown.
Citations: 7
Control for high-speed PE arrays
M. Herbordt, Honghai Zhang, Calvin Lin, H. Rao, J. Cravy
DOI: 10.1109/ASAP.2000.862395 (https://doi.org/10.1109/ASAP.2000.862395)
Published: 2000-07-10
Abstract: Although arrays of SIMD PEs can be built with very high operating frequencies, keeping the array busy is problematic. The inherent mismatch between host and array makes it difficult to maintain high array utilization: either the instruction issue rate is very low, or PE data locality is compromised, which has the same effect. Our solution is based on an array control unit (ACU) design that expands macro instructions in two stages, first by data tile and then into microinstructions. The expansion itself solves the issue problem; decoupling the expansion modalities maintains data locality. Several issues involving host/ACU interaction must be resolved to effect this solution.
Citations: 1
Tradeoff analysis and architecture design of a hybrid hardware/software sorter
M. Bednara, O. Beyer, J. Teich, R. Wanka
DOI: 10.1109/ASAP.2000.862400 (https://doi.org/10.1109/ASAP.2000.862400)
Published: 2000-07-10
Abstract: Sorting long sequences of keys is a problem that occurs in many different applications. For embedded systems, a uniprocessor software solution is often not applicable due to its low performance, while realizing multiprocessor sorting methods on parallel computers is much too expensive with respect to power consumption, physical weight, and cost. We investigate cost/performance tradeoffs for hybrid sorting algorithms that use a mixture of sequential merge sort and systolic insertion sort techniques. We propose a scalable architecture for integer sorting that consists of a uniprocessor and an FPGA-based parallel systolic co-processor. Speedups, obtained analytically and experimentally and depending on hardware (cost) constraints, are determined as a function of the time constants of the uniprocessor and the co-processor.
Citations: 17
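The hybrid scheme in this abstract can be modeled in software. The following is a minimal sketch under stated assumptions: a linear systolic insertion sorter of hypothetical capacity `k` turns the input into sorted runs of length `k` (modeling the co-processor), and the uniprocessor then merges those runs pairwise, bottom-up merge-sort style. The function names are illustrative; the sketch ignores timing, which is what the paper's speedup analysis actually models.

```python
import heapq


def systolic_insertion_sort(block):
    """Software model of a linear systolic insertion sorter.

    Each cell holds one key. An incoming key travels along the array;
    at each cell the smaller key stays and the larger one moves on,
    so the cells always hold their keys in ascending order.
    """
    cells = []
    for key in block:
        carry = key
        for i in range(len(cells)):
            if carry < cells[i]:
                cells[i], carry = carry, cells[i]  # keep smaller, pass larger
        cells.append(carry)  # the largest key reaches the next free cell
    return cells


def hybrid_sort(keys, k):
    """Sort keys by forming sorted runs of length k on the modeled
    co-processor, then merging runs pairwise on the uniprocessor."""
    if k < 1:
        raise ValueError("co-processor capacity k must be at least 1")
    runs = [systolic_insertion_sort(keys[i:i + k])
            for i in range(0, len(keys), k)]
    while len(runs) > 1:
        # One merge pass: merge runs 0+1, 2+3, ...; an odd last run passes through.
        runs = [list(heapq.merge(runs[i], *runs[i + 1:i + 2]))
                for i in range(0, len(runs), 2)]
    return runs[0] if runs else []
```

Increasing `k` (more co-processor cells, hence more hardware cost) reduces the number of merge passes on the uniprocessor, which is the cost/performance tradeoff the abstract refers to.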