Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors: Latest Publications

A linear array parallel image processor: SliM-II
Hyunman Chang, S. Ong, M. Sunwoo
DOI: 10.1109/ASAP.1997.606810 (https://doi.org/10.1109/ASAP.1997.606810), published 1997-07-14
Abstract: This paper describes the architecture and design of a general-purpose parallel image processor chip called the SliM-II Image Processor. The chip has a linear array of 64 processing elements (PEs), operates at 30 MHz in worst-case simulation, and delivers 1.92 GIPS. SliM-II can greatly reduce inter-PE communication overhead through sliding, that is, overlapping inter-PE communication with computation. In contrast to existing array processors, each PE has a multiplier, which is quite effective for convolution, template matching, and similar operations. The instruction set can execute an ALU operation, data I/O, and inter-PE communication simultaneously in one instruction cycle. In addition, during an ALU/multiplier operation, SliM-II provides parallel load/store between the register file and on-chip memory, as in DSP chips. Bit-parallel paths increase the bandwidth of data I/O and inter-PE communication. We developed VHDL models and performed logic synthesis using the COMPASS™ CAD tool, with the COMPASS™ 3.3 V 0.6 µm standard cell library (v8r4.9.1). The total number of transistors is about 1.5 million. The SliM-II chip is being fabricated at LG Semiconductor Co., Ltd.
Performance estimates show a significant improvement over existing array processors for algorithms that require multiplications.
Citations: 4
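As a consistency check on SliM-II's headline figure, assume (the abstract implies but does not state it) that each of the 64 PEs completes one instruction per 30 MHz clock cycle. The peak throughput then works out as:

```latex
\[
64\ \text{PEs} \times 30 \times 10^{6}\ \text{instructions/s per PE}
= 1.92 \times 10^{9}\ \text{instructions/s}
= 1.92\ \text{GIPS}
\]
```

Because each instruction can also carry data I/O and inter-PE communication, this count reflects issued instructions rather than individual micro-operations.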
Realization of a nonlinear digital filter on a DSP array processor
H. Kwan, E. Powers, E. Swartzlander
DOI: 10.1109/ASAP.1997.606809 (https://doi.org/10.1109/ASAP.1997.606809), published 1997-07-14
Abstract: This paper presents the performance evaluation of a fast third-order Volterra digital filtering algorithm mapped onto an AT&T DSP-3 parallel processor. Five different implementations are considered. Speed-up results indicate that the "time-skewing" method is currently the fastest. An application to nonlinear communication channel equalization using a 64-QAM signal constellation is presented.
Citations: 3
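The abstract does not describe the fast algorithm or its five DSP-3 mappings. For orientation only, here is a minimal sketch of the plain direct-form third-order Volterra filter that such fast algorithms are meant to accelerate. The function name `volterra3`, the dense (non-symmetric) kernel layout, and the zero initial state are illustrative assumptions, not the authors' formulation.

```python
from typing import List, Sequence


def volterra3(x: Sequence[float], h0: float,
              h1: Sequence[float],
              h2: Sequence[Sequence[float]],
              h3: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Direct-form third-order Volterra filter with memory length N = len(h1).

    y[n] = h0 + sum_i h1[i] x[n-i]
              + sum_{i,j} h2[i][j] x[n-i] x[n-j]
              + sum_{i,j,k} h3[i][j][k] x[n-i] x[n-j] x[n-k]
    Samples before x[0] are treated as zero.
    """
    n_mem = len(h1)
    y: List[float] = []
    for n in range(len(x)):
        # Delay line: tap[i] holds x[n - i], zero before the first sample.
        tap = [x[n - i] if n - i >= 0 else 0.0 for i in range(n_mem)]
        acc = h0
        for i in range(n_mem):
            acc += h1[i] * tap[i]                      # linear term
            for j in range(n_mem):
                acc += h2[i][j] * tap[i] * tap[j]      # quadratic term
                for k in range(n_mem):
                    acc += h3[i][j][k] * tap[i] * tap[j] * tap[k]  # cubic term
        y.append(acc)
    return y
```

The triple loop costs O(N³) multiply-accumulates per output sample, which is why fast algorithms and parallel mappings such as those evaluated in the paper matter for this filter class.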
Low latency word serial CORDIC
J. Villalba, T. Lang
DOI: 10.1109/ASAP.1997.606819 (https://doi.org/10.1109/ASAP.1997.606819), published 1997-07-14
Abstract: In this paper we present a modification of the CORDIC algorithm that reduces the number of iterations by almost half by merging two successive iterations of the basic algorithm. The two coefficients per iteration are obtained with only a small increase in cycle time by estimating one of the coefficients. A correcting-iteration method is used to correct possible errors produced by the estimate. Moreover, the modified iteration reduces the number of cycles required to compensate the scaling factor. The resulting architecture is word serial, works in both rotation and vectoring modes, and has low latency compared with the classical CORDIC approach.
Citations: 7
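For reference, this is a minimal sketch of the classical CORDIC rotation mode that the paper compares against: one coefficient σᵢ ∈ {−1, +1} per iteration, followed by a separate scale-factor compensation. It does not implement the authors' merged two-coefficient iteration or their correcting iterations, and it uses floating point in place of fixed-point shift-and-add for readability. The function name and the default iteration count are assumptions.

```python
import math
from typing import Tuple


def cordic_rotate(x: float, y: float, angle: float,
                  iterations: int = 32) -> Tuple[float, float]:
    """Classical radix-2 CORDIC in rotation mode.

    Rotates (x, y) by `angle` radians. Converges for |angle| up to about
    1.743 rad, the sum of atan(2**-i). The CORDIC gain is divided out at the end.
    """
    # Elementary angles atan(2^-i) and the aggregate gain K = prod sqrt(1 + 2^-2i).
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        # One coefficient per iteration: rotate in the direction that drives z to 0.
        sigma = 1.0 if z >= 0 else -1.0
        # Simultaneous update (in hardware: two shifts and two add/subtracts).
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * angles[i]

    # Scale-factor compensation, a separate step in hardware.
    return x / gain, y / gain
```

Each loop pass picks exactly one σᵢ; merging two passes, as the paper proposes, means choosing σᵢ and σᵢ₊₁ in the same cycle, which the abstract says is done by estimating one of them.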
Mapping multirate dataflow to complex RT level hardware models
J. Horstmannshoff, Thorsten Grötker, H. Meyr
DOI: 10.1109/ASAP.1997.606834 (https://doi.org/10.1109/ASAP.1997.606834), published 1997-07-14
Abstract: The design of digital signal processing systems typically consists of an algorithm development phase carried out at a behavioral level and the selection of an efficient hardware architecture for implementation. To speed up the joint optimization of algorithms and architectures, a fast path to implementation must be provided. This can be achieved efficiently by directly mapping the data flow specification of the system to an RTL target architecture by means of HDL code generation. For algorithm design, communication systems are most easily modeled using multirate data flow graphs, in which no notion of time is maintained. HDL code generation introduces a cycle-based timing model and maps the data flow models to RTL implementations, which are usually taken from a library. Due to the increase in ASIC design complexity, these building blocks have reached a high level of functionality and have complex interfacing properties. Therefore, it becomes necessary to generate additional interfacing and control hardware to synthesize an operable system.
In this paper, we present a new approach to mapping multirate dataflow graphs to complex RTL hardware models and derive algorithms to synthesize these high-level RTL building blocks into a complete operable system.
Citations: 27
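Because multirate dataflow graphs carry no notion of time, a common first step before any cycle-based schedule can be fixed is to compute how often each actor fires per graph iteration, using the standard synchronous-dataflow balance equations r[src]·p = r[dst]·c on every edge. The sketch below computes that repetition vector. It illustrates the general technique only and is not the authors' synthesis algorithm; the `Edge` tuple layout and the function name are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd
from typing import Dict, List, Tuple

# Hypothetical edge layout: (producer, consumer, tokens produced per firing,
# tokens consumed per firing).
Edge = Tuple[str, str, int, int]


def repetition_vector(actors: List[str], edges: List[Edge]) -> Dict[str, int]:
    """Smallest positive integers r with r[src] * p == r[dst] * c on every edge.

    Assumes a connected graph; raises ValueError if it is disconnected or the
    rates are inconsistent (no bounded-memory periodic schedule exists).
    """
    rate: Dict[str, Fraction] = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along edges
        changed = False
        for src, dst, p, c in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * p / c
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * c / p
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    for src, dst, p, c in edges:
        if rate[src] * p != rate[dst] * c:
            raise ValueError(f"inconsistent rates on edge {src}->{dst}")

    # Clear denominators, then divide out the common factor.
    lcm_den = reduce(lambda a, b: a * b // gcd(a, b),
                     (r.denominator for r in rate.values()))
    ints = {a: int(r * lcm_den) for a, r in rate.items()}
    g = reduce(gcd, ints.values())
    return {a: ints[a] // g for a in actors}
```

For example, if A produces 2 tokens that B consumes 3 at a time, and B produces 1 token that C consumes 2 at a time, the result is {"A": 3, "B": 2, "C": 1}. Interface and control logic generated for library RTL blocks would then have to honor these firing ratios.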
A logical framework to prove properties of ALPHA programs
L. Bougé, D. Cachera
DOI: 10.1109/ASAP.1997.606825 (https://doi.org/10.1109/ASAP.1997.606825), published 1997-07-14
Abstract: We present an assertional approach to proving properties of ALPHA programs. ALPHA is a functional language based on affine recurrence equations. We first present two kinds of operational semantics for ALPHA, together with some equivalence and confluence properties of these semantics. We then present an attempt to provide ALPHA with an external logical framework. To this end, we define a proof method based on invariants. We focus on a particular class of invariants, namely canonical invariants, which are a logical expression of the program's semantics. We finally show that this framework is well suited to proving partial properties, equivalence properties between ALPHA programs, and properties that cannot be expressed within the ALPHA language.
Citations: 3
Conception and design of a RISC CPU for the use as embedded controller within a parallel multimedia architecture
S. Dogimont, M. Gumm, F. Mombers, D. Mlynek, A. Torielli
DOI: 10.1109/ASAP.1997.606846 (https://doi.org/10.1109/ASAP.1997.606846), published 1997-07-14
Abstract: In this paper, the problem of defining a high-performance control structure for a parallel motion estimation architecture for MPEG2 coding is addressed. Various design and architecture choices are discussed and the final architecture is described. It represents a combined MIMD-SIMD approach which is based on a small but efficient ASIP with subword parallelism.
Citations: 8
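The abstract mentions an ASIP with subword parallelism but does not describe its instruction set. As a generic illustration of the idea only, the sketch below adds four 8-bit pixel lanes packed into one 32-bit word using a single integer addition plus masking, which is the kind of work a packed-add instruction does in one cycle. All names here are hypothetical.

```python
from typing import List

MASK_LOW7 = 0x7F7F7F7F  # low 7 bits of each 8-bit lane
MASK_MSB = 0x80808080   # top bit of each 8-bit lane


def add_u8x4(a: int, b: int) -> int:
    """Lane-wise add of four packed unsigned 8-bit values, modulo 256 per lane.

    Summing only the low 7 bits of each lane can never carry into the next
    lane (max 0x7F + 0x7F = 0xFE); each lane's top bit is then restored with
    XOR, and the carry out of each lane is discarded.
    """
    low = (a & MASK_LOW7) + (b & MASK_LOW7)
    return (low ^ ((a ^ b) & MASK_MSB)) & 0xFFFFFFFF


def pack_u8x4(lanes: List[int]) -> int:
    """Pack four bytes into one word, lane 0 in the least significant byte."""
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack_u8x4(word: int) -> List[int]:
    """Inverse of pack_u8x4."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]
```

Adding [1, 200, 255, 128] and [2, 100, 1, 128] lane-wise gives [3, 44, 0, 0], with each lane wrapping independently. Motion estimation kernels apply the same packing idea to pixel differences and accumulations.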
The processing graph method tool (PGMT)
R. S. Stevens
DOI: 10.1109/ASAP.1997.606832 (https://doi.org/10.1109/ASAP.1997.606832), published 1997-07-14
Abstract: To acquire state-of-the-art hardware at reduced cost, the U.S. Navy is committed to buying commercial off-the-shelf (COTS) computer hardware. In this rapidly changing technological world, today's hardware will be obsolete tomorrow. The Navy's complex problems often require more computational power than a single serial processor can deliver. The solution lies in distributed processing. However, distributed processors tend to have architecture-specific languages, requiring an expensive and time-consuming manual rewrite of application software as new technology and new machines become available. The processing graph method (PGM), developed at the Naval Research Laboratory (NRL) in Washington, DC, is an architecture-independent method for specifying application software for distributed architectures. Its model of computation is reconfigurable dynamic data flow: dynamic because the amount of data consumed and produced by an actor may vary from one firing to another, and reconfigurable because a graph may be disassembled and reassembled. PGM was implemented on the Navy Standard Signal Processor (AN/UYS-2) and on VAX and Sun workstations. The PGMT project at NRL is developing a tool set that will facilitate the implementation of PGM on a given distributed architecture at relatively low cost.
We describe the major features of PGM and discuss the PGMT project.
Citations: 7
A modular element for shared buffer ATM switch fabrics
Mike Parks
DOI: 10.1109/ASAP.1997.606848 (https://doi.org/10.1109/ASAP.1997.606848), published 1997-07-14
Abstract: This paper presents the architecture of a modular element for the design of shared buffer ATM switch fabrics. The component is designed for deployment in a bit-sliced approach, and includes mechanisms to allow the number of elements in the fabric to be matched to the required aggregate bandwidth of the switch. All of the input ports must be synchronized to a Start of Cell input signal; the output ports optionally can be synchronized via an Output Hold signal. A bus forwards a portion of each incoming cell to a separate controller for identification and prioritization of the corresponding output operations. In addition to supporting width expansion for increased bandwidth, the component is designed to support depth expansion for more cell storage capacity at a given aggregate throughput. The component includes 32 one-bit inputs, 32 one-bit outputs, and 4 megabits of static RAM storage. Eight of the 100 MHz devices comprise a 32-port ATM switch fabric with an aggregate bandwidth of 20 gigabits per second and a storage capacity of 64K × 512 bits.
Citations: 0
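The storage figures in this abstract can be cross-checked, assuming binary prefixes (1 Mbit = 2²⁰ bits, 1K = 2¹⁰). Eight devices with 4 Mbit each hold exactly the quoted 64K × 512 bits:

```latex
\[
8 \times 4\ \text{Mbit} = 8 \times 2^{22}\ \text{bits} = 2^{25}\ \text{bits},
\qquad
64\text{K} \times 512\ \text{bits} = 2^{16} \times 2^{9}\ \text{bits} = 2^{25}\ \text{bits}.
\]
```

One reading consistent with the bit-sliced design, though the abstract does not state it, is that each device stores a 64-bit slice of every 512-bit cell slot: 2²² / 2¹⁶ = 64 bits, and 8 × 64 = 512.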
A methodology for user-oriented scalability analysis
D. Royo, M. Valero-García, Antonio González, Carme Mari
DOI: 10.1109/ASAP.1997.606836 (https://doi.org/10.1109/ASAP.1997.606836), published 1997-07-14
Abstract: Scalability analysis provides information about the effectiveness of increasing the number of resources of a parallel system. Several methods have been proposed which use different approaches to provide this information. This paper presents a family of analysis methods oriented to the user. The methods in this family should assist the user in estimating the benefits when increasing the system size. The key issue in the proposal is the appropriate combination of a scaling model, which reflects the way the users utilize an increasing number of resources, and a figure of merit that the user wants to improve with the larger system. Another important element in the proposal is the approach to characterize the scalability, which enables quick visual analyses and comparisons. Finally, three concrete examples of methods belonging to the proposed family are introduced in this paper.
Citations: 0
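The abstract's central point is that the answer depends on pairing a scaling model with a figure of merit. To make that concrete, the sketch below pairs speedup (the figure of merit) with two textbook scaling models: Amdahl's fixed-size model and Gustafson's fixed-time (scaled) model. It assumes a constant serial fraction and ignores communication overhead. These are not the three methods the paper introduces, which the abstract does not detail, and the function names are hypothetical.

```python
def fixed_size_speedup(serial_fraction: float, p: int) -> float:
    """Amdahl-style speedup: the problem size stays fixed as p grows.

    serial_fraction is measured on the 1-processor run.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def scaled_speedup(serial_fraction: float, p: int) -> float:
    """Gustafson-style speedup: parallel work grows with p at fixed run time.

    serial_fraction is measured on the p-processor run.
    """
    return serial_fraction + (1.0 - serial_fraction) * p


if __name__ == "__main__":
    # Same machine sizes, two user scaling models, two different verdicts.
    for p in (1, 4, 16, 64):
        print(p, round(fixed_size_speedup(0.1, p), 2), round(scaled_speedup(0.1, p), 2))
```

With a serial fraction of 0.1 on 10 processors, the fixed-size model gives about 5.26 while the scaled model gives 9.1. A user who grows the problem with the machine would therefore judge the same system far more scalable than one who keeps the problem fixed.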
A multiprocessor system for real time high resolution image correlation
M. Cavadini, M. Wosnitza, M. Thaler, G. Tröster
DOI: 10.1109/ASAP.1997.606843 (https://doi.org/10.1109/ASAP.1997.606843), published 1997-07-14
Abstract: In this paper, a dedicated multiprocessor architecture for a real-time implementation of the normalized cross-correlation function (NCCF) on images of up to 1024 × 1024 pixels is presented. The computational requirements are dramatically reduced by calculating this algorithm in the frequency domain. In contrast to a standard implementation of the NCCF, which inherently imposes rectangular templates, the proposed enhanced method also allows searching for free-form templates, which may even include holes. The computation in the frequency domain is based on a single program multiple data (SPMD) architecture that includes a dedicated ASIC for the computation of the 1D complex FFT. Besides this specific part of the system, the image pre- and post-processing tasks are supported by general-purpose DSPs. A system consisting of four ASICs and two SHARC DSPs is able to compute the enhanced NCCF of a free-form template on images of 1024 × 1024 pixels within 134 ms (8 frames/s).
Citations: 4
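The abstract attributes the large reduction in computation to evaluating correlation in the frequency domain. The sketch below shows only the core identity, circular cross-correlation computed as IFFT(conj(FFT(f)) · FFT(g)). It works in one dimension with a textbook recursive radix-2 FFT and checks the result against the direct O(N²) sum. It omits the NCCF normalization terms and the authors' free-form-template extension. A 2D version would apply the 1D FFT along rows and then along columns, a standard way to build a 2D transform from a 1D FFT unit such as the ASIC mentioned above. Function names are hypothetical.

```python
import cmath
from typing import List


def fft(a: List[complex]) -> List[complex]:
    """Recursive radix-2 decimation-in-time FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(a: List[complex]) -> List[complex]:
    """Inverse FFT via the identity ifft(x) = conj(fft(conj(x))) / n."""
    n = len(a)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in a])]


def xcorr_fft(f: List[float], g: List[float]) -> List[float]:
    """Circular cross-correlation c[m] = sum_n f[n] * g[(n + m) % N] via the FFT."""
    F = fft([complex(v) for v in f])
    G = fft([complex(v) for v in g])
    return [v.real for v in ifft([a.conjugate() * b for a, b in zip(F, G)])]


def xcorr_direct(f: List[float], g: List[float]) -> List[float]:
    """Same quantity computed directly in O(N^2), for comparison."""
    n = len(f)
    return [sum(f[k] * g[(k + m) % n] for k in range(n)) for m in range(n)]
```

If g is f delayed by one sample, for example f = [1, 2, 3, 4, 0, 0, 0, 0] and g = [0, 1, 2, 3, 4, 0, 0, 0], both functions peak at m = 1 with value 30. The FFT route costs O(N log N) instead of O(N²).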