{"title":"A processor for staggered interval arithmetic","authors":"M. Schulte, E. Swartzlander","doi":"10.1109/ASAP.1995.522910","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522910","url":null,"abstract":"The paper presents the design of a high-speed processor which performs staggered interval arithmetic. Each staggered interval is represented as the sum of a set of floating point numbers plus an interval, which consists of two floating point endpoints. Staggered interval arithmetic allows the precision of the computation to be specified and the accuracy of the result to be determined. Efficient arithmetic algorithms, which reduce the number of floating point operations needed to perform staggered interval arithmetic, are introduced. To achieve high performance, the processor employs an array of pipelined floating point arithmetic units and two long accumulators. The processor provides direct hardware support for accurate and numerically reliable vector and matrix computations.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127283704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recomputing by operand exchanging: a time-redundancy approach for fault-tolerant neural networks","authors":"Y. Hsu, E. Swartzlander, V. Piuri","doi":"10.1109/ASAP.1995.522905","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522905","url":null,"abstract":"The use of neural networks in mission-critical applications requires concurrent error detection and correction at architectural level to provide high consistency and reliability of system's outputs. Time redundancy allows for fault tolerance in digital realizations with low circuit complexity increase. In this paper, we propose the use of REcomputation with eXchanged Operands, an approach based on operands' rotation, to introduce concurrent error detection and correction, when timing constraints are not particularly strict. Different architectural approaches for neural design are considered to match the implementation constraints and to show the versatility of the proposed solutions.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"287 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124566785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Horizontal microcode compaction for programmable systolic accelerators","authors":"P. Ienne","doi":"10.1109/ASAP.1995.522908","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522908","url":null,"abstract":"This paper addresses the problem of compacting microcode for complex systolic systems used as accelerators for traditional computers. For this sort of system, the purpose is to have a low-level programming paradigm that is simple enough for those users that are not completely aware of hardware details. The microcode should be issued from a high-level language application developed on the host processor. The paper introduces an effective technique to structure the microcode into elementary primitives and a simple compaction algorithm to shorten the microcode program. This compaction strategy has been tested on a real machine to implement a neural-network algorithm and some results are reported.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129212872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and implementation of a parallel image processor chip for a SIMD array processor","authors":"M. Sunwoo, S. Ong, B. Ahn, Kyungwoo Lee","doi":"10.1109/ASAP.1995.522906","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522906","url":null,"abstract":"This paper presents the design and implementation of a sliding memory plane (SliM) image processor chip to build a mesh-connected SIMD architecture called a SliM array processor. The SliM image processor chip consists of 5×5 processing elements (PEs) connected by a mesh topology. A set of SliM image processor chips can form the SliM array processor. Due to the idea of sliding, that is, overlapping inter-PE communication with computation, the SliM image processor can greatly reduce the inter-PE communication overhead, a significant disadvantage of existing SIMD array processors. In addition, using the by-passing path provides eight-way connectivity even with four physical links. This paper addresses architectures of the SliM image processor chip, the design of an instruction set, and implementation issues. The chip has 55255 gates and twenty-five 128×9-bit SRAM modules, and was simulated at 18 MHz for the worst case conditions, and will actually run at a higher clock rate. The package type is the 144 pin MQFP. We conduct the performance evaluation of the chip that shows a significant improvement.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"562 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116285681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel implementation of the full search block matching algorithm for motion estimation","authors":"P. Baglietto, M. Maresca, A. Migliaro, M. Migliardi","doi":"10.1109/ASAP.1995.522922","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522922","url":null,"abstract":"Motion estimation is a key technique in most algorithms for video compression and particularly in the MPEG and H.261 standards. The most frequently used technique is based on a Full Search Block Matching Algorithm which is highly computing intensive and requires the use of special purpose architectures to obtain real-time performance. We propose an approach to the parallel implementation of the Full Search Block Matching Algorithm which is suitable for implementation on massively parallel architectures ranging from large scale SIMD computers to dedicated processor arrays realized in ASICs. While the first alternative can be used for the implementation of high performance coders the second alternative is particularly attractive for low cost video compression devices. This paper describes the approach proposed for the parallel implementation of the Full Search Block Matching Algorithm and the implementation of such an approach in an ASIC.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128249854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The VLSI design and implementation of the array processors of a multilayer vision system architecture","authors":"B. Saha, J. S. Mertoguno, N. Bourbakis","doi":"10.1109/ASAP.1995.522913","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522913","url":null,"abstract":"This paper describes the VLSI design and simulation of the lower layer processors of the KYDON vision system. KYDON is a completely autonomous, hierarchical, multilayered image understanding system. The VLSI design of the individual components as well as the timing simulation results of the processor array have been presented. The system runs at 50 MHz and promises a high processing rate of 300 image frames/sec.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131640460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A solid translation engine using ray representation","authors":"T. Alexander, J. L. Ellis, Gershon Kedem","doi":"10.1109/ASAP.1995.522919","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522919","url":null,"abstract":"We describe an extension to the geometric domain of solid modeling to include solids defined by spatial sweeping and Minkowski sums. We develop an efficient, parallel algorithm for the translation of such solid models. An architecture and design of an array processor that implements this algorithm are presented. We discuss some applications of the new computer to solid modeling and CAD/CAM and modeling of large biomolecules (proteins) for rational drug design.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114531189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A design tool for the specification and the simulation of array processors architectures application to image processing: the extraction of regions of interests","authors":"G. Ramstein, O. Déforges, P. Bakowski","doi":"10.1109/ASAP.1995.522936","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522936","url":null,"abstract":"This paper deals with a CAD tool dedicated to the design and the simulation of specific array processor architectures. These architectures are described into a specific notation which includes major characteristics of the VHDL syntax. This language provides a very concise and legible means to specify array processors. A preprocessor generates full standard VHDL code describing the behavior of the designed architecture. An original application to image processing is given: the design of a specific architecture for the extraction of regions of interests.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128497131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bit level block matching systolic arrays","authors":"Y. Chan, S. Kung","doi":"10.1109/ASAP.1995.522925","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522925","url":null,"abstract":"We present two bit-level systolic arrays for block matching which are designed by using a well-known methodology. Hardware complexities and speeds of both bit-level designs and conventional word-level arrays are compared by using synthesis tools. We pay special attention to a class of issues which were somewhat overlooked by previous publications, including power consumption due to high frequency, area due to routing and control, and optimal level of pipelining. Our design offers the following features: (1) The bit-level arrays are estimated to offer 200+% speed-up over word-level arrays. (2) When compared with word-level system with same throughput, the bit-level designs reduce control complexity, bus/routing area, and data buffering. (3) When dynamic power control is desired, these bit-level designs offer the flexibility of disabling some processing elements (for lower significant bits) at slight cost of picture quality. Finally, the potential promises and limitations of bit-level systolic block matching arrays, especially those concerning their integration into codec application system are investigated and discussed.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121163219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The MGAP's programming environment and the *C++ language","authors":"R. Bajwa, R. Owens, M. J. Irwin","doi":"10.1109/ASAP.1995.522912","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522912","url":null,"abstract":"The MGAP is a special-purpose, workstation co-processor board in which the computing elements are fine grain processors implemented as custom ASICs. In this paper we present the language *C++, used for programming on the MGAP. Using the class concept of C++ we create special parallel data-types like bit, digit, word and array and overload operators to manipulate the parallel data required by the MGAP. The hierarchical relationships among the data-types are used by the compiler to generate parallel code for the MGAP. We demonstrate that by using the same high-level language and the same program we can operate on data at all levels of granularity, from bits to arrays, without any loss in performance.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116503769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}