{"title":"Very high (over 40 Gb/s) speed circuits for optical communications design methodology and application examples","authors":"A. Konczykowska","doi":"10.1109/DSD.2001.952304","DOIUrl":"https://doi.org/10.1109/DSD.2001.952304","url":null,"abstract":"The important increase of communication services, and particularly the Internet traffic, needs to be supported by the development of adequate communication networks. High speed electronic circuits can be successfully used in MultiGigabit-rate Time Division Multiplexing (TDM) transmission systems. In this paper we discuss main characteristics and challenges of ICs for optical communications. The design methodology, and necessary CAD tools are presented. MS-DFF design for 40 Gb/s is given as an example.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125998383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experimental evaluation of CPU performance features","authors":"J. Sosnowski, Rafal Jurkiewicz, J. Nowicki","doi":"10.1109/DSD.2001.952282","DOIUrl":"https://doi.org/10.1109/DSD.2001.952282","url":null,"abstract":"The paper addresses the problem of evaluating CPU performance in real system environment. We present an efficient methodology of CPU performance analysis at the architecture (coarse grained) and microarchitecture (fine grained) levels. It is based on time and internal event monitoring technique. This methodology is referred to Intel processors operating in IBM PC environment. The usefulness of the presented approach was proved in many experiments described in the paper.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129544975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving single-thread fetch performance on a multithreaded processor","authors":"J. Moure, R. B. García, Dolores Rexachs, E. Luque","doi":"10.1109/DSD.2001.952344","DOIUrl":"https://doi.org/10.1109/DSD.2001.952344","url":null,"abstract":"Multithreaded processors, by simultaneously using both the thread-level parallelism and the instruction-level parallelism of applications, achieve larger instruction per cycle rate than single-thread processors. On a multi-thread workload, a clustered organization maximizes performances. On a single-thread workload, however, all but one of the clusters are idle, degrading single-thread performance significantly. Using a clustered multi-thread performance as a baseline, we propose and analyze several mechanisms and policies to improve single-thread execution exploiting the existing hardware without a significant multi-thread performance loss. We focus on the fetch unit, which is maybe the most performance-critical stage. Essentially, we analyze three ways of exploiting the idle fetch clusters: allowing a single thread accessing its neighbor clusters, use the idle fetch clusters to provide multiple-path execution, or use them to widen the effective single-thread fetch block.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123603243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a faithful LNS interpolator","authors":"M. Arnold","doi":"10.1109/DSD.2001.952321","DOIUrl":"https://doi.org/10.1109/DSD.2001.952321","url":null,"abstract":"A design is given for a quadratic interpolator needed by the logarithmic number system (LNS). Unlike previous LNS designs that have attempted to produce results consistently better than a floating-point representation of the same word size (32 bits), the design goal is to minimize memory requirements and system complexity, at the expense of a slight increase in approximation error. Simulation results have shown this goal causes only a modest impact on overall accuracy, but the memory savings are significant. Despite a slight increase in error compared to prior LNS implementations, on average, the error is still less than conventional number representations satisfying the IEEE-754 standard. Proposed applications for the interpolator include multimedia, signal processing, graphics and reconfigurable computing.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124714794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An improved input-output encoding approach for functional decomposition","authors":"M. Venkatesan","doi":"10.1109/DSD.2001.952260","DOIUrl":"https://doi.org/10.1109/DSD.2001.952260","url":null,"abstract":"Functional decomposition is a process of representing a complex function as a function of smaller functions. The size of the decomposed function and the number of don't cares it contains are determined during the encoding process. This work proposes a novel input-output encoding approach that minimizes the size of the decomposed function and introduces additional don't cares in the decomposed functions. The weighted graph approach heuristically determines the optimal encoding. The approach has been implemented and tested using the MCNC benchmarks and the decomposed functions are optimal.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"287 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122212675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architectural design of a fast floating-point multiplication-add fused unit using signed-digit addition","authors":"Chichyang Chen, Liang-An Chen, Jih-Ren Cheng","doi":"10.1109/DSD.2001.952324","DOIUrl":"https://doi.org/10.1109/DSD.2001.952324","url":null,"abstract":"Signed digit (SD) addition is applied to the design of a new floating-point (FLP) multiplication-add fused (MAF) unit. This adoption, together with the proposed two-step normalization method, can reduce the three-word-length addition that is required in the conventional FLP MAF unit to two-word-length addition. Furthermore, sign reversion of the intermediate mantissa that requires three-word-length carry propagation in the conventional MAF unit is replaced by only single-word sign detection. These two improvements can enhance the speed and cost of the MAF unit significantly. With the use of the SD addition, the circuit of the unit can be designed in a more regular and simple manner, which is a property that is desired in VLSI design. The proposed FLP MAF unit has been designed and simulated by using Verilog hardware description language. The functions of the designed unit are verified to be correct.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117285760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptability, extensibility and flexibility in real-time operating systems","authors":"P. Kuacharoen, Tankut Akgul, V. Mooney, V. Madisetti","doi":"10.1109/DSD.2001.952348","DOIUrl":"https://doi.org/10.1109/DSD.2001.952348","url":null,"abstract":"In this paper, we present a mechanism for runtime updating of all kernel modules of a highly modular dynamic real-time operating system. Our approach can help solve the lack of adaptability, extensibility and flexibility of existing real-time operating systems. The dynamic real-time operating system will efficiently support a wide range of applications since any kernel module can be dynamically loaded at runtime to exactly suit the applications without necessitating a reboot of the system.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121048157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pipelining considerations for an FPGA case","authors":"O. Cadenas, G. Megson","doi":"10.1109/DSD.2001.952298","DOIUrl":"https://doi.org/10.1109/DSD.2001.952298","url":null,"abstract":"This paper presents a semi-synchronous pipeline scheme, here referred as single-pulse pipeline, to the problem of mapping pipelined circuits to a Field Programmable Gate Array (FPGA). Area and timing considerations are given for a general case and later applied to a systolic circuit as illustration. The single-pulse pipeline can manage asynchronous worst-case data completion and it is evaluated against two chosen asynchronous pipelining: a four-phase bundle-data pipeline and a doubly-latched asynchronous pipeline. The semi-synchronous pipeline proposal takes less FPGA area and operates faster than the two selected fully-asynchronous schemes for an FPGA case.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122080949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the optimization power of redundancy addition and removal for sequential logic optimization","authors":"E. S. Millán, L. Entrena, J. A. Espejo","doi":"10.1109/DSD.2001.952305","DOIUrl":"https://doi.org/10.1109/DSD.2001.952305","url":null,"abstract":"The paper attempts to determine the capabilities of existing redundancy addition and removal (SRAR) techniques for logic optimization of sequential circuits. To this purpose, we compare this method with the retiming and resynthesis (RaR) techniques. For the RaR case the set of possible transformations has been established by relating them to STG transformations by other authors. Following these works, we first formally demonstrate that logic transformations provided by RaR are covered by SRAR as well. Then we also show that SRAR is able to identify transformations that cannot be found by RaR. This way we prove the higher potential of the sequential redundancy addition and removal over the retiming and resynthesis techniques.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121814768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA implementation of a faithful polynomial approximation for powering function computation","authors":"José-Alejandro Piñeiro, J. Bruguera, J. Muller","doi":"10.1109/DSD.2001.952292","DOIUrl":"https://doi.org/10.1109/DSD.2001.952292","url":null,"abstract":"A FPGA implementation of a method for the calculation of faithfully rounded single-precision floating-point powering (X^p) is presented in this paper. A second-degree minimax polynomial approximation is used, together with the employment of table look-up, a specialized squaring unit and a fused accumulation tree. The FPGA implementation of an architecture with a latency of 3 cycles and a throughput of one result per cycle has been performed using a Xilinx XC4036XL device. The implemented unit has an operation frequency over 33 MHz.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"162 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113983932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}