{"title":"Very high (over 40 Gb/s) speed circuits for optical communications design methodology and application examples","authors":"A. Konczykowska","doi":"10.1109/DSD.2001.952304","DOIUrl":"https://doi.org/10.1109/DSD.2001.952304","url":null,"abstract":"The important increase of communication services, and particularly the Internet traffic, needs to be supported by the development of adequate communication networks. High speed electronic circuits can be successfully used in MultiGigabit-rate Time Division Multiplexing (TDM) transmission systems. In this paper we discuss main characteristics and challenges of ICs for optical communications. The design methodology, and necessary CAD tools are presented. MS-DFF design for 40 Gb/s is given as an example.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125998383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experimental evaluation of CPU performance features","authors":"J. Sosnowski, Rafal Jurkiewicz, J. Nowicki","doi":"10.1109/DSD.2001.952282","DOIUrl":"https://doi.org/10.1109/DSD.2001.952282","url":null,"abstract":"The paper addresses the problem of evaluating CPU performance in real system environment. We present an efficient methodology of CPU performance analysis at the architecture (coarse grained) and microarchitecture (fine grained) levels. It is based on time and internal event monitoring technique. This methodology is referred to Intel processors operating in IBM PC environment. The usefulness of the presented approach was proved in many experiments described in the paper.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129544975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA implementation of a faithful polynomial approximation for powering function computation","authors":"José-Alejandro Piñeiro, J. Bruguera, J. Muller","doi":"10.1109/DSD.2001.952292","DOIUrl":"https://doi.org/10.1109/DSD.2001.952292","url":null,"abstract":"A FPGA implementation of a method for the calculation of faithfully rounded single-precision floating-point powering (X^p) is presented in this paper. A second-degree minimax polynomial approximation is used, together with the employment of table look-up, a specialized squaring unit and a fused accumulation tree. The FPGA implementation of an architecture with a latency of 3 cycles and a throughput of one result per cycle has been performed using a Xilinx XC4036XL device. The implemented unit has an operation frequency over 33 MHz.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"162 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113983932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving single-thread fetch performance on a multithreaded processor","authors":"J. Moure, R. B. García, Dolores Rexachs, E. Luque","doi":"10.1109/DSD.2001.952344","DOIUrl":"https://doi.org/10.1109/DSD.2001.952344","url":null,"abstract":"Multithreaded processors, by simultaneously using both the thread-level parallelism and the instruction-level parallelism of applications, achieve larger instruction per cycle rate than single-thread processors. On a multi-thread workload, a clustered organization maximizes performances. On a single-thread workload, however, all but one of the clusters are idle, degrading single-thread performance significantly. Using a clustered multi-thread performance as a baseline, we propose and analyze several mechanisms and policies to improve single-thread execution exploiting the existing hardware without a significant multi-thread performance loss. We focus on the fetch unit, which is maybe the most performance-critical stage. Essentially, we analyze three ways of exploiting the idle fetch clusters: allowing a single thread accessing its neighbor clusters, use the idle fetch clusters to provide multiple-path execution, or use them to widen the effective single-thread fetch block.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123603243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptability, extensibility and flexibility in real-time operating systems","authors":"P. Kuacharoen, Tankut Akgul, V. Mooney, V. Madisetti","doi":"10.1109/DSD.2001.952348","DOIUrl":"https://doi.org/10.1109/DSD.2001.952348","url":null,"abstract":"In this paper, we present a mechanism for runtime updating of all kernel modules of a highly modular dynamic real-time operating system. Our approach can help solve the lack of adaptability, extensibility and flexibility of existing real-time operating systems. The dynamic real-time operating system will efficiently support a wide range of applications since any kernel module can be dynamically loaded at runtime to exactly suit the applications without necessitating a reboot of the system.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121048157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a faithful LNS interpolator","authors":"M. Arnold","doi":"10.1109/DSD.2001.952321","DOIUrl":"https://doi.org/10.1109/DSD.2001.952321","url":null,"abstract":"A design is given for a quadratic interpolator needed by the logarithmic number system (LNS). Unlike previous LNS designs that have attempted to produce results consistently better than a floating-point representation of the same word size (32 bits), the design goal is to minimize memory requirements and system complexity, at the expense of a slight increase in approximation error. Simulation results have shown this goal causes only a modest impact on overall accuracy, but the memory savings are significant. Despite a slight increase in error compared to prior LNS implementations, on average, the error is still less than conventional number representations satisfying the IEEE-754 standard. Proposed applications for the interpolator include multimedia, signal processing, graphics and reconfigurable computing.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124714794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pipelining considerations for an FPGA case","authors":"O. Cadenas, G. Megson","doi":"10.1109/DSD.2001.952298","DOIUrl":"https://doi.org/10.1109/DSD.2001.952298","url":null,"abstract":"This paper presents a semi-synchronous pipeline scheme, here referred as single-pulse pipeline, to the problem of mapping pipelined circuits to a Field Programmable Gate Array (FPGA). Area and timing considerations are given for a general case and later applied to a systolic circuit as illustration. The single-pulse pipeline can manage asynchronous worst-case data completion and it is evaluated against two chosen asynchronous pipelining: a four-phase bundle-data pipeline and a doubly-latched asynchronous pipeline. The semi-synchronous pipeline proposal takes less FPGA area and operates faster than the two selected fully-asynchronous schemes for an FPGA case.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122080949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the optimization power of redundancy addition and removal for sequential logic optimization","authors":"E. S. Millán, L. Entrena, J. A. Espejo","doi":"10.1109/DSD.2001.952305","DOIUrl":"https://doi.org/10.1109/DSD.2001.952305","url":null,"abstract":"The paper attempts to determine the capabilities of existing redundancy addition and removal (SRAR) techniques for logic optimization of sequential circuits. To this purpose, we compare this method with the retiming and resynthesis (RaR) techniques. For the RaR case the set of possible transformations has been established by relating them to STG transformations by other authors. Following these works, we first formally demonstrate that logic transformations provided by RaR are covered by SRAR as well. Then we also show that SRAR is able to identify transformations that cannot be found by RaR. This way we prove the higher potential of the sequential redundancy addition and removal over the retiming and resynthesis techniques.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121814768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-performance floating point divide","authors":"A. Liddicoat, M. Flynn","doi":"10.1109/DSD.2001.952327","DOIUrl":"https://doi.org/10.1109/DSD.2001.952327","url":null,"abstract":"In modern processors floating point divide operations often take 20 to 25 clock cycles, five times that of multiplication. Typically multiplicative algorithms with quadratic convergence are used for high-performance divide. A divide unit based on the multiplicative Newton-Raphson iteration is proposed. This divide unit utilizes the higher-order Newton-Raphson reciprocal approximation to compute the quotient fast, efficiently and with high throughput. The divide unit achieves fast execution by computing the square, cube and higher powers of the approximation directly and much faster than the traditional approach with serial multiplications. Additionally, the second, third and higher-order terms are computed simultaneously further reducing the divide latency. Significant hardware reductions have been identified that reduce the overall computation significantly and therefore, reduce the area required for implementation and the power consumed by the computation. The proposed hardware unit is designed to achieve the desired quotient precision in a single iteration allowing the unit to be fully pipelined for maximum throughput.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"26 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132768139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A HW/SW codesign framework based on distributed DSP virtual machines","authors":"Christian Kreiner, C. Steger, E. Teiniker, R. Weiss","doi":"10.1109/DSD.2001.952284","DOIUrl":"https://doi.org/10.1109/DSD.2001.952284","url":null,"abstract":"In recent years the interest on the problem of designing mixed hardware/software systems has increased due to growing system complexities. This paper describes a hardware/software codesign framework for the design of embedded systems based on digital signal processors and FPGAs. Our approach is based on distributed DSP virtual machines for simulation and verification of the application on a Linux cluster and for running the application on different target architectures (DSPs, FPGAs) as well. The DSP virtual machines were designed to make DSP applications portable across different platforms while maintaining optimal code.","PeriodicalId":285358,"journal":{"name":"Proceedings Euromicro Symposium on Digital Systems Design","volume":"195 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131785335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}