{"title":"A data alignment technique for improving cache performance","authors":"P. Panda, Hiroshi Nakamura, N. Dutt, A. Nicolau","doi":"10.1109/ICCD.1997.628925","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628925","url":null,"abstract":"We address the problem of improving the data cache performance of numerical applications, specifically those with blocked (or tiled) loops. We present DAT, a data alignment technique utilizing array-padding, to improve program performance by minimizing cache conflict misses. We describe algorithms for selecting tile sizes for maximizing data cache utilization, and computing pad sizes for eliminating self-interference conflicts in the chosen tile. We also present a generalization of the technique to handle applications with several tiled arrays. Our experimental results comparing our technique with previous published approaches on machines with different cache configurations show consistently good performance on several benchmark programs, for a variety of problem sizes.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"5 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127920425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of a testing framework to VHDL descriptions at different abstraction levels","authors":"M. Bacis, G. Buonanno, Fabrizio Ferrandi, F. Fummi, Luca Gerli, D. Sciuto","doi":"10.1109/ICCD.1997.628935","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628935","url":null,"abstract":"The test problem increasingly affects the system design process and related costs and time to market. Requirements from VLSI/WSI manufacturers are for fast and reliable testability tools, with the possibility of their introduction in early phases of design. The paper presents a global toolset architecture for testability analysis and test pattern generation. Three abstraction levels are considered in this design flow, from the behavioral specifications, through RTL descriptions, down to gate level. In all these phases, VHDL is chosen as the reference description language. The paper then presents an application scenario, detailing the results achieved by the proposed methodology.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"169 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123788321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"First test results of system level fault tolerant design validation through laser fault injection","authors":"W. Moreno, F. J. Falquez, J. Samson, T. Smith","doi":"10.1109/ICCD.1997.628919","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628919","url":null,"abstract":"Fault tolerant design validation tests through laser fault injection (LFI) have been carried out at the Center for Microelectronics Research (CMR) of the University of South Florida (USF) by a team of scientists and engineers led by Dr. Wilfrido Moreno with cooperation from the Space and Strategic Systems Operation (SASSO) of Honeywell, Inc. The technique, demonstrated by previous work at the CMR, involves the precise application of a laser pulse tailored as to power, pulse width and frequency into a very large scale integrated circuit (VLSIC) which is a component of an operating computer capable of detecting, logging and recovering from a transient fault and then proceeding with its operation. The test vehicle is the radiation hardened 32-bit processor (RH32) developed by Honeywell for the Rome Laboratory of the United States Air Force and the Laser facility is the Laser Restructuring Laboratory (LRL) of the CMR built under a grant from the Defense Advanced Research Projects Agency (DARPA). Two system level series of tests have been completed. The first one involved the verification of initial demo tests performed by others on an early version of the computer which was limited to verifying that the computer detected and logged a hardware error in the register file of the central processing unit (CPU). These tests were expanded to observe the incrementing of the error count register of the same chip as laser pulses were applied. During the second series of tests, the processor was observed for the first time to detect a hardware error, log and correct it, and then proceed with the present instruction. This was evident from the data entered by the processor in the status registers.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127434012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PROPHID: a heterogeneous multi-processor architecture for multimedia","authors":"J. Leijten, J. V. Meerbergen, A. Timmer, J. Jess","doi":"10.1109/ICCD.1997.628864","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628864","url":null,"abstract":"PROPHID is a design method aiming at high-performance systems with a focus on high-throughput signal processing for multimedia applications. The processing and communication bandwidth requirements of such systems are very high. To obtain a good balance between performance, programmability and efficiency in terms of speed, area and power, PROPHID uses a novel heterogeneous multi-processor architecture template which exploits task-level concurrency. A general purpose processor aimed at control-oriented tasks and low to medium-performance signal processing tasks, as well as application domain specific processors aimed at high-performance signal processing tasks are available in this template. In addition to a central control-oriented bus, a special high-throughput communication network is used to meet the high bandwidth requirements of the application domain specific processors. This paper discusses the characteristics and advantages of the PROPHID architecture showing that high performance is obtained by embedding multiple autonomous data-driven processors in a stream-based communication environment.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130951526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Allocation and data arrival design of hard real-time systems","authors":"D. Rhodes, W. Wolf","doi":"10.1109/ICCD.1997.628900","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628900","url":null,"abstract":"The paper presents new models for process activation and process scheduling for real-time embedded systems. The authors introduce a realistic, yet high-level input data arrival model which includes both polled and interrupt-driven process activation. They consider the effect of combinations of these process activation styles on a static, priority-based, preemptive scheduler. Given a set of periodic tasks and a set of resources (e.g. processors), a configuration is defined as: i) a mapping of each process to a resource; ii) assignment of priority to each process; and iii) a mapping of each interprocess communication event to either a polled or interrupt-driven implementation. They present a new method which utilizes an exact schedule analysis to determine a configuration which can meet hard real-time deadlines subject to a fixed limit on the number of interrupts available per resource. Task graph examples and comparisons are used to validate the method.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125954974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison between nMOS pass transistor logic style vs. CMOS complementary cells","authors":"R. Mehrotra, Massoud Pedram, Xunwei Wu","doi":"10.1109/ICCD.1997.628859","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628859","url":null,"abstract":"This paper compares three different logic styles for implementing arbitrary Boolean functions of up to three inputs in terms of their layout area, delay and power dissipation. The three styles are nMOS pass transistor based design, NAND gate based design, and CMOS complementary logic design. Results of the comparison show that pass transistor based design is superior to NAND based design, but loses to CMOS complementary logic design.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114809823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonenumerative path delay fault coverage estimation with optimal algorithms","authors":"D. Kagaris, S. Tragoudas, D. Karayiannis","doi":"10.1109/ICCD.1997.628896","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628896","url":null,"abstract":"A recent method proposed that a lower bound on the number of path delay faults excited by a given test set can be computed using a set of independent lines that form a cut. For each line in the cut a subcircuit consisting of all paths that contain the line is defined, and a lower bound to the number of excited path delay faults can be obtained by working on the respective subcircuits. A polynomial time algorithm is presented here for computing the maximum cardinality set of independent circuit lines. Experimental results show that the more subcircuits there are, the better the lower bound on the number of excited path delay faults. More subcircuits may be generated only in a heuristic manner. It was proposed to consider two or more line-disjoint cuts C_i. We propose a technique in which only one C_i needs to be a cut. This scheme is based on novel algorithms, and results in more subcircuits than the previous one.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122139789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and performance evaluation of a cache assist to implement selective caching","authors":"L. John, Akila Subramanian","doi":"10.1109/ICCD.1997.628916","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628916","url":null,"abstract":"Conventional cache architectures exploit locality, but do so rather blindly. By forcing all references through a single structure, the cache's effectiveness on many references is reduced. This paper presents a cache assist namely the annex cache which implements a selective caching scheme. Except for filling a main cache at cold start, all entries come to the cache via the annex cache. Items referenced only rarely will be excluded from the main cache, eliminating several conflict misses. The basic premise is that an item deserves to be in the main cache only if it can prove its right to exist in the main cache by demonstrating locality. The annex cache combines the features of Jouppi's (1990) victim caches and McFarling's (1992) cache exclusion schemes. Extensive simulation studies for annex and victim caches using a variety of SPEC programs are presented in the paper. Annex caches were observed to be significantly better than conventional caches, better than victim caches in certain cases, and comparable to victim caches in other cases.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132933327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discrete drive selection for continuous sizing","authors":"R. Haddad, L. V. Ginneken, Narendra V. Shenoy","doi":"10.1109/ICCD.1997.628856","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628856","url":null,"abstract":"Due to deep sub micron effects, accurate gate sizing is increasingly important. Most advanced methods for sizing assume that gates can be sized continuously, while most design is done with discrete gate array and standard cell libraries. To bridge this gap, this paper proposes a new approach to gate sizing which combines a small number of discrete gate sizes to approximate the ideal of continuous sizing. To date research has focussed on deciding an appropriate set of logic functions for the primitive gates in a library. The main contribution of this paper is the development of a theoretical framework for strategies of drive strength selection for libraries. We demonstrate that nearly continuous sizing can be achieved by combining gates of a small number of sizes. The experimental procedure is based on simulated annealing. The results show that with only 5 discrete sizes any continuous size within a range of two orders of magnitude can be approximated with an accuracy of 1.7%.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132096164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An evaluation of asynchronous and synchronous design for superscalar architectures","authors":"A. Davey, D. Lloyd","doi":"10.1109/ICCD.1997.628882","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628882","url":null,"abstract":"The high performance of superscalar architectures is obtained through the simultaneous execution of several machine operations upon multiple functional units. Traditional synchronous design techniques restrict the operation of these functional units to worst-case performance within discrete globally determined periods of time. However, asynchronous design techniques do not suffer from these restrictions, and so potentially promote greater utilisation of the functional units and therefore higher performance. This paper presents results from an empirical study that has been undertaken to assess the effect of asynchronous versus synchronous design techniques on the overall machine performance, and the utilisation of hardware resources. The results suggest that asynchronous design increases the opportunities for instructions to use functional units, potentially allowing equivalent performance to synchronous processors, but requiring fewer hardware resources.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115331336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}