{"title":"Wagging Logic: Implicit Parallelism Extraction Using Asynchronous Methodologies","authors":"C. Brej","doi":"10.1109/ACSD.2010.11","DOIUrl":null,"url":null,"abstract":"Asynchronous circuits have a number of potential performance advantages over their synchronous equivalents due to the ability to exploit average case performance. These advantages are offset by the loss of performance caused by the handshaking overheads which causes designs to be throughput bound. This paper investigates the nature of the throughput problem and proposes a novel automatic approach to overcome its effect. The designs generated using the method not only cease suffering from a throughput bottleneck, but also attain the parallel computation properties despite their original sequential specification. The method is then demonstrated on a processor design. The processor demonstrates the ability of the method to implement a seven gate delay per operation super scalar microprocessor with: register locking, instruction reordering, simultaneous multi-threading, cache-banking and other complex techniques, all automatically or with minor design effort. Such a design can be constructed in days rather than the hundreds of person years required by conventional methodologies.","PeriodicalId":169191,"journal":{"name":"2010 10th International Conference on Application of Concurrency to System Design","volume":"45 11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 10th International Conference on Application of Concurrency to System Design","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ACSD.2010.11","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
Asynchronous circuits have a number of potential performance advantages over their synchronous equivalents due to the ability to exploit average case performance. These advantages are offset by the loss of performance caused by the handshaking overheads which causes designs to be throughput bound. This paper investigates the nature of the throughput problem and proposes a novel automatic approach to overcome its effect. The designs generated using the method not only cease suffering from a throughput bottleneck, but also attain the parallel computation properties despite their original sequential specification. The method is then demonstrated on a processor design. The processor demonstrates the ability of the method to implement a seven gate delay per operation super scalar microprocessor with: register locking, instruction reordering, simultaneous multi-threading, cache-banking and other complex techniques, all automatically or with minor design effort. Such a design can be constructed in days rather than the hundreds of person years required by conventional methodologies.