Architectural considerations for application-specific counterflow pipelines
B. Childers, J. Davidson
Proceedings 20th Anniversary Conference on Advanced Research in VLSI, 1999-03-21
DOI: 10.1109/ARVLSI.1999.756034 (https://doi.org/10.1109/ARVLSI.1999.756034)
Citations: 7
Abstract
Application-specific processor design is a promising approach for meeting the performance and cost goals of a system. Application-specific processors are especially promising for embedded systems (e.g., digital cameras and cellular phones), where a small increase in performance and decrease in cost can have a large impact on a product's viability. Sproull, Sutherland and Molnar (see IEEE Design and Test of Computers, vol. 11, no. 3, pp. 48-59, 1994) have proposed a new pipeline organization called the Counterflow Pipeline (CFP). This paper evaluates CFP design alternatives and shows that the CFP is an ideal architecture for fast, low-cost design of high-performance processors customized for computation-intensive embedded applications. First, we describe why CFPs are particularly well suited to realizing application-specific processors. Second, we describe how a CFP tailored to an application can be constructed automatically. Third, we present measurements that evaluate CFP design trade-offs and show that CFPs provide speculative and out-of-order execution, as well as register renaming, matched to an application. Fourth, we show that asynchronous counterflow pipelines achieve high performance by reducing the average execution latency of instructions relative to synchronous implementations. Finally, we demonstrate that custom CFPs achieve cycles-per-instruction measurements that are competitive with 4-way superscalar out-of-order processors at potentially low design complexity.