{"title":"Technology Options for Beyond-CMOS","authors":"I. Young","doi":"10.1145/3036669.3041225","DOIUrl":"https://doi.org/10.1145/3036669.3041225","url":null,"abstract":"CMOS integrated circuit technology for computation is at an inflexion point. Although this is the technology which has enabled the semiconductor industry to make vast progress over the past 30-plus years, it is expected to see challenges going beyond the ten year horizon, particularly from an energy efficiency point of view. Thus it is extremely important for the semiconductor industry to discover a new integrated circuit technology which can carry us to the beyond CMOS era, so that the power-performance of computing can continue to improve. Currently, researchers are exploring novel device concepts and new information tokens as an alternative for CMOS technology. Examples of areas being actively researched are quantum electronic devices, such as the tunneling field-effect transistor (TFET), and devices based on electron spin and nano-magnetics (spintronics). It is clear that choices will need to be made in the next 10 years to identify viable alternatives for CMOS by 2025. To prioritize and guide the research exploration in materials, devices and circuits, benchmarking methodology and metrics are being used. This talk will give an overview of the beyond CMOS device research horizon and the benchmarking of these devices for computation. A more detailed investigation of circuits based upon some promising beyond-CMOS devices will follow.","PeriodicalId":269197,"journal":{"name":"Proceedings of the 2017 ACM on International Symposium on Physical Design","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131365032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rsyn: An Extensible Physical Synthesis Framework","authors":"G. Flach, Mateus Fogaça, Jucemar Monteiro, M. Johann, R. Reis","doi":"10.1145/3036669.3038249","DOIUrl":"https://doi.org/10.1145/3036669.3038249","url":null,"abstract":"Due to the advanced stage of development on EDA science, it has been increasingly difficult to implement realistic software infrastructures in academia so that new problems and solutions are tested in a meaningful and consistent way. In this paper we present Rsyn, a free and open-source C++ framework for physical synthesis research and development comprising an elegant netlist data model, analysis tools (e.g. timing analysis, congestion), optimization methods (e.g. placement, sizing, buffering) and a graphical user interface. It is designed to be very modular and incrementally extensible. New components can be easily integrated making Rsyn increasingly valuable as a framework to leverage research in physical design. Standard and third party components can be mixed together via code or script language to create a comprehensive design flow, which can be used to better assess the quality of results of the research being conducted. The netlist data model uses the new features of C++11 providing a simple but efficient way to traverse and modify the netlist. Attributes can be seamlessly added to objects and a notification system alerts components about changes in the netlist. The flexibility of the netlist inspired the name Rsyn, which comes from the word resynthesis. Rsyn is created to allow researchers to focus on what is really important to their research spending less time on the infrastructure development. Allowing the sharing and reusability of common components is also one of the main contributions of the Rsyn framework. In this paper, the key concepts of Rsyn are presented. Examples of use are drawn, the important standard components (e.g. physical layer, timing) are detailed and some case studies based on recent Electronic Design Automation (EDA) contests are analyzed. Rsyn is available at http://rsyn.design.","PeriodicalId":269197,"journal":{"name":"Proceedings of the 2017 ACM on International Symposium on Physical Design","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116479547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"100x Evolution of Video Codec Chips","authors":"Jinjia Zhou, Dajiang Zhou, S. Goto","doi":"10.1145/3036669.3038252","DOIUrl":"https://doi.org/10.1145/3036669.3038252","url":null,"abstract":"In the past two decades, there has been tremendous progress in video compression technologies. Meanwhile, the use of these technologies, along with the ever-increasing demand for emerging ultra-high-definition applications, greatly challenges the design of video codec chips, with extensive requirements on both memory (DRAM) bandwidth and computation power. Besides, the high data dependencies of video coding algorithms restrict the degree of efficient hardware parallelism and pipelining. This paper describes the techniques to realize high-performance video codec chips. Firstly, we introduce various optimization techniques to solve the DRAM traffic issue. Furthermore, the techniques to reduce the computational complexity and alleviate data dependencies are described. The proposed techniques have been implemented in several ASIC video codecs. Experiments show that the DRAM traffic and DRAM access time are reduced by 80% and 90% respectively. The performance of the video codec chips can achieve 7680x4320@120fps, which is more than 100x better than previous works.","PeriodicalId":269197,"journal":{"name":"Proceedings of the 2017 ACM on International Symposium on Physical Design","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132949361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clock Tree Construction based on Arrival Time Constraints","authors":"Rickard Ewetz, Cheng-Kok Koh","doi":"10.1145/3036669.3036671","DOIUrl":"https://doi.org/10.1145/3036669.3036671","url":null,"abstract":"There are striking differences between constructing clock trees based on dynamic implied skew constraints and based on static arrival time constraints. Dynamic implied skew constraints allow the full timing margins to be utilized, but the constraints are required to be updated (with high time complexity). In contrast, static arrival time constraints are decoupled and are not required to be updated. Therefore, the constraints can be obtained in constant time, which facilitates the exploration of various tree topologies. On the other hand, arrival time constraints do not allow the full timing margins to be utilized. Consequently, there is a trade-off between topology exploration and timing margin utilization. In this paper, the advantages of static arrival time constraints are leveraged to construct clock trees with useful skew while exploring various tree topologies. Moreover, the constraints are specified and respecified throughout the synthesis process to reduce the cost of the constructed clock trees. It is experimentally demonstrated that the proposed approach results in clock trees with 16% lower average capacitive cost compared with clock trees constructed based on dynamic implied skew constraints.","PeriodicalId":269197,"journal":{"name":"Proceedings of the 2017 ACM on International Symposium on Physical Design","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127956198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Routability Optimization for Industrial Designs at Sub-14nm Process Nodes Using Machine Learning","authors":"W. Chan, Pei-Hsin Ho, A. Kahng, Prashant Saxena","doi":"10.1145/3036669.3036681","DOIUrl":"https://doi.org/10.1145/3036669.3036681","url":null,"abstract":"Design rule check (DRC) violations after detailed routing prevent a design from being taped out. To solve this problem, state-of-the-art commercial EDA tools global-route the design to produce a global-route congestion map; this map is used by the placer to optimize the placement of the design to reduce detailed-route DRC violations. However, in sub-14nm processes and beyond, DRCs arising from multiple patterning and pin-access constraints drastically weaken the correlation between global-route congestion and detailed-route DRC violations. Hence, the placer, based on the global-route congestion map, may leave too many detailed-route DRC violations to be fixed manually by designers. In this paper, we present a method that employs (1) machine-learning techniques to effectively predict detailed-route DRC violations after global routing and (2) detailed placement techniques to effectively reduce detailed-route DRC violations. We demonstrate on several layouts of a sub-14nm industrial design that this method predicts the locations of 74% of the detailed-route DRCs (with false positive prediction rate below 0.2%) and automatically reduces the number of detailed-route DRC violations by up to 5x. Whereas previous works on machine learning for routability [30] [4] have focused on routability prediction at the floorplanning and placement stages, ours is the first paper that not only predicts the actual locations of detailed-route DRC violations but furthermore optimizes the design to significantly reduce such violations.","PeriodicalId":269197,"journal":{"name":"Proceedings of the 2017 ACM on International Symposium on Physical Design","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116871085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges and Opportunities: From Near-memory Computing to In-memory Computing","authors":"Soroosh Khoram, Yue Zha, Jialiang Zhang, J. Li","doi":"10.1145/3036669.3038242","DOIUrl":"https://doi.org/10.1145/3036669.3038242","url":null,"abstract":"The confluence of the recent advances in technology and the ever-growing demand for large-scale data analytics created a renewed interest in a decades-old concept, processing-in-memory (PIM). PIM, in general, may cover a very wide spectrum of compute capabilities embedded in close proximity to or even inside the memory array. In this paper, we present an initial taxonomy for dividing PIM into two broad categories: 1) Near-memory processing and 2) In-memory processing. This paper highlights some interesting work in each category and provides insights into the challenges and possible future directions.","PeriodicalId":269197,"journal":{"name":"Proceedings of the 2017 ACM on International Symposium on Physical Design","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115465046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clock-Aware FPGA Placement Contest","authors":"Stephen Yang, C. Mulpuri, Sainath Reddy, Meghraj Kalase, Srinivasan Dasasathyan, M. E. Dehkordi, Marvin Tom, R. Aggarwal","doi":"10.1145/3036669.3038241","DOIUrl":"https://doi.org/10.1145/3036669.3038241","url":null,"abstract":"Modern FPGA devices contain complex clocking architectures on top of the FPGA logic fabric. To best utilize the FPGA clocking architecture, both FPGA designers and EDA tool developers need to understand the clocking architecture and design the best methodology/algorithm for various design styles. Clock legalization and clock-aware placement have become key factors in the FPGA design flow. They can greatly influence FPGA design performance and routability. The FPGA placement problem can become very difficult with clock legalization constraints. This year's contest is a continuation of last year's routability-driven placement challenge. Contestants need to design a best-in-class clock-aware placement approach to excel in the contest.","PeriodicalId":269197,"journal":{"name":"Proceedings of the 2017 ACM on International Symposium on Physical Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129045096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Physical Design Considerations of One-level RRAM-based Routing Multiplexers","authors":"Xifan Tang, Edouard Giacomin, G. Micheli, P. Gaillardon","doi":"10.1145/3036669.3036675","DOIUrl":"https://doi.org/10.1145/3036669.3036675","url":null,"abstract":"Resistive Random Access Memory (RRAM) technology opens the opportunity for granting both high-performance and low-power features to routing multiplexers. In this paper, we study the physical design considerations related to RRAM-based routing multiplexers and particularly the integration of 4T(ransistor)1R(RAM) programming structures within their routing tree. We first analyze the limitations in the physical design of a naive one-level 4T1R-based multiplexer, such as co-integration of low-voltage nominal power supply and high voltage programming supply, as well as the use of long metal wires across different isolating wells. To address the limitations, we improve the one-level 4T1R-based multiplexer by re-arranging the nominal and programming voltage domains, and also study the optimal location of RRAMs in terms of performance. The improved design can effectively reduce the length of long metal wires by 50%. Electrical simulations show that using a 7nm FinFET transistor technology, the improved 4T1R-based multiplexers improve delay by 69% as compared to the basic design. At nominal working voltage, considering an input size ranging from 2 to 32, the improved 4T1R-based multiplexers outperform the best CMOS multiplexers in area by 1.4x, delay by 2x and power by 2x respectively. The improved 4T1R-based multiplexers operating in the near-Vt regime can improve Power-Delay Product by up to 5.8x when compared to the best CMOS multiplexers working at nominal voltage.","PeriodicalId":269197,"journal":{"name":"Proceedings of the 2017 ACM on International Symposium on Physical Design","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133230435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fast Incremental Cycle Ratio Algorithm","authors":"Gang Wu, C. Chu","doi":"10.1145/3036669.3036670","DOIUrl":"https://doi.org/10.1145/3036669.3036670","url":null,"abstract":"In this paper, we propose an algorithm to quickly find the maximum cycle ratio (MCR) on an incrementally changing directed cyclic graph. Compared with traditional MCR algorithms which have to recalculate everything from scratch at each incremental change, our algorithm efficiently finds the MCR by just leveraging the previous MCR and the corresponding largest cycle before the change. In particular, the previous MCR allows us to safely break the graph at the changed node. Then, we can detect the changing direction of the MCR by solving a single source longest path problem on a graph without positive cycle. A distance bucket approach is proposed to speed up the process of finding the longest paths. Our algorithm continues to search upward or downward based on whether the MCR is detected as increased or decreased. The downward search is quickly performed by a modified Karp-Orlin algorithm reusing the longest paths found during the cycle detection. In addition, a cost shifting idea is proposed to avoid calculating MCR on certain types of incremental changes. We evaluated our algorithm on both random graphs and circuit benchmarks. A timing-driven detailed placement approach which applies our algorithm is also proposed. Compared with Howard's and the Karp-Orlin MCR algorithms, our algorithm finds the MCR much more efficiently in both random graphs and circuit benchmarks.","PeriodicalId":269197,"journal":{"name":"Proceedings of the 2017 ACM on International Symposium on Physical Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130599619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pushing the boundaries of Moore's Law to transition from FPGA to All Programmable Platform","authors":"I. Bolsens","doi":"10.1145/3036669.3041226","DOIUrl":"https://doi.org/10.1145/3036669.3041226","url":null,"abstract":"Since their inception, FPGAs have changed significantly in their capacity and architecture. The devices we use today are called upon to solve problems in mixed-signal, high-speed communications, signal processing and compute acceleration that early devices could not address. The architecture has evolved towards an \"All Programmable\" platform that immerses multiple programmable technologies into a complex interconnect infrastructure that, today, spans the boundary of multiple dies in one package. As the devices continue to grow in capability and complexity, new design tools and methodologies are being proposed. We will discuss future technology challenges that need to be solved in order to continue pushing the boundary of integration.","PeriodicalId":269197,"journal":{"name":"Proceedings of the 2017 ACM on International Symposium on Physical Design","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125923419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}