{"title":"Holistic 2.5D Chiplet Design Flow: A 65nm Shared-Block Microcontroller Case Study","authors":"M. Kabir, Yarui Peng","doi":"10.1109/socc49529.2020.9524798","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524798","url":null,"abstract":"Traditionally, different components of a system are integrated through Printed Circuit Boards (PCB). The long traces on PCB have severe power loss and limit the bandwidth of the interconnects between the components. Advanced packaging offers high-bandwidth, low power, and high-performance inter-die communications with compact sizes and dense pin arrays. 2.5D integration further provides better thermal dissipation, lower cost, and higher yield compared to 3D stacking. Novel CAD tool flows dedicated to 2.5D chiplet designs are essential to enable flexible and efficient 2.5D system designs. In this paper, we present our design, optimization, and analysis methodologies and a design case study implementing an ARM Cortex-M0 microcontroller system using a holistic 2.5D tool flow. We use TSMC 65nm as our chiplet implementation technology with a modified metal stack referring to 2.5D Fan-Out Wafer-Level Packaging (FOWLP) solutions. We also discuss design techniques for chiplet reuse and the Drop-in design approach to develop low-power, low-cost, and high-performance flavors of a 2.5D system. We compare the 2.5D system with its 2D counterpart to validate the holistic design flow.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"46 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130318142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A SpaceWire PHY with Double Data Rate and Fallback Redundancy","authors":"Mong Tee Sim, Yanyan Zhuang","doi":"10.1109/socc49529.2020.9524763","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524763","url":null,"abstract":"In satellite applications, the cost of failure is significantly higher than in regular applications. As a result, redundancy is critical. In this paper, we propose a SpaceWire Physical Layer (PHY) Transceiver with a dual-data lane that can support double the data rate and fallback redundancy. The dual-data lane allows twice the data rate with the same transmission frequency. With lane muxing circuitry embedded in the PHY, our design can support four transmission topologies for fallback redundancy. We used Verilog HDL and ModelSim to create, simulate, and test our SpaceWire PHY Transceiver design. The results show that our design can transmit and receive data in four topologies. When operating in the fourth topology mode with a dual-data lane, it can deliver twice the data rate of other SpaceWire PHYs using the same transmission frequency.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131172179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Performance of a NoC-based CNN Accelerator with Gather Support","authors":"Binayak Tiwari, Mei Yang, Xiaohang Wang, Yingtao Jiang, V. Muthukumar","doi":"10.1109/socc49529.2020.9524799","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524799","url":null,"abstract":"The increasing application of deep learning technology drives the need for an efficient parallel computing architecture for Convolutional Neural Networks (CNNs). A significant challenge faced when designing a many-core CNN accelerator is handling the data movement between the processing elements. The CNN workload introduces many-to-one traffic in addition to one-to-one and one-to-many traffic. As the de facto standard for on-chip communication, Network-on-Chip (NoC) can support various unicast and multicast traffic. For many-to-one traffic, repetitive unicast is employed, which is inefficient. In this paper, we propose to use the gather packet on mesh-based NoCs employing an output stationary systolic array in support of many-to-one traffic. The gather packet efficiently collects data from the intermediate nodes along its path to the destination. This method is evaluated using traffic traces generated from the convolution layers of AlexNet and VGG-16 and shows improved latency and power compared with the repetitive unicast method.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130159651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"C2IM: A Compact Computing-In-Memory Unit of 10 Transistors with Standard 6T SRAM","authors":"Erxiang Ren, Li Luo, Zheyu Liu, F. Qiao, Qi Wei","doi":"10.1109/socc49529.2020.9524791","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524791","url":null,"abstract":"The memory wall has been a major bottleneck that restrains the speed and power consumption of processors in the Von Neumann architecture. Computing-in-memory (CIM) was proposed as a promising method to tackle the memory wall by implementing computing in memory instead of fetching the value from memory to the processor. Based on the standard 6T-SRAM, this paper proposes a compact CIM (C2IM) unit of 10 transistors. This C2IM unit can not only implement the complete function of SRAM, but also realize the multiplication between the input, which is copied into the unit through the current mirror, and the value stored in the SRAM. Current-mode circuits are adopted in this unit so that it can implement more energy-efficient multiply-accumulate (MAC) operations with simpler control timing and lower transistor cost. Based on TSMC 65nm CMOS lower power process, the proposed unit can achieve 166.67 TOPS/W energy efficiency under 200MHz clock frequency.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114149009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Programmable Voltage Reference Circuit Using an Analog Floating Gate Device","authors":"S. Nimmalapudi, H. Stiegler, A. Marshall, Keith Jarreau","doi":"10.1109/socc49529.2020.9524788","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524788","url":null,"abstract":"Voltage references are integrated circuit components used to provide stable references. They provide a voltage that is generally independent of supply voltage and temperature and have a variety of applications in control systems and data converters where there is a need for a fixed voltage. Much research has gone into developing reliable and low power reference circuits. Here we discuss improvements to a voltage reference circuit that employs an analog floating gate (AFG) device to generate programmable reference voltages. The approach taken here allows us to maintain the accuracy of the reference voltage over a very wide range of supply voltages and to trim to any desired reference voltage from 0.4 to 3.1 volts.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121935976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Low-Cost Fault Injection Attack Resilient FSM Design","authors":"Ziming Wang, Aijiao Cui, G. Qu","doi":"10.1109/socc49529.2020.9524779","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524779","url":null,"abstract":"The finite state machine (FSM) plays an important role in digital circuit design. Since it stores the system states and controls system functionality, security vulnerabilities of FSMs have been exploited extensively. Among the potential attacks, the fault injection attack (FIA) is one of the most severe and most challenging to defend against. Unlike existing countermeasures, we propose a novel structure for FSM state flip flop design that can mitigate any kind of timing based FIA. Our key idea is to sample the flip flop input signals multiple times during one clock cycle, and then compare these values to determine the correct one. This can effectively defeat all the FIAs based on violating the FSM state setup time constraint. In addition, this approach makes the design more robust against jitter. In order to reduce the design overhead, we use low-cost transmission gates to implement the proposed latch and flip flop. We use Hspice to simulate the error conditions with delayed input data and jitter, and the results confirm that our design is error resilient. We also implement the FSM in AES with our proposed flip flops and compare the area overhead with existing FIA countermeasures. Results show that the two state-of-the-art approaches have 2X and 4X the area overhead of ours.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122546798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the Scalability of OpenCL Coarse Grained Parallelism on Cloud FPGAs","authors":"Jhanani Thiagarajan, Arnab A. Purkayastha, A. Patil, H. Tabkhi","doi":"10.1109/socc49529.2020.9524765","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524765","url":null,"abstract":"OpenCL programmability combined with FPGAs' pipelined parallelism has enabled high-performance execution and power-efficient solutions for massively parallel applications. This paper provides an exhaustive study on the scalability of OpenCL coarse-grain parallelism, namely Compute Unit (CU) replication, on cloud FPGAs. This work demonstrates that for many applications there is an optimum number of CUs to achieve the maximum performance benefits with respect to memory bandwidth, memory conflicts introduced by CU replication, and available FPGA resources. At the same time, the paper provides a source-code template and an optimized front-end design tool to explore and identify the optimum CU number for a given application, while hiding the programming and exploration difficulties from programmers. Our experimental results on 15 applications taken from the Xilinx SDAccel v2017.4 suite and the Rodinia Benchmark Suite v3.1 show a speedup of 6.2X and a bandwidth improvement of 3.5X, with a mere 1.04X power and less than 10% resource utilization on average. In addition, our tool results in a 31% improvement in the total design synthesis time for an illustrative Histogram application.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128478204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-Efficient Adiabatic Circuits Using Transistor-Level Monolithic 3D Integration","authors":"Ivan Miketic, E. Salman","doi":"10.1109/socc49529.2020.9524748","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524748","url":null,"abstract":"Charge-recycling adiabatic circuits have recently received increased attention due to both high energy-efficiency and higher resistance against side-channel attacks. These characteristics make adiabatic circuits a promising technique for Internet-of-things based applications. One of the important limitations of adiabatic logic is the higher intra-cell interconnect capacitance due to differential outputs and cross-coupled pMOS transistors. Since energy consumption has quadratic dependence on capacitance in adiabatic circuits (unlike conventional static CMOS where dependence is linear), higher interconnect capacitance significantly degrades the overall power savings that can be achieved by adiabatic logic, particularly in nanoscale technologies. In this paper, monolithic 3D integrated adiabatic circuits are introduced where transistor-level monolithic 3D technology is used to implement adiabatic gates. A 45 nm two-tier Mono3D PDK is used to demonstrate the proposed approach. Monolithic inter-tier vias are leveraged to significantly reduce parasitic interconnect capacitance, achieving up to 47% reduction in power-delay product as compared to 2D adiabatic circuits in a 45 nm technology node.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127849702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}