{"title":"DVFS Considering Spatial Correlation Timing and Process-Voltage-Temperature Variations","authors":"Tung-Liang Lin, Sao-Jie Chen","doi":"10.1109/socc49529.2020.9524768","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524768","url":null,"abstract":"A novel scheme, the spatially-correlated Design Dependent Critical-Path Monitor (DDCPM), is proposed, which provides valuable references for deriving application-specific, process- and temperature-aware DVFS for aggressive power saving at runtime. The DDCPM uses its unique spatial correlation feature and real-time sampling techniques to precisely sense the unexpected behavior introduced by over-scaled voltage under operating conditions with random and mutually dependent Process-Voltage-Temperature (PVT) variations in each individual chip. Experimental results on two IPs implemented in a TSMC 28 nm process node show average step-wise power reductions of 7.80% and 8.19%, respectively, at a finer granularity of voltage scaling, with corresponding maximum power reductions of 55.6% and 57.5% at the typical corner.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130556510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized Power Grid Planning for Enabling Low Power Features for Leakage Power Reduction in SOC","authors":"Vishant Gotra, S. Reddy, Tanniru Srinivasa Rao, P. Pavithra","doi":"10.1109/socc49529.2020.9524781","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524781","url":null,"abstract":"Leakage power reduction and Power Delivery Network (PDN) are amongst the most challenging areas in modern VLSI design. The focus is on reducing leakage power by using more leakage saving techniques and Multiple power pins (commonly known as MPP) standard cell libraries in the SOC. MPP cells can retain data even when logic modules in the design are in the power off state. But extensive usage of MPP cells comes with an overhead of backup supply power grid which eats up the routing resources thereby increasing the routing congestion, physical design and layout convergence challenges in high utilization designs. The backup supply power grid needs to meet the voltage IR drop requirements. So the problem statement is not only restricted to saving leakage, it is more of saving leakage while targeting high frequency and high utilization with a power grid meeting IR drop limits and without causing timing and layout convergence challenges. To solve this problem, optimized and robust backup supply power grid planning along with its verification is proposed in this paper which enables the extensive use of MPP cell addition while achieving high utilization. As an application of the proposed approach, around 20% of leakage power reduction is achieved due to the addition of auto power gating features without increasing layout convergence challenges and the IR drop issues. 
The proposed technique can also be used in ECO mode if MPP cells are added during the ECO phase.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124447238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Ultra-Low Power 900 MHz Intermediate Frequency Low Noise Amplifier For Low-Power RF Receivers","authors":"Aasish Boora, Bharatha Kumar Thangarasu, K. Yeo","doi":"10.1109/socc49529.2020.9524753","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524753","url":null,"abstract":"The recent advancement in biomedical and healthcare sectors shows that the portable ambulatory medical devices with very low power consumption play an important role in continuous monitoring and diagnosis of outpatients by mitigating undesired frequent replacing or recharging of power supply source. To aid this requirement, the fully integrated on-chip circuits should consume very little power without compromising on the overall system performance. In this paper, we present a novel ultra-low power dual-stage intermediate frequency low-noise amplifier (IF LNA) operating at 900 MHz designed in TSMC CMOS 40nm technology. The proposed LNA comprises two identical complementary input stages externally matched to 50 Ω at the load and source along with inter-stage matching. Simulation results of the circuit indicate unconditional stability with a power consumption of 112.9 µW from a 0.56 V supply, a noise figure of 4.66 dB, and a gain of 10.2 dB. The input-referred IP3 is around -17.2 dBm. 
This work aims to be incorporated in a fully integrated ultra-low-power (ULP) RF receiver in the 2.4 GHz ISM band.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"12 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117326179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A DNN Compression Framework for SOT-MRAM-based Processing-In-Memory Engine","authors":"Geng Yuan, Xiaolong Ma, Sheng Lin, Zhengang Li, Jieren Deng, Caiwen Ding","doi":"10.1109/socc49529.2020.9524757","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524757","url":null,"abstract":"The computing wall and data movement challenges of deep neural networks (DNNs) have exposed the limitations of conventional CMOS-based DNN accelerators. Furthermore, the deep structure and large model size make DNNs prohibitive for embedded systems and IoT devices, where low power consumption is required. To address these challenges, spin-orbit torque magnetic random-access memory (SOT-MRAM) and SOT-MRAM based Processing-In-Memory (PIM) engines have been used to reduce the power consumption of DNNs, since SOT-MRAM offers near-zero standby power, high density, and non-volatility. However, the drawbacks of SOT-MRAM based PIM engines, such as high write latency and the need for low bit-width data, decrease their popularity as energy-efficient DNN accelerators. To mitigate these drawbacks, we propose an ultra-energy-efficient framework that applies model compression techniques, including weight pruning and quantization, at the software level while considering the architecture of SOT-MRAM PIM. We also incorporate the alternating direction method of multipliers (ADMM) into the training phase to further guarantee solution feasibility and satisfy SOT-MRAM hardware constraints. Thus, the footprint and power consumption of SOT-MRAM PIM can be reduced while the overall system performance rate (frames per second) increases, making our proposed ADMM-based SOT-MRAM PIM more energy efficient and suitable for embedded systems or IoT devices. 
Our experimental results show that the accuracy and compression rate of our proposed framework consistently outperform the reference works, while the efficiency (area & power) and performance rate of the SOT-MRAM PIM engine are significantly improved.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123517250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Correcting Op-Amp Input Offset Using Analog Floating Gates","authors":"S. Nimmalapudi, A. Marshall, H. Stiegler, Keith Jarreau","doi":"10.1109/socc49529.2020.9524775","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524775","url":null,"abstract":"Low input offset is important in high precision Op-Amps. However, input offset errors caused by mismatch in differential signal paths as a result of random variations are unavoidable even with optimum layout techniques. A relatively new method, the use of Analog Floating Gate (AFG) devices, to enable correction is studied. AFG devices act as analog storage and allow precise trimming of input offset. The proposed methodology results in offset correction for continuous time operation, provides low power operation, does not limit bandwidth and avoids discrete errors seen with some correction methods. Unlike some other analog memories, AFG devices tend to lose charge over time, typically a few mV per year. As a result, we have developed circuitry that automatically recalibrates the AFG charge and retains the Op-amp offset target throughout the product lifetime.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124691403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architectural Exploration on Racetrack Memories","authors":"Rui Xu, E. Sha, Qingfeng Zhuge, Liang Shi, Shouzhen Gu","doi":"10.1109/socc49529.2020.9524792","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524792","url":null,"abstract":"It has become a trend that embedded systems are designed for big data and artificial intelligence applications, which demand large memory capacity and high access performance. Racetrack memory (RM) is a novel non-volatile memory with high access performance, high density, and low power consumption. Thus, for embedded systems targeting data-intensive applications, RM can meet the requirements of access speed, capacity, and power consumption. However, before data on RM can be accessed, the data in the nanowires must be shifted into alignment with a read/write port, which is called a shift operation. Numerous shift operations cause high latency and energy consumption. Increasing the number of ports, or reducing the length of tapes while increasing the number of tape strips, can reduce the shift operations. However, these methods may increase the area of RM. In this paper, we aim to explore the appropriate RM configurations. An Explore Pareto-Optimal Configuration (EPOC) technique based on application access patterns is proposed to generate the appropriate RM configurations. 
Lastly, a simple example is used to analyze the configurations generated by EPOC.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130765335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous Multi Voltage Aware Timing Analysis Methodology for SOC using Machine Learning","authors":"Vishant Gotra, S. Reddy","doi":"10.1109/socc49529.2020.9524780","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524780","url":null,"abstract":"To improve gross margins, the semiconductor industry is focused on the PPA (power, performance, area) matrix of the SOC. The current trend is to put more IPs on the chips to enable multiple functionalities to support various applications. To optimize the PPA of such SOCs, multi voltage and multi power domain design techniques are used, so the timing signoff of the chip has to be done on multiple corners and multiple modes (MCMM). Single voltage timing analysis is easier. With multi-level supply voltage and dynamic scaling features, the timing analysis complexity increases because timing signoff has to be done on different voltages and cross-voltage paths. Multi-voltage designs need exhaustive analysis of cross voltage domain paths to make sure all worst-case paths are identified under all voltage combinations. With numerous operating PVT corners, timing analysis across corners is very challenging. Simultaneous multi-voltage aware analysis (SMVA) analyzes all cross-domain paths under all voltage scenarios in a single run, without the need for margining that can add pessimism. In this paper, we propose a machine learning model, based on bigrams of path stages, to predict the timing slack divergence and cell delays across voltages. We identify the circuit parameters that affect cell delays under voltage changes and thereby cause differences in the endpoint arrival times. Using the timing analysis data of a given testcase at multiple voltages and a Classification and Regression Tree (CART) approach, we develop a predictive model for the new arrival times due to the change in voltages. 
Experimental results show that our model predicts the timing path slack divergence with ~97% accuracy at different voltages on both clock and data paths, including the cross-voltage timing paths, with a lower turnaround time.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121613321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA Based Co-design of Storage-side Query Filter for Big Data Systems","authors":"Jinyu Zhan, Ying Li, Wei Jiang, Jianping Zhu","doi":"10.1109/socc49529.2020.9524801","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524801","url":null,"abstract":"In this paper, we are interested in accelerating the processing of big data systems. We consider Big Data system architectures in which storage and computing are separated, and aim to improve data query efficiency on the storage side. We propose a Field Programmable Gate Array (FPGA) based co-design of a query filter on storage nodes to reduce the workloads of computing nodes and the communication overheads between them. The co-design of the query filter is composed of a software layer and an FPGA layer. In the software layer, we use pointers to project the data in the RCFile format to reduce data transmission, and then formulate the combined predicate of SQL conditions into parameters. In the FPGA layer, we design two filtering schemes on FPGA for data in RCFile format, i.e. parallel sequential filter and parallel pipeline filter, by which different columns and SQL queries are processed completely in parallel. 
Based on the TPC-H benchmark and a Tencent data set, we conduct extensive experiments to evaluate our design, which saves on average 76.2% of the time overhead compared with Presto and 96.86% compared with Hive.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"239 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115074389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Sub-1 ppm/°C CMOS Bandgap Voltage Reference With Process Tolerant Piecewise Second-Order Curvature Compensation","authors":"Yongjoon Ahn, Suhwan Kim, Hyunjoong Lee","doi":"10.1109/socc49529.2020.9524787","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524787","url":null,"abstract":"This paper presents a CMOS high-precision bandgap voltage reference. To obtain a low temperature coefficient (TC) regardless of process variation, a piecewise second-order curvature compensation method is proposed. The curvature compensation current is generated through current subtraction and current squaring operations on two currents with different temperature dependence. Also, several circuit techniques are adopted to compensate for error sources. A chopping technique is used to cancel the 1/f noise and DC offset of the error amplifier. A trimming resistor is used to compensate for process variation. The bandgap reference is designed in a 0.13µm CMOS process. Post-layout simulation shows that the TC of the bandgap reference is 0.64ppm/°C over a wide temperature range of -40°C to 125°C. Moreover, sub-1 ppm/°C TC is achieved irrespective of process variation after two-point temperature trimming. The bandgap reference consumes 44µA at 27°C and the layout size is 0.0534 mm².","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129271807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fine Grained Control Flow Checking with Dedicated FPGA Monitors","authors":"Augusto W. Hoppe, J. Becker, F. Kastensmidt","doi":"10.1109/socc49529.2020.9524751","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524751","url":null,"abstract":"Control flow errors are known to compose the most common type of error for embedded systems in safety-critical environments. While data protection techniques can be easily implemented with special codifications, control flow monitoring techniques either require significant hardware modifications or impose prohibitive software overheads. We propose a new control flow trace checker (CFTC) scheme which can achieve high detection rates without modifying processor hardware or adding execution overheads. Our technique can detect 97.8% of all single bit flips to control flow operations, with detection delays as fast as 0.6 µs on a Cortex-A9 CPU.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121646690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}