{"title":"Local approximation improvement of trajectory piecewise linear macromodels through Chebyshev interpolating polynomials","authors":"M. Farooq, L. Xia","doi":"10.1109/ASPDAC.2013.6509693","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509693","url":null,"abstract":"We introduce the concept of two dimensional (2D) scalability of trajectory piecewise linear (TPWL) through the exploitation of Chebyshev interpolating polynomials in each piecewise region. The goal of 2D scalability is to improve the local approximation properties of TPWL macromodels. Horizontal scalability is achieved through the reduction of number of linearization points along the trajectory; vertical scalability is obtained by extending the scope of macromodel to predict the response of a nonlinear system for inputs far from training trajectory. In this way more efficient macromodels are obtained in terms of simulation speed up of complex nonlinear systems. The methodology developed is to predict the nonlinear responses generated by faults introduced in Micro Electro-Mechanical Systems (MEMS) accelerometer during fabrication, that are used to obtain the seismic images for oil and gas discovery. We provide the implementation details and illustrate the 2D scalability concept with an example using nonlinear transmission line.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132772178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Support tools for porting legacy applications to multicore","authors":"Yuri Ardila, Natsuki Kawai, Takashi Nakamura, Yosuke Tamura","doi":"10.1109/ASPDAC.2013.6509658","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509658","url":null,"abstract":"This paper presents PEMAP, an automated performance estimation tool to project performance of hand-parallelized programs from sequential programs and BEMAP, a benchmark suite to measure an auto-parallelizer or even a machine's performance. BEMAP is an open-source project, and the documentations on code explanations and experimental results are also provided. Our experiments on PEMAP shows we can estimate performance of hand-parallelized programs in an error of 0.44% of sequential program's performance on average, while using BEMAP shows that the ability of an auto-parallelizer can be measured by comparing the compiled code to the handtuned parallelized OpenCL code, and therefore assisting the development of the auto-parallelizer tool.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116217313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 6.72-Gb/s, 8pJ/bit/iteration WPAN LDPC decoder in 65nm CMOS","authors":"Zhixiang Chen, Xiao Peng, Xiongxin Zhao, Leona Okamura, Dajiang Zhou, S. Goto","doi":"10.1109/ASPDAC.2013.6509569","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509569","url":null,"abstract":"An LDPC decoder in 65nm CMOS targeting WPAN (IEEE 802.15.3c) is presented with measurement results. A modified-PCM based message permutation strategy with compatible data flow is proposed to solve the network problem raised by high parallelism LDPC decoding. Compared to the state-of-art, decoder chip achieves 17.7%, 33.5% and 49% improvements in chip density, gate count and energy efficiency, respectively.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"31 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116940000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing multi-level combinational circuits for generating random bits","authors":"Chen Wang, Weikang Qian","doi":"10.1109/ASPDAC.2013.6509586","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509586","url":null,"abstract":"Random bits are an important construct in many applications, such as hardware-based implementation of probabilistic algorithms and weighted random testing. One approach in generating random bits with required probabilities is to synthesize combinational circuits that transform a set of source probabilities into target probabilities. In [1], the authors proposed a greedy algorithm that synthesizes circuits in the form of a gate chain to approximate target probabilities. However, since this approach only considers circuits of such a special form, the resulting circuits are not satisfactory both in terms of the approximation error and the circuit depth. In this paper, we propose a new algorithm to synthesize combinational circuits for generating random bits. Compared to the previous one, our approach greatly enlarges the search space. Also, we apply a linear property of probabilistic logic computation and an iterative local search method to increase the efficiency of our algorithm. Experimental results comparing the approximation errors and the depths of the circuits synthesized by our method to those of the circuits synthesized by the previous approach demonstrate the superiority of our method.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115544541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Niitsu, Naohiro Harigai, D. Hirabayashi, D. Oki, Masato Sakurai, O. Kobayashi, Takahiro J. Yamaguchi, Haruo Kobayashi
{"title":"Design of a clock jitter reduction circuit using gated phase blending between self-delayed clock edges","authors":"K. Niitsu, Naohiro Harigai, D. Hirabayashi, D. Oki, Masato Sakurai, O. Kobayashi, Takahiro J. Yamaguchi, Haruo Kobayashi","doi":"10.1109/ASPDAC.2013.6509577","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509577","url":null,"abstract":"Design of a clock jitter reduction circuit that exploits the phase blending technique between the uncorrelated clock edges that are self-delayed by multiples of the clock cycle, nT is presented. By blending uncorrelated clock edges, the output clock edges approach the ideal timing and, thus, timing jitter can be reduced by a factor of √2 per stage. There are three technical challenges to realize this: 1) generating uncorrelated clock edges, 2) phase averaging with small time offset from the ideal center position, and 3) minimizing the error in nT-delay being deviated from ideal nT. The proposed circuit overcomes each of these by exploiting an nT-delay, gated phase blending, and self-calibrated nT-delay elements, respectively. Measurement results with a 180-nm CMOS prototype chip demonstrated an approximately four-fold reduction in timing jitter from 30.2 ps to 8.8 ps in 500-MHz clock by cascading the proposed circuit with four-stages.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122059367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An adaptive filtering mechanism for energy efficient data prefetching","authors":"Xianglei Dang, Xiaoyin Wang, Dong Tong, Zichao Xie, Lingda Li, Keyi Wang","doi":"10.1109/ASPDAC.2013.6509617","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509617","url":null,"abstract":"As data prefetching is used in embedded processors, it is crucial to reduce the wasted energy for improving the energy efficiency. In this paper, we propose an adaptive prefetch filtering (APF) mechanism to reduce the wasted bandwidth and energy as well as the cache pollution caused by useless prefetches. APF records the prefetch-victim address pairs of issued prefetches and collects information about which address in each pair is first accessed by the processor to guide the filtering of new generated useless prefetches. Meanwhile, filtered prefetches are recorded for building the feedback mechanism to avoid filtering useful prefetches. Experimental results demonstrate that APF reduces useless prefetches by an average of 53.81% with a mere 5.28% reduction of useful prefetches, thus reducing the memory access bandwidth consumption by 59.92% and the L2 cache energy by 6.19%. APF also improves the performance of several programs by reducing the cache pollution incurred by useless prefetches, thus gaining an average performance improvement of 2.12%.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121335233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A binding algorithm in high-level synthesis for path delay testability","authors":"Yuki Yoshikawa","doi":"10.1109/ASPDAC.2013.6509653","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509653","url":null,"abstract":"A binding method in high-level synthesis for path delay testability is proposed in this paper. For a given scheduled data flow graph, the proposed method synthesizes a path delay testable RTL datapath and its controller. Every path in the datapath is two pattern testable with the controller if the path is activated in the functional operation, i.e., the path is not false path. Our experimental results show that the proposed method can synthesize such RTL circuits with small area overhead compared with that augmented by some DFT techniques such as scan design.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130308657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simplification of C-RTL equivalent checking for fused multiply add unit using intermediate models","authors":"Bin Xue, Prosenjit Chatterjee, S. Shukla","doi":"10.1109/ASPDAC.2013.6509686","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509686","url":null,"abstract":"The functionality of Fused multiply add (FMA) design can be formally verified by comparing its register transition level (RTL) implementation against its system level specification often modeled by C/C++ language using sequential equivalent checking (SEC). However, C-RTL SEC does not scale for FMA because of the huge discrepancy existed between the two models. This paper analyzes the dissimilarities and proposes two intermediate models, one abstract RTL and one rewritten C model to bridge the gap. The original SEC proof are partitioned into three sub-proofs among intermediate models where a variety of simplification techniques are applied to further reduce the complexity. Experiments from an industry project show that with the two intermediate models, the SEC proof is complete and scalable for FMA design.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"0510 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125823962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Wille, Mathias Soeken, Christian Otterstedt, R. Drechsler
{"title":"Improving the mapping of reversible circuits to quantum circuits using multiple target lines","authors":"R. Wille, Mathias Soeken, Christian Otterstedt, R. Drechsler","doi":"10.1109/ASPDAC.2013.6509587","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509587","url":null,"abstract":"The efficient synthesis of quantum circuits is an active research area. Since many of the known quantum algorithms include a large Boolean component (e.g. the database in the Grover search algorithm), quantum circuits are commonly synthesized in a two-stage approach. First, the desired function is realized as a reversible circuit making use of existing synthesis methods for this domain. Afterwards, each reversible gate is mapped to a functionally equivalent quantum gate cascade. In this paper, we propose an improved mapping of reversible circuits to quantum circuits which exploits a certain structure of many reversible circuits. In fact, it can be observed that reversible circuits are often composed of similar gates which only differ in the position of their target lines. We introduce an extension of reversible gates which allow multiple target lines in a single gate. This enables a significantly cheaper mapping to quantum circuits. Experiments show that considering multiple target lines leads to improvements of up to 85% in the resulting quantum cost.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121785592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Helmstetter, Sylvain Basset, R. Lemaire, F. Clermidy, P. Vivet, M. Langevin, Chuck Pilkington, P. Paulin, D. Fuin
{"title":"A dynamic stream link for efficient data flow control in NoC based heterogeneous MPSoC","authors":"C. Helmstetter, Sylvain Basset, R. Lemaire, F. Clermidy, P. Vivet, M. Langevin, Chuck Pilkington, P. Paulin, D. Fuin","doi":"10.1109/ASPDAC.2013.6509556","DOIUrl":"https://doi.org/10.1109/ASPDAC.2013.6509556","url":null,"abstract":"As Systems-on-Chip size increase, the communication costs become critical and Networks-on-Chip (NoC) bring innovative solutions. Efficient stream-based protocols over NoC have been widely studied to address dataflow communications. They are usually controlled by a set of static parameters. However, new applications, such as high-resolution video decoders, present more data-dependent behaviors forcing communication protocols to support higher dynamicity. For this purpose, we present in this paper dynamic stream links for stream-based end-to-end NoC communications by introducing two link protocols, both independent of the transfer size, allowing to improve the hardware/software control flexibility. The proposed protocols have been modeled in a MPSoC virtual platform and the hardware cost evaluated. Based on simulations, we provide guidelines to exploit these protocols according to application needs.","PeriodicalId":297528,"journal":{"name":"2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121943846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}