O. Nishii, I. Nonomura, Y. Yoshida, K. Hayase, S. Shibahara, Y. Tsujimoto, M. Takada, T. Hattori
{"title":"Design of a 90nm 4-CPU 4320MIPS SoC with individually managed frequency and 2.4GB/s multi-master on-chip interconnect","authors":"O. Nishii, I. Nonomura, Y. Yoshida, K. Hayase, S. Shibahara, Y. Tsujimoto, M. Takada, T. Hattori","doi":"10.1109/ASSCC.2007.4425785","DOIUrl":"https://doi.org/10.1109/ASSCC.2007.4425785","url":null,"abstract":"We have developed a 97.6 mm² SoC in 90-nm CMOS that includes four SuperH™ architecture CPUs and a DDR-2 controller for high-performance embedded applications. The four 600 MHz CPUs are identical, and each has a floating point unit, 32/32 KB cache memory, and 152 KB local memory. Together, the CPUs achieve 4320 MIPS. The main on-chip 300 MHz 64-bit bus manages processor accesses, and another dedicated connection handles cache coherency operations. Considering varying processing load, this chip targets both low power consumption (proportional to processing load) and constant on-chip bandwidth. Each processor can be operated at a different frequency while the on-chip bus frequency is kept constant. Using this individual core clock distribution scheme, the following designs have been developed: (i) frequency transition control that permits on-chip bus access by other bus masters, (ii) a light-sleep mode that maintains cache coherency control, and (iii) cache snoop control logic that maintains cache coherency between processors running at different frequencies. The main on-chip interconnect (bus) connects the four processors and other on-chip IPs. The numbers of access masters and access slaves increase with the number of processors. Standard-Vth (as opposed to high-Vth) cell usage and layout control achieved 300-MHz multi-master operation.","PeriodicalId":186095,"journal":{"name":"2007 IEEE Asian Solid-State Circuits Conference","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126207693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An all-digital reused-SAR delay-locked loop with adjustable duty cycle","authors":"Wei-Ming Lin, Shen-Iuan Liu","doi":"10.1109/ASSCC.2007.4425693","DOIUrl":"https://doi.org/10.1109/ASSCC.2007.4425693","url":null,"abstract":"An all-digital delay-locked loop (DLL) with multiple outputs and adjustable duty cycle, based on a reused successive approximation register (SAR), is presented. This DLL provides multiple synchronous clocks with independently adjustable duty cycles. The proposed reused SAR is similar to a conventional SAR, but it saves a lot of area. The clock duty cycle is adjusted by a 5-bit coarse code and a 2-bit fine code shared each other. This DLL has been fabricated in a 0.18 μm CMOS technology. The measured input frequency is from 300 MHz to 800 MHz. The measured peak-to-peak jitter is 9.78 ps at 800 MHz. The power consumption of this DLL with one output clock is 2.7 mW at 800 MHz. The maximum duty cycle variation at 300 MHz is less than 1%. The area of this DLL is 0.054 mm².","PeriodicalId":186095,"journal":{"name":"2007 IEEE Asian Solid-State Circuits Conference","volume":"261 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115845736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I-Chyn Wey, You-Gang Chen, Changhong Yu, Jie Chen, A. Wu
{"title":"A 0.13μm hardware-efficient probabilistic-based noise-tolerant circuit design and implementation with 24.5dB noise-immunity improvement","authors":"I-Chyn Wey, You-Gang Chen, Changhong Yu, Jie Chen, A. Wu","doi":"10.1109/ASSCC.2007.4425694","DOIUrl":"https://doi.org/10.1109/ASSCC.2007.4425694","url":null,"abstract":"As the size of CMOS devices is scaled down to the nanoscale level, noise interference starts to significantly affect VLSI circuit performance. Because the noise is random and dynamic in nature, a probabilistic-based approach is more suitable for handling signal errors than conventional deterministic circuit designs. However, probabilistic-based designs require a larger hardware area. In this paper, we design and implement a hardware-efficient probabilistic-based noise-tolerant circuit, an 8-bit Markov random field carry lookahead adder (MRF_CLA), in 0.13 μm CMOS process technology. The measurement results show that the proposed MRF_CLA can provide 24.5 dB of noise-immunity enhancement as compared with its conventional CMOS design. Moreover, the transistor count is reduced by 42% as compared to the state-of-the-art MRF design [1].","PeriodicalId":186095,"journal":{"name":"2007 IEEE Asian Solid-State Circuits Conference","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123831051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sangkwon Na, Woong Hwangbo, Jaemoon Kim, Seunghan Lee, C. Kyung
{"title":"1.8mW, hybrid-pipelined H.264/AVC decoder for mobile devices","authors":"Sangkwon Na, Woong Hwangbo, Jaemoon Kim, Seunghan Lee, C. Kyung","doi":"10.1109/ASSCC.2007.4425763","DOIUrl":"https://doi.org/10.1109/ASSCC.2007.4425763","url":null,"abstract":"To meet the performance, area and power requirement constraints of H.264/AVC, we propose a hybrid pipeline architecture, and a data reuse mechanism to reduce off-chip memory access. A 4x4 sub-macroblock pipeline architecture is optimized for low power as well as performance. The proposed H.264/AVC decoder architecture can support CIF (352x288) 30 fps videos at 6 MHz with 1.8 mW @ 1.65 V, implemented in 0.18 μm technology.","PeriodicalId":186095,"journal":{"name":"2007 IEEE Asian Solid-State Circuits Conference","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123373203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
W. Rhee, H. Ainspan, D. Friedman, T. Rasmus, S. Garvin, C. Cranford
{"title":"A uniform bandwidth PLL using a continuously tunable single-input dual-path LC VCO for 5Gb/s PCI express Gen2 application","authors":"W. Rhee, H. Ainspan, D. Friedman, T. Rasmus, S. Garvin, C. Cranford","doi":"10.1109/ASSCC.2007.4425732","DOIUrl":"https://doi.org/10.1109/ASSCC.2007.4425732","url":null,"abstract":"A 4.75 to 6.1 GHz PLL with uniform bandwidth control is implemented in 90 nm CMOS. Utilizing a continuously tunable single-input dual-path LC VCO and a constant-gain phase detector, the proposed architecture is well suited to implementing PLLs that must be compliant with standards that specify minimum and maximum allowable PLL bandwidths such as PCI Express Gen2 or FB-DIMM applications. This work also addresses noise and coupling aspects in dual-path VCO design. The measurement results show that the PLL bandwidth and random jitter (RJ) variations are well regulated and that the use of a differentially controlled dual-path VCO is important for deterministic jitter (DJ) performance.","PeriodicalId":186095,"journal":{"name":"2007 IEEE Asian Solid-State Circuits Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125521236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Manish Shah, Jama Barreh, J. Brooks, R. Golla, G. Grohoski, N. Gura, Ricky C. Hetherington, Paul J. Jordan, Mark Luttrell, Christopher H. Olson, Bikram Saha, Denis Sheahan, Lawrence Spracklen, Aaron Wynn
{"title":"UltraSPARC T2: A highly-threaded, power-efficient, SPARC SOC","authors":"Manish Shah, Jama Barreh, J. Brooks, R. Golla, G. Grohoski, N. Gura, Ricky C. Hetherington, Paul J. Jordan, Mark Luttrell, Christopher H. Olson, Bikram Saha, Denis Sheahan, Lawrence Spracklen, Aaron Wynn","doi":"10.1109/ASSCC.2007.4425786","DOIUrl":"https://doi.org/10.1109/ASSCC.2007.4425786","url":null,"abstract":"UltraSPARC T2 is Sun Microsystems' second generation multi-core, multi-threaded SPARC System-on-a-chip. It delivers twice the throughput performance of the first generation UltraSPARC T1 processor in essentially the same power envelope. UltraSPARC T2 supports concurrent execution of 64 threads by utilizing eight SPARC cores, each with eight hardware threads. The cores communicate via a high bandwidth crossbar and share a 4 MB, eight bank, L2 cache. Each SPARC core includes two integer execution units and a dedicated floating point and graphics unit, which delivers a peak floating point throughput of 11.2 GFLOPS at 1.4 GHz. Each core also has a cryptographic unit. For I/O, UltraSPARC T2 has an integrated x8 PCI-Express channel and two 10 G Ethernet ports with XAUI interfaces. Memory is accessed via four on-chip controllers each controlling 2 FBDIMM channels for a peak memory bandwidth in excess of 60 GB/sec. UltraSPARC T2 is fabricated in an 11 metal, 1.1 V, triple-Vt CMOS process. The chip has ~500 M transistors on a 342 mm² die with a power consumption of 84 W at 1.4 GHz. The high level of system integration along with high throughput, floating point, and cryptographic performance makes UltraSPARC T2 an ideal choice for a range of applications including webservers, database and applications servers, high performance computing, secure networking, campus backbones, and file servers.","PeriodicalId":186095,"journal":{"name":"2007 IEEE Asian Solid-State Circuits Conference","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127868365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Mizumoto, T. Tanizaki, S. Kobayashi, M. Nakajima, T. Gyohten, H. Yamasaki, H. Noda, M. Higashida, Y. Okuno, K. Arimoto
{"title":"A multi matrix-processor core architecture for real-time image processing SoC","authors":"K. Mizumoto, T. Tanizaki, S. Kobayashi, M. Nakajima, T. Gyohten, H. Yamasaki, H. Noda, M. Higashida, Y. Okuno, K. Arimoto","doi":"10.1109/ASSCC.2007.4425760","DOIUrl":"https://doi.org/10.1109/ASSCC.2007.4425760","url":null,"abstract":"This paper describes a real-time image processing SoC (MX-SoC) with a programmable multi matrix-processor (MX-core) architecture. The MX-SoC has three MX-cores, a host CPU, and I/O peripheral modules. A unit MX-core is a massively parallel (1024) flexible SIMD processor based on the matrix architecture. The MX-SoC, which can perform the image processing of a CCD camera, is implemented in a 90 nm low-power CMOS process technology and can operate at 162 MHz under the worst condition. A novel parallel pixel data processing algorithm and multi-task execution suitable for multi MX-core processing can achieve 30 frames/sec image processing. This performance is 30 times faster than a general-purpose CPU solution. The MX-SoC with multi MX-core architecture can realize a software solution for the real-time image processing application field.","PeriodicalId":186095,"journal":{"name":"2007 IEEE Asian Solid-State Circuits Conference","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116481563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Yeh, Wei-Yu Wang, Wen-Liang Wang, Yu-Hong Lin, Ying-Lien Cheng, Tsung-Hsin Chou, Jyhfong Lin
{"title":"A PCI-express Gen2 transceiver with adaptive 2-Tap DFE for up to 12-meter external cabling","authors":"T. Yeh, Wei-Yu Wang, Wen-Liang Wang, Yu-Hong Lin, Ying-Lien Cheng, Tsung-Hsin Chou, Jyhfong Lin","doi":"10.1109/ASSCC.2007.4425788","DOIUrl":"https://doi.org/10.1109/ASSCC.2007.4425788","url":null,"abstract":"The latest specification, PCI-Express External Cabling 1.0, only specifies Gen1 (2.5 Gbps) for short-reach usage. The proposed transceiver architecture not only increases the link rate from Gen1 to Gen2 (5 Gbps), but also extends the link range from short-reach to long-reach using a 12-meter 26AWG cable. The S21 of such a cable is -20 dB at 2.5 GHz. The jitter tolerance the new receiver achieves at the far-end terminal following such a cable is 0.76 UI, with a random jitter of 0.31 UI, at a BER of 10^-12. This design has been fabricated in a TSMC 80 nm CMOS process, with a die area of 0.4 mm² per lane.","PeriodicalId":186095,"journal":{"name":"2007 IEEE Asian Solid-State Circuits Conference","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121558926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Where is Korea’s fabless semiconductor industry headed?","authors":"J. Hwang","doi":"10.1109/ASSCC.2007.4425783","DOIUrl":"https://doi.org/10.1109/ASSCC.2007.4425783","url":null,"abstract":"Riding on the boom of handset and system display manufacturing at home, Korea's fabless semiconductor industry has grown by leaps and bounds. The sales revenue of the fabless semiconductor companies nearly tripled to US $1.58 billion in 2006, compared to US $545 million in 2003. Marking an annual average growth of 42 percent, the industry is now around the corner of a global jump by overcoming such issues as a broad range of portfolio set-up, business volume expansion through M&A, market diversification, etc.","PeriodicalId":186095,"journal":{"name":"2007 IEEE Asian Solid-State Circuits Conference","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126639060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joonhee Lee, Kyunglok Kim, Junghyup Lee, Taekwang Jang, Seonghwan Cho
{"title":"A 480-MHz to 1-GHz sub-picosecond clock generator with a fast and accurate automatic frequency calibration in 0.13-µm CMOS","authors":"Joonhee Lee, Kyunglok Kim, Junghyup Lee, Taekwang Jang, Seonghwan Cho","doi":"10.1109/ASSCC.2007.4425733","DOIUrl":"https://doi.org/10.1109/ASSCC.2007.4425733","url":null,"abstract":"In this paper, an ultra-low jitter clock generator that employs a novel automatic frequency calibration (AFC) technique is presented. To achieve low jitter, the clock generator uses an LC-VCO with S-bit switched tuning scheme. The clock output is taken from the output of a multi-modulus divider, which increases the output frequency range with small variation in the loop bandwidth. The capacitor array of the VCO is controlled by a novel AFC technique that performs binary search for fast calibration and fine search to select an optimum tuning curve. A prototype chip implemented in a 0.13-µm CMOS process achieves 480 MHz to 1 GHz of output frequency while consuming 22 mW from a 1.2 V supply. The measured rms jitter and calibration time of the proposed clock generator are 940 fs at 600 MHz and 350 ns, respectively. These numbers are the fastest calibration time and one of the lowest jitter values that have been reported for a clock generator.","PeriodicalId":186095,"journal":{"name":"2007 IEEE Asian Solid-State Circuits Conference","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126873730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}