A Zero-overhead Self-timed 160ns 54b CMOS Divider

T. E. Williams, M. Horowitz
1991 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. DOI: 10.1109/isscc.1991.689080 (https://doi.org/10.1109/isscc.1991.689080)
Citations: 49

Abstract

This circuit demonstrates a self-timed iterating ring which attains the speed of a combinational array while using only a fraction of the silicon area. The stages in the ring compute mantissa quotient digits for a floating-point division operation. Unlike circuits which implement self-timing by using a matched on-chip clock generator to provide an internal clock for synchronous blocks, the circuit of this paper uses local control handshaking between fully asynchronous blocks and will operate correctly for any values of gate delays. To avoid requiring matching path delays, completion information is embedded in the data throughout the design by using dual-monotonic wire pairs. The precharged function blocks use merged n-channel pull-down networks to choose which of the wires in each pair to set high. A self-timed iterating ring is formed from an asynchronous pipeline by looping data from its output back to its input. The total latency and throughput trade off with the number of latches in the ring. Minimal latency is achieved in this chip by directly concatenating precharged logic blocks into a looped domino chain without adding any explicit latches. The precharge (reset) signals for each block are controlled separately so each block can be used as an implicit latch without adding any additional transistors. The self-timed control is designed to precharge each block after data passes it, and to remove its precharge, enabling its evaluation, before data loops around to its inputs again. A graph-based method is used to analyze the inter-block dependencies and aids in keeping the critical path solely within the combinational data elements. The ring is organized as a series of adjoining stages, each of which is internally composed of precharged blocks. A key to removing extra control dependencies which could degrade performance is to place enough stages in the loop to fully utilize and encompass the time taken by the control so its delay is completely hidden.
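The dual-monotonic encoding described above can be illustrated with a small behavioral sketch. This is a software toy, not the chip's circuit: the names `Pair`, `RESET`, `encode`, `is_done`, `decode`, `dual_rail_xor`, and `word_done` are hypothetical, and XOR is chosen only as a convenient example function. Each signal is a pair of wires that are both low after precharge; exactly one rises when the value is known, so the data itself reports when it is complete.

```python
from typing import List, Tuple

Pair = Tuple[int, int]      # (true_rail, false_rail); hypothetical representation
RESET: Pair = (0, 0)        # both rails low: precharged, no data yet


def encode(bit: int) -> Pair:
    """Raise exactly one rail of the pair to carry a data bit."""
    return (1, 0) if bit else (0, 1)


def is_done(pair: Pair) -> bool:
    """A pair signals completion as soon as either rail is high."""
    return bool(pair[0] | pair[1])


def decode(pair: Pair) -> int:
    """Read the bit from a completed pair; reject reset or illegal states."""
    if pair == (1, 0):
        return 1
    if pair == (0, 1):
        return 0
    raise ValueError(f"pair {pair} does not hold valid data")


def dual_rail_xor(a: Pair, b: Pair) -> Pair:
    """Toy function block built only from AND/OR of input rails.

    With no inversions, an output rail can rise only after both
    inputs have arrived, and it never falls until the next reset.
    """
    t = (a[0] & b[1]) | (a[1] & b[0])
    f = (a[0] & b[0]) | (a[1] & b[1])
    return (t, f)


def word_done(word: List[Pair]) -> bool:
    """A multi-bit word is complete when every pair in it is complete."""
    return all(is_done(p) for p in word)


if __name__ == "__main__":
    a, b = encode(1), RESET
    print(is_done(dual_rail_xor(a, b)))    # False: b has not arrived yet
    out = dual_rail_xor(a, encode(0))
    print(is_done(out), decode(out))       # True 1
```

Because completion is read from the wires themselves, a downstream block in this sketch never needs a delay matched to the logic that produced its inputs.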
This chip uses five stages to allow the control signals to enable each block 0.7 stage delays before its data arrives, which is measured at 2 ns as shown in Figure 1. This margin ensures no control logic enters into the critical path even with some variances in the delays. Thus, the data flows continually at the same rate it would flow through an "unwrapped" combinational array implementing the same functions. While most previous asynchronous circuits have suffered delays due to handshaking and control, this method of self-timing adds zero control overhead to the latency of the raw function computation. …
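The role of the stage count can be made concrete with a toy timing model. It assumes, purely for illustration, that every stage has the same data delay and that each block needs a fixed recovery time (`stage_cycle`, a hypothetical parameter) from the moment it starts evaluating until it has been precharged and re-enabled. The real control dependencies are analyzed with the graph-based method mentioned in the abstract and are not reproduced here. All times are in units of one stage delay.

```python
def control_slack(n_stages: int, stage_delay: float, stage_cycle: float) -> float:
    """How long a block sits enabled before its data returns (toy model).

    Data needs n_stages * stage_delay to travel once around the ring;
    the block is ready again stage_cycle after it began evaluating.
    A negative result means data would have to wait for the control.
    """
    return n_stages * stage_delay - stage_cycle


def loop_time(n_stages: int, stage_delay: float, stage_cycle: float) -> float:
    """Time per trip around the ring: the slower of data and control."""
    return max(n_stages * stage_delay, stage_cycle)


def control_overhead(n_stages: int, stage_delay: float, stage_cycle: float) -> float:
    """Extra time per trip compared with pure combinational delay."""
    return loop_time(n_stages, stage_delay, stage_cycle) - n_stages * stage_delay


if __name__ == "__main__":
    # stage_cycle = 4.3 is NOT a figure from the paper; it is back-derived so
    # that five stages leave the 0.7 stage-delay margin quoted in the abstract.
    for n in (3, 4, 5):
        print(n, control_slack(n, 1.0, 4.3), control_overhead(n, 1.0, 4.3))
```

Under these assumptions, five stages give a positive slack and zero overhead, while a shorter ring with the same recovery time would stall waiting for the control. That is the sense in which enough stages hide the control delay.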