Error-latency Trade-off for Asynchronous Stochastic Computing with ΣΔ Streams for the IoT
Authors: Patricia Gonzalez-Guerrero, S. G. Wilson, M. Stan
Venue: 2019 32nd IEEE International System-on-Chip Conference (SOCC)
Publication date: 2019-09-01
DOI: 10.1109/SOCC46988.2019.1570548453 (https://doi.org/10.1109/SOCC46988.2019.1570548453)
Citations: 4
Abstract
Asynchronous stochastic computing (ASC) using continuous-time-asynchronous $\Sigma\Delta$ modulators (SC-A$\Sigma\Delta$M) has the potential to enable ultra-low-power, on-node machine learning algorithms for the next generation of sensors for the Internet of Things (IoT). Similar to synchronous stochastic computing (SSC)¹, in SC-A$\Sigma\Delta$M complex processing units can be implemented with simple gates because numbers are represented with streams. For example, a multiplier is implemented with an XNOR gate, yielding savings in power and area of 90% compared with the typical binary approach. Previous work demonstrated that SC-A$\Sigma\Delta$M leverages SSC advantages and addresses its drawbacks, achieving significant savings in energy, power and latency. In this work, we study a theoretical model to determine the fundamental limits of accuracy and computing time for SC-A$\Sigma\Delta$M. Since the $\Sigma\Delta$ streams are periodic, the final computing error is non-zero and depends on the period of the input streams. We validate our theoretical model with SPICE-level simulations and evaluate the power and energy consumption using a standard FinFET 1X technology² for two cases: 1) multiplication and 2) gamma correction, an image processing algorithm. Our work determines circuit design guidelines for SC-A$\Sigma\Delta$M and shows that multiplication with SC-A$\Sigma\Delta$M requires at least 6X less time than SSC. The latency reduction and novel architecture positively impact the overall energy consumption in the IoT node, enabling savings in energy of 79% compared with the binary approach.

¹ SC is by definition a synchronous approach; thus we use SSC to differentiate it from asynchronous stochastic computing.
² In modern technologies the node number does not refer to any one feature in the process, and foundries use slightly different conventions; we use 1X to denote the 14/16 nm FinFET nodes offered by the foundry.
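
The abstract's remark that a multiplier reduces to a single XNOR gate can be illustrated with a small sketch. It assumes the conventional bipolar encoding used in synchronous stochastic computing, where a value x in [-1, 1] is carried by a random bit stream with P(bit = 1) = (x + 1) / 2, and it assumes the two input streams are independent. It does not model the asynchronous $\Sigma\Delta$ modulators or the periodic streams studied in the paper. All function names are hypothetical and chosen for illustration only.

```python
import random


def encode_bipolar(value, length, rng):
    """Encode a value in [-1, 1] as a random bit stream (assumed bipolar format).

    Each bit is 1 with probability (value + 1) / 2.
    """
    p_one = (value + 1.0) / 2.0
    return [1 if rng.random() < p_one else 0 for _ in range(length)]


def decode_bipolar(stream):
    """Recover the bipolar value: 2 * (fraction of ones) - 1."""
    return 2.0 * sum(stream) / len(stream) - 1.0


def xnor_multiply(stream_a, stream_b):
    """Bitwise XNOR of two equal-length bit streams."""
    return [1 - (a ^ b) for a, b in zip(stream_a, stream_b)]


def multiply_error(x, y, length, seed=0):
    """Absolute error of the XNOR product estimate for a given stream length."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    stream_x = encode_bipolar(x, length, rng)
    stream_y = encode_bipolar(y, length, rng)
    estimate = decode_bipolar(xnor_multiply(stream_x, stream_y))
    return abs(estimate - x * y)


if __name__ == "__main__":
    # Longer streams cost more time but tend to give a smaller error.
    for length in (64, 1024, 16384):
        print(length, multiply_error(0.5, -0.25, length))
```

With independent streams, the XNOR output is 1 with probability p_a p_b + (1 - p_a)(1 - p_b), which decodes to exactly x·y in expectation. The residual error for random streams shrinks only as the stream grows longer, which is a simple analogue of the error-latency trade-off the paper analyzes for periodic $\Sigma\Delta$ streams.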