A hardware accelerator for entropy estimation using the top-k most frequent elements
Javier E. Soto, Paulo Ubisse, Cecilia Hernández, M. Figueroa
2020 23rd Euromicro Conference on Digital System Design (DSD), published 2020-08-01
DOI: https://doi.org/10.1109/DSD51259.2020.00032
Citations: 4
Abstract
Estimating the empirical entropy of the elements in a dataset is an important task in data analysis. In particular, empirical entropy can be used effectively to detect anomalies in network traffic. However, computing the empirical entropy of a large dataset is computationally expensive and requires a large amount of memory. This cost is especially limiting in high-speed network traffic analysis, where computing the entropy of a data flow in real time requires hardware accelerators with restricted on-chip memory and arithmetic resources. In this work, we propose a method to estimate the entropy using a streaming algorithm with sublinear space requirements. Our approach uses a sketch to estimate the frequency of the elements in the stream, and a priority queue to store the top-k most frequent elements. We show that our method provides a good approximation of the entropy of the dataset, and present the design of a hardware accelerator that computes the entropy of the stream with a throughput of one packet per clock cycle. Implemented on a Xilinx Zynq UltraScale+ MPSoC ZCU102 FPGA, our accelerator can operate at line rates above 181 Gbps, consuming 511 mW and using less than 24% of the resources available on the device.
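The approach described in the abstract (a sketch for frequency estimates plus a structure holding the top-k elements) can be illustrated with a minimal software sketch. Everything specific here is an assumption for illustration, not the authors' design: the abstract only says "a sketch", so a Count-Min sketch is swapped in; a small dict scanned for its minimum stands in for the hardware priority queue; and the names `CountMinSketch`, `estimate_entropy`, and `exact_entropy`, along with the parameters `k`, `depth`, and `width`, are hypothetical. The abstract does not say how elements outside the top-k are handled, so this sketch sums only the top-k contribution to the empirical entropy, H = -Σ (f_i/m) log2(f_i/m), where f_i is an element's frequency and m is the stream length.

```python
import hashlib
import math
from collections import Counter


class CountMinSketch:
    """Count-Min sketch (assumed choice; the abstract only says "a sketch")."""

    def __init__(self, depth=4, width=1024):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One salted hash per row; the row number acts as the salt.
        salt = row.to_bytes(8, "little")
        digest = hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item):
        """Increment the item's counters and return its estimated frequency."""
        counts = []
        for row in range(self.depth):
            i = self._index(row, item)
            self.rows[row][i] += 1
            counts.append(self.rows[row][i])
        return min(counts)  # Count-Min never underestimates


def estimate_entropy(stream, k=8, depth=4, width=1024):
    """Entropy (bits) contributed by the k elements with the largest estimates."""
    sketch = CountMinSketch(depth, width)
    top = {}  # item -> estimated frequency; stand-in for the priority queue
    total = 0
    for item in stream:
        total += 1
        est = sketch.add(item)
        if item in top or len(top) < k:
            top[item] = est
        else:
            victim = min(top, key=top.get)  # least frequent tracked element
            if est > top[victim]:
                del top[victim]
                top[item] = est
    entropy = 0.0
    for freq in top.values():
        p = min(freq, total) / total  # clamp sketch overestimates
        entropy -= p * math.log2(p)
    return entropy


def exact_entropy(stream):
    """Exact empirical entropy (bits), for comparison; needs linear memory."""
    counts = Counter(stream)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Calling `estimate_entropy(packets, k=8)` on a list of string keys (for example, source addresses) and comparing against `exact_entropy(packets)` gives a rough feel for how much of the entropy the top-k elements capture. The memory used by the estimator depends on `k`, `depth`, and `width` rather than on the number of distinct elements, which is the point of the sublinear-space approach.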