Uniformity and Independence of H3 Hash Functions for Bloom Filters
Authors: Furkan Koltuk; Ece Güran Schmidt
Journal: IEEE Transactions on Computers, vol. 73, no. 8, pp. 1913-1923 (Journal Article)
Publication date: 2024-03-09
DOI: 10.1109/TC.2024.3398426
URL: https://ieeexplore.ieee.org/document/10527416/
Journal metrics: impact factor 3.6; JCR Q2 (Computer Science, Hardware & Architecture)
In this paper, we investigate the effects of violating the conditions of hash function uniformity and/or independence on the false positive probability of Bloom Filters (BF). To this end, we focus on hash functions of the H3 family with a partitioned memory organization for fast hardware implementations of BFs. We first introduce a dependence metric that quantifies hash function uniformity and independence. We then state and prove the necessary and sufficient conditions on the BF parameters for constructing uniform and independent hash functions. Finally, we derive an analytical expression for the exact false positive probability of a BF with hash functions that are not necessarily uniform or independent. We verify our expression with a hardware test bench and explore the effects of losing uniformity and independence through an experimental study that systematically sweeps different dependence metric values and numbers of hash functions. We demonstrate the effects of violating hash function uniformity and independence on the stated target false positive probability for selected previous works in the literature. As an important finding, we show that uniformity of individual hash functions is essential, whereas limited dependencies between hash functions can be tolerated without a negative effect on the false positive probability.
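For readers unfamiliar with the setup the abstract refers to, the following is a minimal sketch of a partitioned Bloom filter driven by H3 hash functions. It uses the standard H3 definition: each hash function is a random binary matrix Q with one row per key bit, and the hash value is the XOR of the rows selected by the 1-bits of the key. The class and function names, the parameter values, and the software-only structure are illustrative assumptions. This is not the authors' hardware design, and it does not implement their dependence metric or their false positive expression.

```python
import random


def make_h3_matrix(key_bits, out_bits, rng):
    """Build an H3 matrix Q: one random out_bits-wide row per key bit."""
    return [rng.getrandbits(out_bits) for _ in range(key_bits)]


def h3_hash(key, matrix):
    """XOR together the rows of Q selected by the 1-bits of key.

    Key bits above len(matrix) are ignored in this sketch.
    """
    h = 0
    for i, row in enumerate(matrix):
        if (key >> i) & 1:
            h ^= row
    return h


class PartitionedBloomFilter:
    """Bloom filter with one private memory partition per hash function.

    Hypothetical parameters: key_bits is the key width, addr_bits is the
    address width of each partition (2**addr_bits cells per partition).
    """

    def __init__(self, key_bits, addr_bits, num_hashes, seed=0):
        rng = random.Random(seed)
        self.matrices = [
            make_h3_matrix(key_bits, addr_bits, rng) for _ in range(num_hashes)
        ]
        # One byte per cell keeps the sketch simple; hardware would use bits.
        self.partitions = [bytearray(1 << addr_bits) for _ in range(num_hashes)]

    def add(self, key):
        # Each hash function sets exactly one cell in its own partition.
        for part, q in zip(self.partitions, self.matrices):
            part[h3_hash(key, q)] = 1

    def __contains__(self, key):
        # A key is reported present only if every partition has its cell set.
        return all(
            part[h3_hash(key, q)] for part, q in zip(self.partitions, self.matrices)
        )


if __name__ == "__main__":
    bf = PartitionedBloomFilter(key_bits=32, addr_bits=10, num_hashes=4, seed=1)
    for k in (3, 17, 4242):
        bf.add(k)
    print(17 in bf)  # True: an inserted key is never reported absent
```

Because the matrices here are drawn at random, nothing guarantees that a given matrix yields a uniform hash or that the hash functions are mutually independent, which is the situation the paper analyzes. Since an H3 function is linear over GF(2), one generic way to check uniformity over all possible keys is to test whether its matrix has full rank in the output dimension; whether that matches the paper's stated conditions is not something the abstract confirms.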
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.