Authors: Meng He, Zhen Liu
DOI: 10.4230/LIPIcs.SEA.2023.19 (https://doi.org/10.4230/LIPIcs.SEA.2023.19)
Published: 2023-01-01, pages 19:1-19:22
Citations: 1
Exact and Approximate Range Mode Query Data Structures in Practice
We conduct an experimental study on the range mode problem. In the exact version of the problem, we preprocess an array A such that, given a query range [a, b], the most frequent element in A[a, b] can be found efficiently. For this problem, our most important finding is that the strategy of using succinct data structures to encode more precomputed information not only helped Chan et al. (Linear-space data structures for range mode query in arrays, Theory of Computing Systems, 2013) improve previous results in theory but also helps us achieve the best time/space tradeoff in practice; we even go a step further and replace more components of their solution with succinct data structures, which improves the performance further. In the approximate version of this problem, a (1 + ε)-approximate range mode query looks for an element whose number of occurrences in A[a, b] is at least F_{a,b} / (1 + ε), where F_{a,b} is the frequency of the mode in A[a, b]. We implement all previous solutions to this problem and find that, even when ε = 1/2, the average approximation ratio of these solutions is close to 1 in practice, and they provide much faster query time than the best exact solution. These solutions achieve different useful time-space tradeoffs, and among them, El-Zein et al. (On Approximate Range Mode and Range Selection, 30th International Symposium on Algorithms and Computation, 2019) provide us with one solution whose space usage is only 35.6% to 93.8% of the cost of storing the input array of 32-bit integers (in most cases, the space cost is closer to the lower end, and the average space cost is 20.2 bits per symbol among all datasets). Its non-succinct version also stands out with query support at least several times faster than other O(n/ε)-word structures while using only slightly more space in practice.
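To make the two problem definitions above concrete, here is a minimal Python sketch. It is a naive linear-scan baseline, not any of the data structures evaluated in the paper; the function names `range_mode` and `is_approx_mode` and the 0-based inclusive indexing are assumptions made for illustration only.

```python
from collections import Counter


def range_mode(A, a, b):
    """Return (mode, frequency) for A[a..b], inclusive, by scanning the range.

    Assumes 0-based indices with a <= b. This is a brute-force reference,
    with no preprocessing, so each query costs time linear in b - a + 1.
    """
    counts = Counter(A[a:b + 1])
    # most_common(1) yields a one-element list [(element, count)]
    element, freq = counts.most_common(1)[0]
    return element, freq


def is_approx_mode(A, a, b, x, eps):
    """Check the (1 + eps)-approximation condition from the abstract:
    x must occur at least F_{a,b} / (1 + eps) times in A[a..b]."""
    _, f_mode = range_mode(A, a, b)
    occurrences = A[a:b + 1].count(x)
    # Equivalent to occurrences >= f_mode / (1 + eps), without a division
    return occurrences * (1 + eps) >= f_mode


if __name__ == "__main__":
    A = [3, 1, 3, 2, 2, 3, 2, 2]
    print(range_mode(A, 0, 5))
    print(is_approx_mode(A, 0, 7, 3, 0.5))
```

In the sample array, the value 2 occurs four times and the value 3 occurs three times over the full range, so with ε = 1/2 the value 3 is an acceptable answer (3 ≥ 4 / 1.5) even though it is not the exact mode. A checker like this is one plausible way to measure an empirical approximation ratio against an exact baseline.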