Categorical range maxima queries

Manish Patil, Sharma V. Thankachan, R. Shah, Yakov Nekrich, J. Vitter
Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, published 2014-06-18
DOI: 10.1145/2594538.2594557 (https://doi.org/10.1145/2594538.2594557)
Citations: 19

Abstract

Given an array A[1...n] of n distinct elements from the set {1, 2, ..., n}, a range maximum query RMQ(a, b) returns the highest element in A[a...b] along with its position. In this paper, we study a generalization of this classical problem, the Categorical Range Maxima Query (CRMQ) problem, in which each element A[i] of the array has an associated category (color) given by C[i] ∈ [σ]. A query asks to report each distinct color c appearing in C[a...b] along with the highest element (and its position) in A[a...b] with color c. Let p_c denote the position of the highest element in A[a...b] with color c. We investigate two variants of this problem: a threshold version and a top-k version. In the threshold version, we only need to output the colors whose A[p_c] exceeds the input threshold τ, whereas the top-k variant asks for the k colors with the highest A[p_c] values. In the word RAM model, we achieve a linear-space structure with O(k) query time that can report colors in sorted order of A[·]. In external memory, we present a data structure that answers queries in optimal O(1 + k/B) I/Os using almost-linear O(n log* n) space, as well as a linear-space data structure with O(log* n + k/B) query I/Os. Here k is the output size, log* n is the iterated logarithm of n, and B is the block size. CRMQ has applications to document retrieval and categorical range reporting, giving a one-shot framework for obtaining improved results in both problems. Our results for CRMQ not only improve the best known results for three-sided categorical range reporting but also overcome the hurdle of maintaining color uniqueness in the output set.
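To make the two query variants concrete, the following is a minimal brute-force sketch in Python. It is not the data structure from the paper: it scans the whole range in O(b - a + 1) time and then sorts, whereas the paper's structures answer queries in O(k) time. The function names (`category_maxima`, `crmq_threshold`, `crmq_top_k`) and the sample arrays are illustrative assumptions, chosen only to pin down what a query should return.

```python
from typing import Dict, List, Tuple

# Brute-force reference semantics for CRMQ. Hypothetical helper names;
# this is NOT the paper's data structure, only a definition-level sketch.
# Positions a, b are 1-based and inclusive, matching A[a...b] in the text.

def category_maxima(A: List[int], C: List[int], a: int, b: int) -> Dict[int, Tuple[int, int]]:
    """Map each color c in C[a...b] to (A[p_c], p_c)."""
    best: Dict[int, Tuple[int, int]] = {}
    for i in range(a, b + 1):
        value, color = A[i - 1], C[i - 1]
        if color not in best or value > best[color][0]:
            best[color] = (value, i)
    return best

def crmq_threshold(A: List[int], C: List[int], a: int, b: int, tau: int) -> List[Tuple[int, int, int]]:
    """Colors whose range maximum exceeds tau, as (color, value, position),
    in decreasing order of value."""
    best = category_maxima(A, C, a, b)
    hits = [(c, v, p) for c, (v, p) in best.items() if v > tau]
    return sorted(hits, key=lambda t: -t[1])

def crmq_top_k(A: List[int], C: List[int], a: int, b: int, k: int) -> List[Tuple[int, int, int]]:
    """The k colors with the highest range maxima, as (color, value, position),
    in decreasing order of value."""
    best = category_maxima(A, C, a, b)
    ranked = sorted(((c, v, p) for c, (v, p) in best.items()), key=lambda t: -t[1])
    return ranked[:k]

# Sample data (assumed for illustration): values are a permutation of 1..8.
A = [3, 7, 1, 8, 2, 6, 5, 4]
C = [1, 2, 1, 3, 2, 1, 3, 2]

print(crmq_threshold(A, C, 2, 7, tau=6))  # [(3, 8, 4), (2, 7, 2)]
print(crmq_top_k(A, C, 2, 7, k=1))        # [(3, 8, 4)]
```

In the range A[2...7], color 1 has maximum 6 at position 6, color 2 has maximum 7 at position 2, and color 3 has maximum 8 at position 4. Each color appears at most once in the output, which is the color-uniqueness property the abstract refers to.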