Manish Patil, Sharma V. Thankachan, R. Shah, Yakov Nekrich, J. Vitter
{"title":"Categorical range maxima queries","authors":"Manish Patil, Sharma V. Thankachan, R. Shah, Yakov Nekrich, J. Vitter","doi":"10.1145/2594538.2594557","DOIUrl":null,"url":null,"abstract":"Given an array A[1...n] of n distinct elements from the set {1, 2, ..., n} a range maximum query RMQ(a, b) returns the highest element in A[a...b] along with its position. In this paper, we study a generalization of this classical problem called Categorical Range Maxima Query (CRMQ) problem, in which each element A[i] in the array has an associated category (color) given by C[i] ∈ [σ]. A query then asks to report each distinct color c appearing in C[a...b] along with the highest element (and its position) in A[a...b] with color c. Let pc denote the position of the highest element in A[a...b] with color c. We investigate two variants of this problem: a threshold version and a top-k version. In threshold version, we only need to output the colors with A[pc] more than the input threshold τ, whereas top-k variant asks for k colors with the highest A[pc] values. In the word RAM model, we achieve linear space structure along with O(k) query time, that can report colors in sorted order of A[•]. In external memory, we present a data structure that answers queries in optimal O(1+k/B) I/O's using almost-linear O(n log* n) space, as well as a linear space data structure with O(log* n + k/B) query I/Os. Here k represents the output size, log* n is the iterated logarithm of n and B is the block size. CRMQ has applications to document retrieval and categorical range reporting -- giving a one-shot framework to obtain improved results in both these problems. Our results for CRMQ not only improve the existing best known results for three-sided categorical range reporting but also overcome the hurdle of maintaining color uniqueness in the output set.","PeriodicalId":302451,"journal":{"name":"Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2594538.2594557","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19
Abstract
Given an array A[1...n] of n distinct elements from the set {1, 2, ..., n} a range maximum query RMQ(a, b) returns the highest element in A[a...b] along with its position. In this paper, we study a generalization of this classical problem called Categorical Range Maxima Query (CRMQ) problem, in which each element A[i] in the array has an associated category (color) given by C[i] ∈ [σ]. A query then asks to report each distinct color c appearing in C[a...b] along with the highest element (and its position) in A[a...b] with color c. Let pc denote the position of the highest element in A[a...b] with color c. We investigate two variants of this problem: a threshold version and a top-k version. In threshold version, we only need to output the colors with A[pc] more than the input threshold τ, whereas top-k variant asks for k colors with the highest A[pc] values. In the word RAM model, we achieve linear space structure along with O(k) query time, that can report colors in sorted order of A[•]. In external memory, we present a data structure that answers queries in optimal O(1+k/B) I/O's using almost-linear O(n log* n) space, as well as a linear space data structure with O(log* n + k/B) query I/Os. Here k represents the output size, log* n is the iterated logarithm of n and B is the block size. CRMQ has applications to document retrieval and categorical range reporting -- giving a one-shot framework to obtain improved results in both these problems. Our results for CRMQ not only improve the existing best known results for three-sided categorical range reporting but also overcome the hurdle of maintaining color uniqueness in the output set.