Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data: Latest Papers

Reducing cache misses in hash join probing phase by pre-sorting strategy (abstract only)

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213971

Gihwan Oh, Jae-Myung Kim, Woon-Hak Kang, Sang-Won Lee

Recently, several studies on cache-aware hash joins for multi-core processors have been carried out [Kim09VLDB, Blanas11SIGMOD]. In particular, Blanas et al. showed that a rather simple no-partitioning hash join can outperform the approach of Kim et al. However, this simple but best-performing hash join still suffers severe cache misses in the probing phase. Because the key values of tuples in the outer relation are neither sorted nor clustered, successive outer records generally have different hashed key values and therefore access different hash buckets. Since the hash table built on the inner table is usually much larger than the CPU cache, each outer record's reference to a hash bucket of the inner table is highly likely to cause a cache miss. To reduce cache misses in the probing phase, we propose a new join algorithm, Sorted Probing (SP), which pre-sorts the outer table of the hash join by hashed key value so that accesses to the inner table's hash buckets have strong temporal locality, thus minimizing cache misses during probing. To make the sort itself efficient, we use the cache-aware AlphaSort technique, which extracts from each record to be sorted its key and a pointer to the record, and then sorts the (key, rec_ptr) pairs. For the performance evaluation, we used two hash join algorithms from Blanas' work, no partitioning (NP) and independent partitioning (IP), in the standard C++ program provided by Blanas. We also implemented AlphaSort and added it before the probing phase of both NP and IP; we call the resulting algorithms NP+SP and IP+SP.
On a synthetic workload, IP+SP outperforms all other algorithms, running up to 30% faster than the others.

Citations: 0
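A minimal Python sketch of the Sorted Probing idea from the abstract above, assuming an in-memory, bucket-chained hash table on integer keys. The function names (`build_hash_table`, `probe`, `sorted_probe`) are hypothetical, and sorting by bucket number (hash value modulo the table size) is an assumption about what "pre-sorting the hashed key values" means; the (bucket, index) pairs stand in for AlphaSort's (key, rec_ptr) pairs. Python cannot show the cache effect itself, only that probing in sorted order returns the same join result while visiting buckets in non-decreasing order.

```python
def build_hash_table(inner, num_buckets):
    """Build phase: chain inner (key, payload) tuples into buckets by hash(key) % num_buckets."""
    table = [[] for _ in range(num_buckets)]
    for key, payload in inner:
        table[hash(key) % num_buckets].append((key, payload))
    return table


def probe(table, outer, num_buckets):
    """Plain probe phase: outer tuples are probed in arrival order."""
    result = []
    for key, payload in outer:
        for inner_key, inner_payload in table[hash(key) % num_buckets]:
            if inner_key == key:
                result.append((key, payload, inner_payload))
    return result


def sorted_probe(table, outer, num_buckets):
    """Sorted-probing sketch: sort (bucket, record index) pairs first, then probe in that order."""
    # Like AlphaSort's (key, rec_ptr) pairs, sort small pairs instead of whole records;
    # the list index plays the role of the record pointer.
    pairs = sorted((hash(key) % num_buckets, i) for i, (key, _) in enumerate(outer))
    result = []
    for bucket, i in pairs:
        key, payload = outer[i]
        # Consecutive iterations now revisit the same bucket before moving on.
        for inner_key, inner_payload in table[bucket]:
            if inner_key == key:
                result.append((key, payload, inner_payload))
    return result


if __name__ == "__main__":
    inner = [(k, f"r{k}") for k in range(100)]
    outer = [(k % 100, f"s{k}") for k in range(0, 300, 7)]
    table = build_hash_table(inner, 16)
    print(sorted(probe(table, outer, 16)) == sorted(sorted_probe(table, outer, 16)))  # True
```

In a compiled implementation such as the C++ code the abstract evaluates, any gain would come from consecutive probes touching a bucket that is still cached; whether the extra sort pays for itself presumably depends on the sizes of the outer table, the hash table, and the cache.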

Mob data sourcing

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213905

Daniel Deutch, T. Milo

Citations: 3

Edgar F. Codd Innovations Award Talk

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2370804

Bruce E. Lindsay

Citations: 0

CrowdScreen: algorithms for filtering data with humans

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213878

Aditya G. Parameswaran, H. Garcia-Molina, Hyunjung Park, N. Polyzotis, Aditya Ramesh, J. Widom

Citations: 249

DP-tree: indexing multi-dimensional data under differential privacy (abstract only)

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213972

Shangfu Peng, Y. Yang, Zhenjie Zhang, M. Winslett, Yong Yu

ε-differential privacy (ε-DP) is a strong and rigorous scheme for protecting individuals' privacy while releasing useful statistical information. The main idea is to inject random noise into the results of statistical queries, such that the presence or absence of any single record has negligible impact on the distribution of query results. The accuracy of such randomized results depends heavily on the query processing technique, which has been an active research topic in recent years. So far, most existing methods focus on one-dimensional queries. The only work that handles multi-dimensional query processing under ε-DP is [1], which indexes the sensitive data using variants of the quad-tree and the k-d tree. As we point out in this paper, these structures are inherently suboptimal for answering queries under ε-DP. Consequently, the solutions in [1] suffer from several serious drawbacks, including limited and unstable query accuracy, as well as bias towards certain types of queries. Motivated by this, we propose the DP-tree, a novel index structure for multi-dimensional query processing under ε-DP that eliminates the problems encountered by the methods in [1]. Further, we show that the effectiveness of the DP-tree can be improved using statistical information about the query workload. Extensive experiments using real and synthetic datasets confirm that the DP-tree achieves significantly higher query accuracy than existing methods.
Interestingly, an adaptation of the DP-tree also outperforms previous 1D solutions within their restricted scope, by large margins.

Citations: 23
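The noise-injection idea described in the DP-tree abstract above can be illustrated with the standard Laplace mechanism applied to a single 2-D range count. This is a textbook sketch, not the DP-tree: the helper names are hypothetical, and each Laplace sample is drawn as the difference of two i.i.d. exponential variables because Python's `random` module has no Laplace sampler.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def range_count(points, rect):
    """Exact number of 2-D points inside rect = (x_lo, y_lo, x_hi, y_hi), bounds inclusive."""
    x_lo, y_lo, x_hi, y_hi = rect
    return sum(1 for x, y in points if x_lo <= x <= x_hi and y_lo <= y <= y_hi)


def noisy_range_count(points, rect, epsilon, rng):
    """Laplace mechanism: a count has sensitivity 1, so Laplace(1/epsilon) noise gives epsilon-DP."""
    return range_count(points, rect) + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)
points = [(1, 1), (2, 3), (5, 5), (9, 9)]
print(range_count(points, (0, 0, 5, 5)))  # 3 (the exact value, never released)
print(noisy_range_count(points, (0, 0, 5, 5), 0.5, rng))  # 3 plus Laplace(2) noise
```

Each Laplace(b) sample has variance 2b², so a range query answered by summing several independently noised index nodes accumulates variance; this is one reason the choice of index structure affects accuracy under ε-DP.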

Authenticating location-based services without compromising location privacy

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213871

Haibo Hu, Jianliang Xu, Qian Chen, Ziwei Yang

Citations: 69

Optimal top-k generation of attribute combinations based on ranked lists

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213883

Jiaheng Lu, P. Senellart, Chunbin Lin, Xiaoyong Du, Shan Wang, Xinxing Chen

In this work, we study a novel query type, called top-k,m queries. Suppose we are given a set of groups, each containing a set of attributes, and each attribute is associated with a ranked list of tuples, each with an ID and a score. All lists are ranked in decreasing order of tuple score. We are interested in finding the best combinations of attributes, where each combination involves one attribute from each group. More specifically, we want the top-k combinations of attributes according to the corresponding top-m tuples with matching IDs. This problem has a wide range of applications, from databases to search engines, on traditional and non-traditional types of data (relational data, XML, text, etc.). We show that a straightforward extension of an optimal top-k algorithm, the Threshold Algorithm (TA), has shortcomings in solving the top-k,m problem: it needs to compute a large number of intermediate results for each combination and reads more inputs than needed. To overcome this weakness, we provide, for the first time, a provably instance-optimal algorithm, and further develop optimizations for efficient query evaluation that reduce computational and memory costs and the number of accesses.
We demonstrate experimentally the scalability and efficiency of our algorithms on three real applications.

Citations: 3
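To make the top-k,m query semantics from the abstract above concrete, here is a naive brute-force evaluation in Python. The abstract does not fix the scoring functions, so this sketch assumes that a tuple ID present in every chosen list is scored by summing its scores, and that a combination is scored by summing its m best tuple scores; the function name `topk_m` and the input layout are hypothetical. It enumerates every combination and reads every list in full, which is the cost that TA-style and instance-optimal algorithms aim to avoid.

```python
import heapq
from itertools import product


def topk_m(groups, k, m):
    """Naive top-k,m evaluation under assumed sum-based scoring.

    groups: list of dicts, each mapping attribute name -> ranked list of (tuple_id, score).
    Returns the k best (score, combination) pairs, best first.
    """
    scored = []
    # One attribute from each group: enumerate the full cross product.
    for combo in product(*[list(g) for g in groups]):
        lists = [dict(groups[i][attr]) for i, attr in enumerate(combo)]
        # "Matching IDs": tuple IDs that occur in every chosen list.
        common = set(lists[0]).intersection(*lists[1:])
        # Assumed tuple score: sum of that ID's scores across the chosen lists.
        tuple_scores = [sum(lst[t] for lst in lists) for t in common]
        # Assumed combination score: sum of its m best tuple scores.
        scored.append((sum(heapq.nlargest(m, tuple_scores)), combo))
    return heapq.nlargest(k, scored)


groups = [
    {"a1": [(1, 9), (2, 8)], "a2": [(1, 5), (3, 4)]},
    {"b1": [(1, 7), (2, 6)], "b2": [(3, 9)]},
]
print(topk_m(groups, k=2, m=1))  # [(16, ('a1', 'b1')), (13, ('a2', 'b2'))]
```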

SIGMOD Contributions Award Talk

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2370916

M. Winslett

Citations: 0

SigSpot: mining significant anomalous regions from time-evolving networks (abstract only)

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213974

M. Mongiovì, Petko Bogdanov, R. Ranca, Ambuj K. Singh, E. Papalexakis, C. Faloutsos

Anomaly detection in dynamic networks has a rich gamut of application domains, such as road networks, communication networks, and water distribution networks. An anomalous event, such as a traffic accident, a denial-of-service attack, or a chemical spill, can cause a local shift from normal behavior in the network state that persists over an interval of time. Detecting such anomalous regions, which have both a network extent and a time extent, in large real-world networks is a challenging task. Existing anomaly detection techniques focus either on the time series associated with individual network edges or on global anomalies that affect the entire network. To detect anomalous regions, one needs to consider both the time interval and the affected network substructure jointly, which raises computational challenges due to the combinatorial number of possible solutions. We propose the problem of mining all Significant Anomalous Regions (SAR) in time-evolving networks, which asks for the discovery of connected temporal subgraphs comprised of edges that deviate significantly from normal in a persistent manner. We propose an optimal Baseline algorithm for the problem and an efficient approximation, called SIGSPOT. Compared to Baseline, SIGSPOT is up to one order of magnitude faster on real data, while achieving an average relative error of less than 10%. On synthetic datasets it is more than 30 times faster than Baseline with 94% accuracy, and it efficiently solves large instances that are infeasible for Baseline (more than 10 hours of running time).
We demonstrate the utility of SIGSPOT for inferring accidents on road networks and study its scalability when detecting anomalies in social, transportation, and synthetic evolving networks of up to 1 GB.

Citations: 4
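The "connected and persistent" part of the SAR definition in the abstract above can be illustrated with a toy sketch: for one fixed time window, flag edges whose deviation score stays above a threshold at every step, then group the flagged edges into connected regions with union-find. This is neither SIGSPOT nor the Baseline algorithm: it does not search over time intervals, the significance test is replaced by a plain threshold, and the per-edge scores, the function name `anomalous_regions`, and its parameters are all hypothetical.

```python
def anomalous_regions(edges, scores, t_start, t_end, threshold):
    """Toy sketch (not SIGSPOT): connected regions of persistently deviating edges.

    edges:  list of (u, v) node pairs.
    scores: scores[i][t] is a hypothetical deviation score of edge i at time step t.
    Returns (total_score, sorted edge indices) per region, highest total first.
    """
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    flagged = []
    for i, (u, v) in enumerate(edges):
        window = scores[i][t_start:t_end + 1]
        # "Persistent": above the threshold at every time step of the window.
        if window and min(window) > threshold:
            flagged.append(i)
            parent[find(u)] = find(v)  # merge the two endpoints' components

    regions = {}
    for i in flagged:
        root = find(edges[i][0])
        total, members = regions.get(root, (0, []))
        regions[root] = (total + sum(scores[i][t_start:t_end + 1]), members + [i])
    return sorted(((total, sorted(members)) for total, members in regions.values()), reverse=True)


edges = [(0, 1), (1, 2), (3, 4), (2, 0)]
scores = [[0, 5, 6, 7], [0, 4, 4, 4], [0, 9, 9, 9], [0, 0, 1, 0]]
print(anomalous_regions(edges, scores, 1, 3, threshold=2))  # [(30, [0, 1]), (27, [2])]
```

Even this toy version hints at the combinatorial difficulty the abstract mentions: a full solution would have to repeat the search over every candidate time interval rather than one fixed window.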

SOFIA SEARCH: a tool for automating related-work search

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213915

Behzad Golshan, Theodoros Lappas, Evimaria Terzi

Citations: 22