{"title":"Session details: Research session 2: social networks 1","authors":"L. Lakshmanan","doi":"10.1145/3255748","DOIUrl":"https://doi.org/10.1145/3255748","url":null,"abstract":"","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128896217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A pivotal prefix based filtering algorithm for string similarity search","authors":"Dong Deng, Guoliang Li, Jianhua Feng","doi":"10.1145/2588555.2593675","DOIUrl":"https://doi.org/10.1145/2588555.2593675","url":null,"abstract":"We study the string similarity search problem with edit-distance constraints, which, given a set of data strings and a query string, finds the similar strings to the query. Existing algorithms use a signature-based framework. They first generate signatures for each string and then prune the dissimilar strings which have no common signatures to the query. However existing methods involve large numbers of signatures and many signatures are unnecessary. Reducing the number of signatures not only increases the pruning power but also decreases the filtering cost. To address this problem, we propose a novel pivotal prefix filter which significantly reduces the number of signatures. We prove the pivotal filter achieves larger pruning power and less filtering cost than state-of-the-art filters. We develop a dynamic programming method to select high-quality pivotal prefix signatures to prune dissimilar strings with non-consecutive errors to the query. We propose an alignment filter that considers the alignments between signatures to prune large numbers of dissimilar pairs with consecutive errors to the query. Experimental results on three real datasets show that our method achieves high performance and outperforms the state-of-the-art methods by an order of magnitude.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124038065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fusing data with correlations","authors":"R. Pochampally, A. Sarma, X. Dong, A. Meliou, D. Srivastava","doi":"10.1145/2588555.2593674","DOIUrl":"https://doi.org/10.1145/2588555.2593674","url":null,"abstract":"Many applications rely on Web data and extraction systems to accomplish knowledge-driven tasks. Web information is not curated, so many sources provide inaccurate, or conflicting information. Moreover, extraction systems introduce additional noise to the data. We wish to automatically distinguish correct data and erroneous data for creating a cleaner set of integrated data. Previous work has shown that a naive voting strategy that trusts data provided by the majority or at least a certain number of sources may not work well in the presence of copying between the sources. However, correlation between sources can be much broader than copying: sources may provide data from complementary domains (negative correlation), extractors may focus on different types of information (negative correlation), and extractors may apply common rules in extraction (positive correlation, without copying). In this paper we present novel techniques modeling correlations between sources and applying it in truth finding. We provide a comprehensive evaluation of our approach on three real-world datasets with different characteristics, as well as on synthetic data, showing that our algorithms outperform the existing state-of-the-art techniques.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"285 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131559868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explainable security for relational databases","authors":"G. Bender, Lucja Kot, J. Gehrke","doi":"10.1145/2588555.2593663","DOIUrl":"https://doi.org/10.1145/2588555.2593663","url":null,"abstract":"Companies and organizations collect and use vast troves of sensitive user data whose release must be carefully controlled. In practice, the access policies that govern this data are often fine-grained, complex, poorly documented, and difficult to reason about. As a result, principals frequently request and are granted access to data they never use. To encourage developers and administrators to use security mechanisms more effectively, we propose a novel security model in which all security decisions are formally explainable. Whether a query is accepted or denied, the system returns a concise yet formal explanation which can allow the issuer to reformulate a rejected query or adjust his/her security credentials. Our approach has a strong formal foundation based on previously unexplored connections between disclosure lattices and policy algebras. We build on this foundation and implement a disclosure control system that handles a wide variety of real SQL queries and can accommodate complex policy constraints.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127506496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PLANET: making progress with commit processing in unpredictable environments","authors":"Gene Pang, Tim Kraska, M. Franklin, A. Fekete","doi":"10.1145/2588555.2588558","DOIUrl":"https://doi.org/10.1145/2588555.2588558","url":null,"abstract":"Latency unpredictability in a database system can come from many factors, such as load spikes in the workload, inter-query interactions from consolidation, or communication costs in cloud computing or geo-replication. High variance and high latency environments make developing interactive applications difficult, because transactions may take too long to complete, or fail unexpectedly. We propose Predictive Latency-Aware NEtworked Transactions (PLANET), a new transaction programming model and underlying system support to address this issue. The model exposes the internal progress of the transaction, provides opportunities for application callbacks, and incorporates commit likelihood prediction to enable good user experience even in the presence of significant transaction delays. The mechanisms underlying PLANET can be used for admission control, thus improving overall performance in high contention situations. In this paper, we present this new transaction programming model, demonstrate its expressiveness via several use cases, and evaluate its performance using a strongly consistent geo-replicated database across five data centers.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115045096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stratified-sampling over social networks using mapreduce","authors":"R. Levin, Y. Kanza","doi":"10.1145/2588555.2588577","DOIUrl":"https://doi.org/10.1145/2588555.2588577","url":null,"abstract":"Sampling is being used in statistical surveys to select a subset of individuals from some population, to estimate properties of the population. In stratified sampling, the surveyed population is partitioned into homogeneous subgroups and individuals are selected within the subgroups, to reduce the sample size. In this paper we consider sampling of large-scale, distributed online social networks, and we show how to deal with cases where several surveys are conducted in parallel---in some surveys it may be desired to share individuals to reduce costs, while in other surveys, sharing should be minimized, e.g., to prevent survey fatigue. A multi-survey stratified sampling is the task of choosing the individuals for several surveys, in parallel, according to sharing constraints, without a bias. In this paper, we present a scalable distributed algorithm, designed for the MapReduce framework, for answering stratified-sampling queries over a population of a social network. We also present an algorithm to effectively answer multi-survey stratified sampling, and we show how to implement it using MapReduce. An experimental evaluation illustrates the efficiency of our algorithms and their effectiveness for multi-survey stratified sampling.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124760335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A temporal context-aware model for user behavior modeling in social media systems","authors":"Hongzhi Yin, B. Cui, Ling Chen, Zhiting Hu, Zi Huang","doi":"10.1145/2588555.2593685","DOIUrl":"https://doi.org/10.1145/2588555.2593685","url":null,"abstract":"Social media provides valuable resources to analyze user behaviors and capture user preferences. This paper focuses on analyzing user behaviors in social media systems and designing a latent class statistical mixture model, named temporal context-aware mixture model (TCAM), to account for the intentions and preferences behind user behaviors. Based on the observation that the behaviors of a user in social media systems are generally influenced by intrinsic interest as well as the temporal context (e.g., the public's attention at that time), TCAM simultaneously models the topics related to users' intrinsic interests and the topics related to temporal context and then combines the influences from the two factors to model user behaviors in a unified way. To further improve the performance of TCAM, an item-weighting scheme is proposed to enable TCAM to favor items that better represent topics related to user interests and topics related to temporal context, respectively. Based on TCAM, we design an efficient query processing technique to support fast online recommendation for large social media data. Extensive experiments have been conducted to evaluate the performance of TCAM on four real-world datasets crawled from different social media sites. The experimental results demonstrate the superiority of the TCAM models, compared with the state-of-the-art competitor methods, by modeling user behaviors more precisely and making more effective and efficient recommendations.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129920670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable atomic visibility with RAMP transactions","authors":"Peter D. Bailis, A. Fekete, J. Hellerstein, A. Ghodsi, I. Stoica","doi":"10.1145/2588555.2588562","DOIUrl":"https://doi.org/10.1145/2588555.2588562","url":null,"abstract":"Databases can provide scalability by partitioning data across several servers. However, multi-partition, multi-operation transactional access is often expensive, employing coordination-intensive locking, validation, or scheduling mechanisms. Accordingly, many real-world systems avoid mechanisms that provide useful semantics for multi-partition operations. This leads to incorrect behavior for a large class of applications including secondary indexing, foreign key enforcement, and materialized view maintenance. In this work, we identify a new isolation model---Read Atomic (RA) isolation---that matches the requirements of these use cases by ensuring atomic visibility: either all or none of each transaction's updates are observed by other transactions. We present algorithms for Read Atomic Multi-Partition (RAMP) transactions that enforce atomic visibility while offering excellent scalability, guaranteed commit despite partial failures (via synchronization independence), and minimized communication between servers (via partition independence). These RAMP transactions correctly mediate atomic visibility of updates and provide readers with snapshot access to database state by using limited multi-versioning and by allowing clients to independently resolve non-atomic reads. We demonstrate that, in contrast with existing algorithms, RAMP transactions incur limited overhead---even under high contention---and scale linearly to 100 servers.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130076984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ONTOCUBO: cube-based ontology construction and exploration","authors":"Carlos Garcia-Alvarado, C. Ordonez","doi":"10.1145/2588555.2594521","DOIUrl":"https://doi.org/10.1145/2588555.2594521","url":null,"abstract":"One of the major challenges of big data analytics is the diverse information content, which has no pre-defined structure or classification. This is in contrast to the well-designed structure of a database specified on an ER model. A standard mechanism for understanding interrelationships and the structure of documents is using ontologies. With such motivation in mind, we present a system that enables data management and querying of documents based on ontologies by leveraging the functionality of the DBMS. In this paper, we present ONTOCUBO, a novel system based on our research for text summarization using ontologies and automatic extraction of concepts for building ontologies using Online Analytical Processing (OLAP) cubes. ONTOCUBO is a database-centric approach that excels in its performance, due to an SQL-based single pass summarization phase through the original data set that computes values such as keyword frequency, standard deviation, and lift. This approach is complemented with a set of User-Defined-Function-based algorithms that analyze the summarization results for concepts and their interrelationships. Finally, we show in detail our application that extracts and builds an ontology, but also allows concept summarizations and allows domain experts to explore and modify the resulting ontology.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"286 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127719193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Searching with XQ: the exemplar query search engine","authors":"D. Mottin, Matteo Lissandrini, Yannis Velegrakis, Themis Palpanas","doi":"10.1145/2588555.2594529","DOIUrl":"https://doi.org/10.1145/2588555.2594529","url":null,"abstract":"We demonstrate XQ, a query engine that implements a novel technique for searching relevant information on the web and in various data sources, called Exemplar Queries. While the traditional query model expects the user to provide a set of specifications that the elements of interest need to satisfy, XQ expects the user to provide only an element of interest and we infer the desired answer set based on that element. Through the various examples we demonstrate the functionality of the system and its applicability in various cases. At the same time, we highlight the technical challenges for this type of query answering and illustrate the implementation approach we have materialized. The demo is intended for both researchers and practitioners and aims at illustrating the benefits of the adoption of this new form of query answering in practical applications and the further study and advancement of its technical solutions.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127804242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}