{"title":"Session details: Research session 2: social networks 1","authors":"L. Lakshmanan","doi":"10.1145/3255748","DOIUrl":"https://doi.org/10.1145/3255748","url":null,"abstract":"","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128896217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A pivotal prefix based filtering algorithm for string similarity search","authors":"Dong Deng, Guoliang Li, Jianhua Feng","doi":"10.1145/2588555.2593675","DOIUrl":"https://doi.org/10.1145/2588555.2593675","url":null,"abstract":"We study the string similarity search problem with edit-distance constraints, which, given a set of data strings and a query string, finds the similar strings to the query. Existing algorithms use a signature-based framework. They first generate signatures for each string and then prune the dissimilar strings which have no common signatures to the query. However existing methods involve large numbers of signatures and many signatures are unnecessary. Reducing the number of signatures not only increases the pruning power but also decreases the filtering cost. To address this problem, we propose a novel pivotal prefix filter which significantly reduces the number of signatures. We prove the pivotal filter achieves larger pruning power and less filtering cost than state-of-the-art filters. We develop a dynamic programming method to select high-quality pivotal prefix signatures to prune dissimilar strings with non-consecutive errors to the query. We propose an alignment filter that considers the alignments between signatures to prune large numbers of dissimilar pairs with consecutive errors to the query. Experimental results on three real datasets show that our method achieves high performance and outperforms the state-of-the-art methods by an order of magnitude.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124038065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fusing data with correlations","authors":"R. Pochampally, A. Sarma, X. Dong, A. Meliou, D. Srivastava","doi":"10.1145/2588555.2593674","DOIUrl":"https://doi.org/10.1145/2588555.2593674","url":null,"abstract":"Many applications rely on Web data and extraction systems to accomplish knowledge-driven tasks. Web information is not curated, so many sources provide inaccurate, or conflicting information. Moreover, extraction systems introduce additional noise to the data. We wish to automatically distinguish correct data and erroneous data for creating a cleaner set of integrated data. Previous work has shown that a naive voting strategy that trusts data provided by the majority or at least a certain number of sources may not work well in the presence of copying between the sources. However, correlation between sources can be much broader than copying: sources may provide data from complementary domains (negative correlation), extractors may focus on different types of information (negative correlation), and extractors may apply common rules in extraction (positive correlation, without copying). In this paper we present novel techniques modeling correlations between sources and applying it in truth finding. We provide a comprehensive evaluation of our approach on three real-world datasets with different characteristics, as well as on synthetic data, showing that our algorithms outperform the existing state-of-the-art techniques.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"285 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131559868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explainable security for relational databases","authors":"G. Bender, Lucja Kot, J. Gehrke","doi":"10.1145/2588555.2593663","DOIUrl":"https://doi.org/10.1145/2588555.2593663","url":null,"abstract":"Companies and organizations collect and use vast troves of sensitive user data whose release must be carefully controlled. In practice, the access policies that govern this data are often fine-grained, complex, poorly documented, and difficult to reason about. As a result, principals frequently request and are granted access to data they never use. To encourage developers and administrators to use security mechanisms more effectively, we propose a novel security model in which all security decisions are formally explainable. Whether a query is accepted or denied, the system returns a concise yet formal explanation which can allow the issuer to reformulate a rejected query or adjust his/her security credentials. Our approach has a strong formal foundation based on previously unexplored connections between disclosure lattices and policy algebras. We build on this foundation and implement a disclosure control system that handles a wide variety of real SQL queries and can accommodate complex policy constraints.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127506496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PLANET: making progress with commit processing in unpredictable environments","authors":"Gene Pang, Tim Kraska, M. Franklin, A. Fekete","doi":"10.1145/2588555.2588558","DOIUrl":"https://doi.org/10.1145/2588555.2588558","url":null,"abstract":"Latency unpredictability in a database system can come from many factors, such as load spikes in the workload, inter-query interactions from consolidation, or communication costs in cloud computing or geo-replication. High variance and high latency environments make developing interactive applications difficult, because transactions may take too long to complete, or fail unexpectedly. We propose Predictive Latency-Aware NEtworked Transactions (PLANET), a new transaction programming model and underlying system support to address this issue. The model exposes the internal progress of the transaction, provides opportunities for application callbacks, and incorporates commit likelihood prediction to enable good user experience even in the presence of significant transaction delays. The mechanisms underlying PLANET can be used for admission control, thus improving overall performance in high contention situations. In this paper, we present this new transaction programming model, demonstrate its expressiveness via several use cases, and evaluate its performance using a strongly consistent geo-replicated database across five data centers.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115045096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stratified-sampling over social networks using mapreduce","authors":"R. Levin, Y. Kanza","doi":"10.1145/2588555.2588577","DOIUrl":"https://doi.org/10.1145/2588555.2588577","url":null,"abstract":"Sampling is being used in statistical surveys to select a subset of individuals from some population, to estimate properties of the population. In stratified sampling, the surveyed population is partitioned into homogeneous subgroups and individuals are selected within the subgroups, to reduce the sample size. In this paper we consider sampling of large-scale, distributed online social networks, and we show how to deal with cases where several surveys are conducted in parallel---in some surveys it may be desired to share individuals to reduce costs, while in other surveys, sharing should be minimized, e.g., to prevent survey fatigue. A multi-survey stratified sampling is the task of choosing the individuals for several surveys, in parallel, according to sharing constraints, without a bias. In this paper, we present a scalable distributed algorithm, designed for the MapReduce framework, for answering stratified-sampling queries over a population of a social network. We also present an algorithm to effectively answer multi-survey stratified sampling, and we show how to implement it using MapReduce. An experimental evaluation illustrates the efficiency of our algorithms and their effectiveness for multi-survey stratified sampling.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124760335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A temporal context-aware model for user behavior modeling in social media systems","authors":"Hongzhi Yin, B. Cui, Ling Chen, Zhiting Hu, Zi Huang","doi":"10.1145/2588555.2593685","DOIUrl":"https://doi.org/10.1145/2588555.2593685","url":null,"abstract":"Social media provides valuable resources to analyze user behaviors and capture user preferences. This paper focuses on analyzing user behaviors in social media systems and designing a latent class statistical mixture model, named temporal context-aware mixture model (TCAM), to account for the intentions and preferences behind user behaviors. Based on the observation that the behaviors of a user in social media systems are generally influenced by intrinsic interest as well as the temporal context (e.g., the public's attention at that time), TCAM simultaneously models the topics related to users' intrinsic interests and the topics related to temporal context and then combines the influences from the two factors to model user behaviors in a unified way. To further improve the performance of TCAM, an item-weighting scheme is proposed to enable TCAM to favor items that better represent topics related to user interests and topics related to temporal context, respectively. Based on TCAM, we design an efficient query processing technique to support fast online recommendation for large social media data. Extensive experiments have been conducted to evaluate the performance of TCAM on four real-world datasets crawled from different social media sites. The experimental results demonstrate the superiority of the TCAM models, compared with the state-of-the-art competitor methods, by modeling user behaviors more precisely and making more effective and efficient recommendations.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129920670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable atomic visibility with RAMP transactions","authors":"Peter D. Bailis, A. Fekete, J. Hellerstein, A. Ghodsi, I. Stoica","doi":"10.1145/2588555.2588562","DOIUrl":"https://doi.org/10.1145/2588555.2588562","url":null,"abstract":"Databases can provide scalability by partitioning data across several servers. However, multi-partition, multi-operation transactional access is often expensive, employing coordination-intensive locking, validation, or scheduling mechanisms. Accordingly, many real-world systems avoid mechanisms that provide useful semantics for multi-partition operations. This leads to incorrect behavior for a large class of applications including secondary indexing, foreign key enforcement, and materialized view maintenance. In this work, we identify a new isolation model---Read Atomic (RA) isolation---that matches the requirements of these use cases by ensuring atomic visibility: either all or none of each transaction's updates are observed by other transactions. We present algorithms for Read Atomic Multi-Partition (RAMP) transactions that enforce atomic visibility while offering excellent scalability, guaranteed commit despite partial failures (via synchronization independence), and minimized communication between servers (via partition independence). These RAMP transactions correctly mediate atomic visibility of updates and provide readers with snapshot access to database state by using limited multi-versioning and by allowing clients to independently resolve non-atomic reads. We demonstrate that, in contrast with existing algorithms, RAMP transactions incur limited overhead---even under high contention---and scale linearly to 100 servers.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130076984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ONTOCUBO: cube-based ontology construction and exploration","authors":"Carlos Garcia-Alvarado, C. Ordonez","doi":"10.1145/2588555.2594521","DOIUrl":"https://doi.org/10.1145/2588555.2594521","url":null,"abstract":"One of the major challenges of big data analytics is the diverse information content, which has no pre-defined structure or classification. This is in contrast to the well-designed structure of a database specified on an ER model. A standard mechanism for understanding interrelationships and the structure of documents is using ontologies. With such motivation in mind, we present a system that enables data management and querying of documents based on ontologies by leveraging the functionality of the DBMS. In this paper, we present ONTOCUBO, a novel system based on our research for text summarization using ontologies and automatic extraction of concepts for building ontologies using Online Analytical Processing (OLAP) cubes. ONTOCUBO is a database-centric approach that excels in its performance, due to an SQL-based single pass summarization phase through the original data set that computes values such as keyword frequency, standard deviation, and lift. This approach is complemented with a set of User-Defined-Function-based algorithms that analyze the summarization results for concepts and their interrelationships. Finally, we show in detail our application that extracts and builds an ontology, but also allows concept summarizations and allows domain experts to explore and modify the resulting ontology.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"286 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127719193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Searching with XQ: the exemplar query search engine","authors":"D. Mottin, Matteo Lissandrini, Yannis Velegrakis, Themis Palpanas","doi":"10.1145/2588555.2594529","DOIUrl":"https://doi.org/10.1145/2588555.2594529","url":null,"abstract":"We demonstrate XQ, a query engine that implements a novel technique for searching relevant information on the web and in various data sources, called Exemplar Queries. While the traditional query model expects the user to provide a set of specifications that the elements of interest need to satisfy, XQ expects the user to provide only an element of interest and we infer the desired answer set based on that element. Through the various examples we demonstrate the functionality of the system and its applicability in various cases. At the same time, we highlight the technical challenges for this type of query answering and illustrate the implementation approach we have materialized. The demo is intended for both researchers and practitioners and aims at illustrating the benefits of the adoption of this new form of query answering in practical applications and the further study and advancement of its technical solutions.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127804242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}