2014 IEEE 30th International Conference on Data Engineering: Latest Papers (Page 9)

How to partition a billion-node graph

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816682

Lu Wang, Yanghua Xiao, Bin Shao, Haixun Wang

Abstract: Billion-node graphs pose significant challenges at all levels from storage infrastructures to programming models. It is critical to develop a general purpose platform for graph processing. A distributed memory system is considered a feasible platform supporting online query processing as well as offline graph analytics. In this paper, we study the problem of partitioning a billion-node graph on such a platform, an important consideration because it has direct impact on load balancing and communication overhead. It is challenging not just because the graph is large, but because we can no longer assume that the data can be organized in arbitrary ways to maximize the performance of the partitioning algorithm. Instead, the algorithm must adopt the same data and programming model adopted by the system and other applications. In this paper, we propose a multi-level label propagation (MLP) method for graph partitioning. Experimental results show that our solution can partition billion-node graphs within several hours on a distributed memory system consisting of merely several machines, and the quality of the partitions produced by our approach is comparable to state-of-the-art approaches applied on toy-size graphs.
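
The abstract above names multi-level label propagation (MLP) without describing its steps. As a rough illustration of the underlying building block only, the following is a minimal sketch of plain, single-level label propagation on an in-memory adjacency list. The function name `label_propagation`, the fixed visiting order, the largest-label tie-break, and the toy graph are all assumptions made for this example; the multi-level coarsening, balance constraints, and distributed execution of the paper's method are not shown.

```python
from collections import Counter


def label_propagation(adj, max_iters=20):
    """Plain label propagation (hypothetical helper, not the paper's MLP).

    adj maps each node to a list of its neighbours. Every node starts with
    its own label and repeatedly adopts the label that is most common among
    its neighbours. Ties go to the largest label, an arbitrary choice made
    here only to keep the result deterministic.
    """
    labels = {v: v for v in adj}
    for _ in range(max_iters):
        changed = False
        for v in sorted(adj):  # fixed order; updates are visible immediately
            if not adj[v]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts, key=lambda lab: (counts[lab], lab))
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break  # converged: no node wants to switch
    return labels


# Toy graph: two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(label_propagation(adj))
```

On this toy graph each triangle settles on its own label, so only the bridging edge 2-3 is cut. Real graphs are far less forgiving: the outcome of label propagation depends on visiting order and tie-breaking, which is one reason the sketch should not be read as a faithful reproduction of MLP.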

Citations: 115

OCTOPUS: Efficient query execution on dynamic mesh datasets

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816718

F. Tauheed, T. Heinis, F. Schürmann, H. Markram, A. Ailamaki

Abstract: Scientists in many disciplines use spatial mesh models to study physical phenomena. Simulating natural phenomena by changing meshes over time helps to better understand the phenomena. The higher the precision of the mesh models, the more insight scientists gain, so they continuously increase the detail of the meshes and build them as detailed as their instruments and the simulation hardware allow. In the process, the data volume also increases, considerably slowing down the execution of the spatial range queries needed to monitor the simulation. Indexing speeds up range query execution, but the overhead to maintain the indexes is considerable because almost the entire mesh changes unpredictably at every simulation step. Using a simple linear scan, on the other hand, requires accessing the entire mesh, and performance deteriorates as the size of the dataset grows. In this paper we propose OCTOPUS, a strategy for executing range queries on mesh datasets that change unpredictably during simulations. In OCTOPUS we use the key insight that the mesh surface along with the mesh connectivity is sufficient to retrieve accurate query results efficiently. With this novel query execution strategy, OCTOPUS minimizes index maintenance cost and reduces query execution time considerably. Our experiments show that OCTOPUS achieves a speedup of between 7.3× and 9.2× compared to the state of the art and that it scales better with increasing mesh dataset size and detail.
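
One way to read the "surface plus connectivity" insight in the abstract above is sketched below: scan only the surface vertices to find entry points into the query box, then crawl along mesh edges without leaving the box. This is an interpretation, not the OCTOPUS algorithm itself. The names `in_box` and `range_query`, the dictionary-based mesh layout, and the assumption that every part of the mesh inside the box is reachable from a surface vertex that is also inside the box are all hypothetical; the abstract does not say how other cases are handled.

```python
from collections import deque


def in_box(p, lo, hi):
    """True if point p lies inside the axis-aligned box [lo, hi]."""
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))


def range_query(pos, adj, surface, lo, hi):
    """Return ids of vertices inside the box, found by crawling (sketch).

    pos:     vertex id -> coordinates (may change at every simulation step)
    adj:     vertex id -> neighbouring vertex ids (mesh connectivity)
    surface: ids of the vertices on the mesh surface
    """
    # Seeds: surface vertices inside the box (only the surface is scanned).
    seeds = [v for v in surface if in_box(pos[v], lo, hi)]
    found = set(seeds)
    queue = deque(seeds)
    # Breadth-first crawl along mesh edges, never leaving the box.
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in found and in_box(pos[u], lo, hi):
                found.add(u)
                queue.append(u)
    return found


# Toy "mesh": a chain of five vertices along the x axis; the ends are surface.
pos = {i: (float(i), 0.0) for i in range(5)}
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(range_query(pos, adj, surface={0, 4}, lo=(-0.5, -0.5), hi=(2.5, 0.5)))
```

The appeal of this design, under the assumption that connectivity stays fixed while vertex positions move, is that no spatial index over the positions needs to be rebuilt between simulation steps.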

Citations: 5

AQUAS: A quality-aware scheduler for NoSQL data stores

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816743

Chen Xu, Fan Xia, M. Sharaf, Minqi Zhou, Aoying Zhou

Citations: 3

PHiDJ: Parallel similarity self-join for high-dimensional vector data with MapReduce

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816701

Sergej Fries, Brigitte Boden, Grzegorz Stepien, T. Seidl

Abstract: Join processing on large-scale vector data is an important problem in many applications, as vectors are a common representation for various data types. In particular, several data analysis tasks such as near-duplicate detection, density-based clustering, and data cleaning are based on similarity self-joins, which are a special type of join. For huge data sets, MapReduce has proved to be a suitable, error-tolerant framework for parallel join algorithms. Recent approaches exploit the vector-space properties of low-dimensional vector data for efficient join computation. However, so far no parallel similarity self-join approaches aimed at high-dimensional vector data have been proposed. In this work we propose the novel similarity self-join algorithm PHiDJ (Parallel High-Dimensional Join) for the MapReduce framework. PHiDJ is well suited for medium- to high-dimensional data and exploits multiple filter techniques for reducing communication and computational costs. We provide a solution for efficient join computation for skewed distributed data. Our experimental evaluation on medium- to high-dimensional data shows that our approach outperforms existing techniques.

Citations: 33

Waste not… Efficient co-processing of relational data

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816677

H. Pirk, S. Manegold, M. Kersten

Citations: 41

Efficient and accurate query evaluation on uncertain graphs via recursive stratified sampling

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816709

Ronghua Li, J. Yu, Rui Mao, Tan Jin

Abstract: In this paper, we introduce two types of query evaluation problems on uncertain graphs: expectation query evaluation and threshold query evaluation. Since these two problems are #P-complete, most previous solutions for these problems are based on naive Monte-Carlo (NMC) sampling. However, NMC typically leads to a large variance, which significantly reduces its effectiveness. To overcome this problem, we propose two classes of estimators, called class-I and class-II estimators, based on the idea of stratified sampling. More specifically, we first propose two classes of basic stratified sampling estimators, named BSS-I and BSS-II, which partition the entire population into 2^r and r+1 strata, respectively, by picking r edges. Second, to reduce the variance, we find that both BSS-I and BSS-II can be recursively performed in each stratum. Therefore, we propose two classes of recursive stratified sampling estimators, called RSS-I and RSS-II respectively. Third, for a particular kind of problem, we propose two cut-set based stratified sampling estimators, named BCSS and RCSS, to further improve the accuracy of the class-I and class-II estimators. For all the proposed estimators, we prove that they are unbiased and their variances are significantly smaller than that of NMC. Moreover, the time complexity of all the proposed estimators is the same as the time complexity of NMC under a mild assumption. In addition, we also apply the proposed estimators to influence function evaluation and the expected-reliable distance query problem, which are two instances of the query evaluation problems on uncertain graphs. Finally, we conduct extensive experiments to evaluate our estimators, and the results demonstrate the efficiency, accuracy, and scalability of the proposed estimators.
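
To make the contrast between naive Monte-Carlo sampling and the first kind of basic stratified estimator in the abstract above more concrete, here is a minimal sketch. It uses s-t reachability on an undirected uncertain graph as the example query, which is an assumption chosen for simplicity and not one of the applications named in the abstract. The function names `reachable`, `nmc`, and `bss1`, the proportional sample allocation, and the way the r edges are supplied by the caller are likewise illustrative choices rather than the authors' implementation.

```python
import itertools
import random


def reachable(edges, s, t):
    """Depth-first search over an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    stack, seen = [s], {s}
    while stack:
        x = stack.pop()
        if x == t:
            return True
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


def nmc(prob, s, t, n, rng):
    """Naive Monte Carlo: sample n possible worlds, average the indicator."""
    hits = 0
    for _ in range(n):
        world = [e for e, p in prob.items() if rng.random() < p]
        hits += reachable(world, s, t)
    return hits / n


def bss1(prob, s, t, n, picked, rng):
    """Stratify on the 2^r present/absent states of the r picked edges."""
    rest = {e: p for e, p in prob.items() if e not in picked}
    estimate = 0.0
    for state in itertools.product([0, 1], repeat=len(picked)):
        weight = 1.0  # probability mass of this stratum
        for e, on in zip(picked, state):
            weight *= prob[e] if on else 1.0 - prob[e]
        fixed = [e for e, on in zip(picked, state) if on]
        n_i = max(1, round(n * weight))  # proportional allocation (assumed)
        hits = 0
        for _ in range(n_i):
            world = fixed + [e for e, p in rest.items() if rng.random() < p]
            hits += reachable(world, s, t)
        estimate += weight * hits / n_i
    return estimate


# Toy uncertain graph: a direct s-t edge plus a two-hop path via a.
prob = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.5}
rng = random.Random(0)
print(nmc(prob, "s", "t", 1000, rng))
print(bss1(prob, "s", "t", 1000, [("s", "t")], rng))
```

On this toy graph the exact reachability probability is 1 - 0.5 × (1 - 0.25) = 0.625. If every edge is picked, each stratum contains a single possible world and `bss1` reduces to exact enumeration, which is a useful sanity check even though it defeats the purpose of sampling on a real graph.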

Citations: 22

dbTouch in action: database kernels for touch-based data exploration

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816756

Erietta Liarou, Stratos Idreos

Abstract: A fundamental need in the era of data deluge is data exploration through interactive tools, i.e., being able to quickly determine data and patterns of interest. dbTouch is a new research direction towards a next generation of data management systems that inherently support data exploration by allowing touch-based interaction. Data is represented in a visual format, while users can touch those shapes and interact/query with gestures. In a dbTouch system, the whole database kernel is geared towards quick responses to touch input; the user drives query processing (not just query construction) via touch gestures, dictating how fast or slow data flows through query plans and which data parts are processed at any time. dbTouch translates the gestures into interactive database operators, reacting continuously to the touch input and analytics tasks given by the user in real time, such as sliding a finger over a column to scan it progressively, zooming in with two fingers over a column to progressively get sample data, or rotating a table to change the physical design from row-store to column-store. This demo presents the first dbTouch prototype over iOS for iPad.

Citations: 21

Top-K interesting subgraph discovery in information networks

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816703

Manish Gupta, Jing Gao, Xifeng Yan, H. Çam, Jiawei Han

Abstract: In the real world, various systems can be modeled using heterogeneous networks which consist of entities of different types. Many problems on such networks can be mapped to an underlying critical problem of discovering top-K subgraphs of entities with rare and surprising associations. Answering such subgraph queries efficiently involves two main challenges: (1) computing all matching subgraphs which satisfy the query and (2) ranking such results based on the rarity and the interestingness of the associations among entities in the subgraphs. Previous work on the matching problem can be harnessed for a naïve ranking-after-matching solution. However, for large graphs, subgraph queries may have an enormous number of matches, and so it is inefficient to compute all matches when only the top-K matches are desired. In this paper, we address the two challenges of matching and ranking in top-K subgraph discovery as follows. First, we introduce two index structures for the network: a topology index and a graph maximum metapath weight index, which are both computed offline. Second, we propose novel top-K mechanisms to exploit these indexes for answering interesting subgraph queries online efficiently. Experimental results on several synthetic datasets and the DBLP and Wikipedia datasets containing thousands of entities show the efficiency and the effectiveness of the proposed approach in computing interesting subgraphs.
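
The abstract above argues that computing every match is wasteful when only the top-K are wanted. A generic way to avoid that work, shown here purely as an illustration and not as the paper's mechanism, is early termination with upper bounds: if each candidate comes with a cheap upper bound on its score, candidates can be visited in decreasing bound order and the search stopped once no remaining bound can beat the current K-th best score. The function `top_k`, the candidate tuples, and the scoring callback are hypothetical; how the paper's indexes produce such bounds is not described in the abstract.

```python
import heapq


def top_k(candidates, k, exact_score):
    """Top-k by exact score with upper-bound pruning (generic sketch).

    candidates:  iterable of (upper_bound, item), where upper_bound is
                 assumed to be >= exact_score(item)
    exact_score: the expensive scoring function to call as rarely as possible
    """
    heap = []  # min-heap of (score, item): the best k seen so far
    for bound, item in sorted(candidates, key=lambda c: c[0], reverse=True):
        if len(heap) == k and bound <= heap[0][0]:
            break  # no later candidate can beat the current k-th best
        score = exact_score(item)
        if len(heap) < k:
            heapq.heappush(heap, (score, item))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, item))
    return sorted(heap, reverse=True)


# Toy data: bounds are loose for "b" and tight for "c" and "d".
scores = {"a": 9, "b": 4, "c": 5, "d": 3}
candidates = [(10, "a"), (8, "b"), (5, "c"), (3, "d")]
print(top_k(candidates, 2, scores.get))
```

In this toy run the exact score of "d" is never computed, because its bound of 3 cannot exceed the second-best score already found. The benefit depends entirely on how tight the bounds are.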

Citations: 64

Guaranteed authenticity and integrity of data from untrusted servers

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816761

R. Jain, Sunil Prabhakar

Abstract: Data are often stored at untrusted database servers. The lack of trust arises naturally when the database server is owned by a third party, as in the case of cloud computing. It also arises if the server may have been compromised, or there is a malicious insider. Ensuring the trustworthiness of data retrieved from such an untrusted database is of utmost importance. Trustworthiness of data is defined by faithful execution of valid and authorized transactions on the initial data. Earlier work on this problem is limited to cases where data are either not updated, or data are updated by a single trustworthy entity. However, for a truly dynamic database, multiple clients should be allowed to update data without having to route the updates through a central server. In this demonstration, we present a system to establish authenticity and integrity of data in a dynamic database where the clients can run transactions directly on the database server. Our system provides provable authenticity and integrity of data with absolutely no requirement for the server to be trustworthy. Our system also provides assured provenance of data. This demonstration is built using the solutions proposed in our previous work [5]. Our system is built on top of Oracle with no modifications to the database internals. We show that the system can be easily adopted in existing databases without any internal changes to the database. We also demonstrate how our system can provide authentic provenance.

Citations: 1

PAQO: Preference-aware query optimization for decentralized database systems

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-03-01 DOI: 10.1109/ICDE.2014.6816670

Nicholas L. Farnan, Adam J. Lee, Panos K. Chrysanthis, Ting Yu

Abstract: The declarative nature of SQL has traditionally been a major strength. Users simply state what information they are interested in, and the database management system determines the best plan for retrieving it. A consequence of this model is that should a user ever want to specify some aspect of how their queries are evaluated (e.g., a preference to read data from a specific replica, or a requirement for all joins to be performed by a single server), they are unable to. This can leave database administrators shoehorning evaluation preferences into database cost models. Further, for distributed database users, it can result in query evaluation plans that violate data handling best practices or the privacy of the user. To address such issues, we have developed a framework for declarative, user-specified constraints on the query optimization process and implemented it within PostgreSQL. Our Preference-Aware Query Optimizer (PAQO) upholds both strict requirements and partially ordered preferences that are issued alongside the queries that it processes. In this paper, we present the design of PAQO and thoroughly evaluate its performance.

Citations: 28