{"title":"Memory-constrained aggregate computation over data streams","authors":"K. Naidu, R. Rastogi, Scott Satkin, A. Srinivasan","doi":"10.1109/ICDE.2011.5767860","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767860","url":null,"abstract":"In this paper, we study the problem of efficiently computing multiple aggregation queries over a data stream. In order to share computation, prior proposals have suggested instantiating certain intermediate aggregates which are then used to generate the final answers for input queries. In this work, we make a number of important contributions aimed at improving the execution and generation of query plans containing intermediate aggregates. These include: (1) a different hashing model, which has low eviction rates, and also allows us to accurately estimate the number of evictions, (2) a comprehensive query execution cost model based on these estimates, (3) an efficient greedy heuristic for constructing good low-cost query plans, (4) provably near-optimal and optimal algorithms for allocating the available memory to aggregates in the query plan when the input data distribution is Zipf-like and Uniform, respectively, and (5) a detailed performance study with real-life IP flow data sets, which show that our multiple aggregates computation techniques consistently outperform the best-known approach.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131573115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A unified approach for computing top-k pairs in multidimensional space","authors":"M. A. Cheema, Xuemin Lin, Haixun Wang, Jianmin Wang, W. Zhang","doi":"10.1109/ICDE.2011.5767903","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767903","url":null,"abstract":"Top-k pairs queries have many real applications. k closest pairs queries, k furthest pairs queries and their bichromatic variants are some of the examples of the top-k pairs queries that rank the pairs on distance functions. While these queries have received significant research attention, there does not exist a unified approach that can efficiently answer all these queries. Moreover, there is no existing work that supports top-k pairs queries based on generic scoring functions. In this paper, we present a unified approach that supports a broad class of top-k pairs queries including the queries mentioned above. Our proposed approach allows the users to define a local scoring function for each attribute involved in the query and a global scoring function that computes the final score of each pair by combining its scores on different attributes. We propose efficient internal and external memory algorithms and our theoretical analysis shows that the expected performance of the algorithms is optimal when two or less attributes are involved. Our approach does not require any pre-built indexes, is easy to implement and has low memory requirement. We conduct extensive experiments to demonstrate the efficiency of our proposed approach.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134031739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On dimensionality reduction of massive graphs for indexing and retrieval","authors":"C. Aggarwal, Haixun Wang","doi":"10.1109/ICDE.2011.5767834","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767834","url":null,"abstract":"In this paper, we will examine the problem of dimensionality reduction of massive disk-resident data sets. Graph mining has become important in recent years because of its numerous applications in community detection, social networking, and web mining. Many graph data sets are defined on massive node domains in which the number of nodes in the underlying domain is very large. As a result, it is often difficult to store and hold the information necessary in order to retrieve and index the data. Most known methods for dimensionality reduction are effective only for data sets defined on modest domains. Furthermore, while the problem of dimensionality reduction is most relevant to the problem of massive data sets, these algorithms are inherently not designed for the case of disk-resident data in terms of the order in which the data is accessed on disk. This is a serious limitation which restricts the applicability of current dimensionality reduction methods. Furthermore, since dimensionality reduction methods are typically designed for database applications such as indexing, it is important to design the underlying data reduction method, so that it can be effectively used for such applications. In this paper, we will examine the difficult problem of dimensionality reduction of graph data in the difficult case in which the underlying number of nodes are very large and the data set is disk-resident. We will propose an effective sampling algorithm for dimensionality reduction and show how to perform the dimensionality reduction in a limited number of passes on disk. We will also design the technique to be highly interpretable and friendly for indexing applications. We will illustrate the effectiveness and efficiency of the approach on a number of real data sets.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129278334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Top-k keyword search over probabilistic XML data","authors":"Jianxin Li, Chengfei Liu, Rui Zhou, Wei Wang","doi":"10.1109/ICDE.2011.5767875","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767875","url":null,"abstract":"Despite the proliferation of work on XML keyword query, it remains open to support keyword query over probabilistic XML data. Compared with traditional keyword search, it is far more expensive to answer a keyword query over probabilistic XML data due to the consideration of possible world semantics. In this paper, we firstly define the new problem of studying top-k keyword search over probabilistic XML data, which is to retrieve k SLCA results with the k highest probabilities of existence. And then we propose two efficient algorithms. The first algorithm PrStack can find k SLCA results with the k highest probabilities by scanning the relevant keyword nodes only once. To further improve the efficiency, we propose a second algorithm EagerTopK based on a set of pruning properties which can quickly prune unsatisfied SLCA candidates. Finally, we implement the two algorithms and compare their performance with analysis of extensive experimental results.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122502955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CT-index: Fingerprint-based graph indexing combining cycles and trees","authors":"K. Klein, Nils M. Kriege, Petra Mutzel","doi":"10.1109/ICDE.2011.5767909","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767909","url":null,"abstract":"Efficient subgraph queries in large databases are a time-critical task in many application areas as e.g. biology or chemistry, where biological networks or chemical compounds are modeled as graphs. The NP-completeness of the underlying subgraph isomorphism problem renders an exact subgraph test for each database graph infeasible. Therefore efficient methods have to be found that avoid most of these tests but still allow to identify all graphs containing the query pattern. We propose a new approach based on the filter-verification paradigm, using a new hash-key fingerprint technique with a combination of tree and cycle features for filtering and a new subgraph isomorphism test for verification. Our approach is able to cope with edge and vertex labels and also allows to use wild card patterns for the search. We present an experimental comparison of our approach with state-of-the-art methods using a benchmark set of both real world and generated graph instances that shows its practicability. Our approach is implemented as part of the Scaffold Hunter software, a tool for the visual analysis of chemical compound databases.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127111389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots","authors":"A. Kemper, Thomas Neumann","doi":"10.1109/ICDE.2011.5767867","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767867","url":null,"abstract":"The two areas of online transaction processing (OLTP) and online analytical processing (OLAP) present different challenges for database architectures. Currently, customers with high rates of mission-critical transactions have split their data into two separate systems, one database for OLTP and one so-called data warehouse for OLAP. While allowing for decent transaction rates, this separation has many disadvantages including data freshness issues due to the delay caused by only periodically initiating the Extract Transform Load-data staging and excessive resource consumption due to maintaining two separate information systems. We present an efficient hybrid system, called HyPer, that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data. HyPer is a main-memory database system that guarantees the ACID properties of OLTP transactions and executes OLAP query sessions (multiple queries) on the same, arbitrarily current and consistent snapshot. The utilization of the processor-inherent support for virtual memory management (address translation, caching, copy on update) yields both at the same time: unprecedentedly high transaction rates as high as 100000 per second and very fast OLAP query response times on a single system executing both workloads in parallel. The performance analysis is based on a combined TPC-C and TPC-H benchmark.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121793806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible use of cloud resources through profit maximization and price discrimination","authors":"Konstantinos Tsakalozos, H. Kllapi, Evangelia A. Sitaridi, M. Roussopoulos, Dimitris Paparas, A. Delis","doi":"10.1109/ICDE.2011.5767932","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767932","url":null,"abstract":"Modern frameworks, such as Hadoop, combined with abundance of computing resources from the cloud, offer a significant opportunity to address long standing challenges in distributed processing. Infrastructure-as-a-Service clouds reduce the investment cost of renting a large data center while distributed processing frameworks are capable of efficiently harvesting the rented physical resources. Yet, the performance users get out of these resources varies greatly because the cloud hardware is shared by all users. The value for money cloud consumers achieve renders resource sharing policies a key player in both cloud performance and user satisfaction. In this paper, we employ microeconomics to direct the allotment of cloud resources for consumption in highly scalable master-worker virtual infrastructures. Our approach is developed on two premises: the cloud-consumer always has a budget and cloud physical resources are limited. Using our approach, the cloud administration is able to maximize per-user financial profit. We show that there is an equilibrium point at which our method achieves resource sharing proportional to each user's budget. Ultimately, this approach allows us to answer the question of how many resources a consumer should request from the seemingly endless pool provided by the cloud.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128116633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"T-verifier: Verifying truthfulness of fact statements","authors":"Xian Li, W. Meng, Clement T. Yu","doi":"10.1109/ICDE.2011.5767859","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767859","url":null,"abstract":"The Web has become the most popular place for people to acquire information. Unfortunately, it is widely recognized that the Web contains a significant amount of untruthful information. As a result, good tools are needed to help Web users determine the truthfulness of certain information. In this paper, we propose a two-step method that aims to determine whether a given statement is truthful, and if it is not, find out the truthful statement most related to the given statement. In the first step, we try to find a small number of alternative statements of the same topic as the given statement and make sure that one of these statements is truthful. In the second step, we identify the truthful statement from the given statement and the alternative statements. Both steps heavily rely on analysing various features extracted from the search results returned by a popular search engine for appropriate queries. Our experimental results show the best variation of the proposed method can achieve a precision of about 90%.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128038529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HashFile: An efficient index structure for multimedia data","authors":"Dongxiang Zhang, D. Agrawal, Gang Chen, A. Tung","doi":"10.1109/ICDE.2011.5767837","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767837","url":null,"abstract":"Nearest neighbor (NN) search in high dimensional space is an essential query in many multimedia retrieval applications. Due to the curse of dimensionality, existing index structures might perform even worse than a simple sequential scan of data when answering exact NN query. To improve the efficiency of NN search, locality sensitive hashing (LSH) and its variants have been proposed to find approximate NN. They adopt hash functions that can preserve the Euclidean distance so that similar objects have a high probability of colliding in the same bucket. Given a query object, candidate for the query result is obtained by accessing the points that are located in the same bucket. To improve the precision, each hash table is associated with m hash functions to recursively hash the data points into smaller buckets and remove the false positives. On the other hand, multiple hash tables are required to guarantee a high retrieval recall. Thus, tuning a good tradeoff between precision and recall becomes the main challenge for LSH. Recently, locality sensitive B-tree(LSB-tree) has been proposed to ensure both quality and efficiency. However, the index uses random I/O access. When the multimedia database is large, it requires considerable disk I/O cost to obtain an approximate ratio that works in practice. In this paper, we propose a novel index structure, named HashFile, for efficient retrieval of multimedia objects. It combines the advantages of random projection and linear scan. Unlike the LSH family in which each bucket is associated with a concatenation of m hash values, we only recursively partition the dense buckets and organize them as a tree structure. Given a query point q, the search algorithm explores the buckets near the query object in a top-down manner. The candidate buckets in each node are stored sequentially in increasing order of the hash value and can be efficiently loaded into memory for linear scan. HashFile can support both exact and approximate NN queries. Experimental results show that HashFile performs better than existing indexes both in answering both types of NN queries.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"166 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121251195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Secure and efficient in-network processing of exact SUM queries","authors":"Stavros Papadopoulos, A. Kiayias, D. Papadias","doi":"10.1109/ICDE.2011.5767886","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767886","url":null,"abstract":"In-network aggregation is a popular methodology adopted in wireless sensor networks, which reduces the energy expenditure in processing aggregate queries (such as SUM, MAX, etc.) over the sensor readings. Recently, research has focused on secure in-network aggregation, motivated (i) by the fact that the sensors are usually deployed in open and unsafe environments, and (ii) by new trends such as outsourcing, where the aggregation process is delegated to an untrustworthy service. This new paradigm necessitates the following key security properties: data confidentiality, integrity, authentication, and freshness. The majority of the existing work on the topic is either unsuitable for large-scale sensor networks, or provides only approximate answers for SUM queries (as well as their derivatives, e.g., COUNT, AVG, etc). Moreover, there is currently no approach offering both confidentiality and integrity at the same time. Towards this end, we propose a novel and efficient scheme called SIES. SIES is the first solution that supports Secure In-network processing of Exact SUM queries, satisfying all security properties. It achieves this goal through a combination of homomorphic encryption and secret sharing. Furthermore, SIES is lightweight (it relies on inexpensive hash operations and modular additions/multiplications), and features a very small bandwidth consumption (in the order of a few bytes). Consequently, SIES constitutes an ideal method for resource-constrained sensors.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126637444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}