Latest papers from the 2015 31st IEEE International Conference on Data Engineering Workshops

SINCA: Scalable in-memory event aggregation using clustered operators

2015 31st IEEE International Conference on Data Engineering Workshops Pub Date : 2015-04-13 DOI: 10.1109/ICDEW.2015.7129578

M. K. Behera, S. Kalyan, Prasanna Venkatesh, A. Wolski

Abstract: Analytical processing of the information created in the operation of social media requires queries that group and aggregate large volumes of detail data. Any advanced query processing method should take into account two dominating hardware trends: increasing main memory capacities and increasing parallel processing capacity, exposed as a growing number of cores per processor chip. We introduce a scalable in-memory method for data aggregation (SINCA), using clustered operators, which profits from these hardware trends. The method uses the concept of a microengine, a set of resources that can be utilized in parallel with great efficiency. The resulting parallelized aggregation algorithm is characterized by low overhead and high volume, and is suitable for both real-time and extract-transform-load scenarios. The core idea of the method is to use real-time histograms to partition the data for grouping. As the data is already grouped during the partitioning phase, the group aggregation can be done very efficiently. Additionally, some of the grouped data can be cached for re-use in subsequent queries.
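The histogram-then-partition idea described in the SINCA abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the greedy load-balancing rule, and the `sum` aggregate are all assumptions chosen for illustration, and the per-partition loop runs sequentially where a real system would presumably run partitions in parallel.

```python
from collections import Counter, defaultdict


def build_histogram(rows, key):
    """Count how many rows carry each value of the grouping key."""
    return Counter(row[key] for row in rows)


def assign_partitions(histogram, num_partitions):
    """Map each key to a partition, heaviest keys first, always choosing
    the partition with the smallest load so far (hypothetical rule)."""
    loads = [0] * num_partitions
    mapping = {}
    for k, count in sorted(histogram.items(), key=lambda kv: -kv[1]):
        target = loads.index(min(loads))
        mapping[k] = target
        loads[target] += count
    return mapping


def partition_and_aggregate(rows, key, value, num_partitions=2):
    """Scatter rows into partitions that are already grouped by key,
    then sum each group. No key spans two partitions, so the partial
    results can be merged without any further combining step."""
    histogram = build_histogram(rows, key)
    mapping = assign_partitions(histogram, num_partitions)
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for row in rows:
        partitions[mapping[row[key]]][row[key]].append(row[value])
    result = {}
    for part in partitions:  # each partition could be handled by its own worker
        for k, values in part.items():
            result[k] = sum(values)
    return result


events = [
    {"user": "a", "likes": 1},
    {"user": "b", "likes": 2},
    {"user": "a", "likes": 3},
]
totals = partition_and_aggregate(events, "user", "likes")
```

Because every row for a given key lands in exactly one partition, the aggregation step only has to walk groups that are already formed, which is the property the abstract credits for its efficiency.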

Citations: 1

POP: A Passenger-Oriented Partners matching system

2015 31st IEEE International Conference on Data Engineering Workshops Pub Date : 2015-04-13 DOI: 10.1109/ICDEW.2015.7129560

Xiaoyi Duan, Cheqing Jin, Xiaoling Wang

Citations: 2

On the rise and fall of Sina Weibo: Analysis based on a fixed user group

2015 31st IEEE International Conference on Data Engineering Workshops Pub Date : 2015-04-13 DOI: 10.1109/ICDEW.2015.7129580

Fan Xia, Qunyan Zhang, Chengyu Wang, Weining Qian, Aoying Zhou

Abstract: The Chinese micro-blogging service Sina Weibo became the country's most free-flowing and important source of news and opinions just a few years ago. Following its launch in the summer of 2009, Sina Weibo grew quickly, attracting hundreds of millions of users, and saw its biggest boom around 2011. However, several reports indicate a decrease in activity on Sina Weibo. In our study, we reveal the prosperity and decline of Sina Weibo by analyzing how a fixed user group's collective behaviors change throughout the whole development process. A huge dataset based on Sina Weibo, along with search engine data, is used in this study. In this paper we model the popularity of a single tweet and of multiple tweets. We then define a statistic representing Sina Weibo's capability for information propagation. The well-known time series prediction model ARMA is used to model and predict its trend. In addition, we extract both internal features, i.e. features of Sina Weibo, and external features, i.e. the public's attention. Their trends are presented and analyzed. Detailed experiments are then conducted to measure the correlation and causality between these features and our proposed statistic. The approaches we present in this paper clearly show the prosperity and decline of this microblogging community.

Citations: 3

Hotel recommendation based on user preference analysis

2015 31st IEEE International Conference on Data Engineering Workshops Pub Date : 2015-04-13 DOI: 10.1109/ICDEW.2015.7129564

Kai Zhang, Keqiang Wang, Xiaoling Wang, Cheqing Jin, Aoying Zhou

Citations: 41

On crowdsensed data acquisition using multi-dimensional point processes

2015 31st IEEE International Conference on Data Engineering Workshops Pub Date : 2015-04-13 DOI: 10.1109/ICDEW.2015.7129562

Saket K. Sathe, T. Sellis, K. Aberer

Citations: 1

Large-scale spatial join query processing in Cloud

2015 31st IEEE International Conference on Data Engineering Workshops Pub Date : 2015-04-13 DOI: 10.1109/ICDEW.2015.7129541

Simin You, Jianting Zhang, L. Gruenwald

Citations: 192

An SVD-based Multimodal Clustering method for Social Event Detection

2015 31st IEEE International Conference on Data Engineering Workshops Pub Date : 2015-04-13 DOI: 10.1109/ICDEW.2015.7129577

Yun Ma, Qing Li, Zhenguo Yang, Zheng Lu, Haiwei Pan, Antoni B. Chan

Citations: 1

Analyzing online news dissemination via structure learning: An experimental view

2015 31st IEEE International Conference on Data Engineering Workshops Pub Date : 2015-04-13 DOI: 10.1109/ICDEW.2015.7129572

Ruiqi Li, Yanli Hu, Jiuyang Tang, W. Xiao

Citations: 0

AIR: Adaptive Index Replacement in Hadoop

2015 31st IEEE International Conference on Data Engineering Workshops Pub Date : 2015-04-13 DOI: 10.1109/ICDEW.2015.7129539

Stefan Schuh, J. Dittrich

Abstract: The Hadoop Distributed Filesystem (HDFS) has become the de facto standard for storing large datasets in data management systems such as Hadoop MapReduce, Hive, and Stratosphere. Though HDFS was originally designed to support scan-oriented operations, several techniques have recently been developed to allow for efficient indexing on HDFS. One of these is aggressive indexing, i.e. HDFS replicas are indexed immediately at upload time before touching any disk, creating multiple clustered indexes almost for free along the way. A second technique is adaptive indexing, i.e. HDFS blocks are only indexed on demand as a side effect of query processing. Though these techniques provide impressive speed-ups in query processing, they ignore the costs involved in storing a large number of replicas of a particular dataset. The HDFS variants of adaptive indexing were already designed to leverage the natural redundancy that comes with HDFS, which typically stores a dataset three times anyway. However, it is questionable whether storing an unlimited number of replicas of a dataset is a practical solution. Therefore, this paper is the first to analyze adaptive indexing under a space constraint, i.e. we assume that indexes are adaptively created and deleted. We call this the Adaptive Index Replacement (AIR) problem. We present a new algorithm for the online AIR problem, called LeastExpectedBenefit-K, and compare it with several existing state-of-the-art online index selection algorithms. We present a comprehensive study evaluating ten different algorithms. Our results show that our algorithm LEB-2 is efficient and robust and a good choice in practice.
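To make the space-constrained setting in the AIR abstract concrete, here is a minimal Python sketch of adaptive index creation and deletion under a fixed budget. It is explicitly not LeastExpectedBenefit-K, whose details the abstract does not give: the class name `IndexBudget`, the decayed hit counter used as a benefit estimate, and the evict-the-lowest rule are all hypothetical stand-ins.

```python
class IndexBudget:
    """Toy index store holding at most `capacity` indexes (assumed heuristic)."""

    def __init__(self, capacity, decay=0.5):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.decay = decay
        self.benefit = {}  # attribute -> estimated benefit of keeping its index

    def query(self, attribute):
        """Process a query that filters on `attribute`.

        Returns True if an index already existed (a hit). On a miss the
        index is created as a side effect, evicting the index with the
        lowest estimated benefit when the budget is exhausted.
        """
        # Age every estimate so that recently used indexes score higher.
        for name in self.benefit:
            self.benefit[name] *= self.decay
        hit = attribute in self.benefit
        if not hit and len(self.benefit) >= self.capacity:
            victim = min(self.benefit, key=self.benefit.get)
            del self.benefit[victim]
        self.benefit[attribute] = self.benefit.get(attribute, 0.0) + 1.0
        return hit


store = IndexBudget(capacity=2)
hits = [store.query(attr) for attr in ["a", "b", "a", "c"]]
```

In this run the budget of two indexes forces a deletion when the query on `c` arrives; the index on `b` is dropped because its decayed estimate is lower than that of `a`, which was reused more recently.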

Citations: 10

Integrating query processing with parallel languages

2015 31st IEEE International Conference on Data Engineering Workshops Pub Date : 2015-04-13 DOI: 10.1109/ICDEW.2015.7129583

Brandon Myers

Citations: 1