2016 IEEE International Congress on Big Data (BigData Congress): Latest Publications

Software Evolution Information Driven Service-Oriented Software Clustering

2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-10-06 DOI: 10.1109/BigDataCongress.2016.75

Linhui Zhong, Jing He, Nengwei Zhang, P. Zhang, Jing Xia

Citations: 2

Asset-centric Security-Aware Service Selection

2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-10-06 DOI: 10.1109/BigDataCongress.2016.50

Giannis Tziakouris, Marios Zinonos, Tom Chothia, R. Bahsoon

Citations: 4

Monetizing the User's Information Asset in Internet Information Market

2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-10-06 DOI: 10.1109/BigDataCongress.2016.52

D. Rao, W. Ng

Citations: 0

Enhanced State History Tree (eSHT): A Stateful Data Structure for Analysis of Highly Parallel System Traces

2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-10-06 DOI: 10.1109/BigDataCongress.2016.19

Loic Prieur-Drevon, R. Beamonte, Naser Ezzati-Jivan, M. Dagenais

Abstract: Behaviors of distributed systems with many cores and/or many threads are difficult to understand. This is why dynamic analysis tools such as tracers are useful to collect run-time data and help programmers debug and optimize complex programs. However, manual trace analysis on very large traces with billions of events can be a difficult problem, which automated trace visualizers and analyzers aim to solve. Trace analysis and visualization software needs fast access to data, which it cannot achieve by searching through the entire trace for every query. A number of solutions have adopted stateful analysis, which rearranges events into more query-friendly structures after a single pass through the trace. In this paper, we look into current implementations and model the behavior of previous work, the State History Tree (SHT), on traces with many thread creations and deletions. This allows us to identify which properties of the SHT are responsible for inefficient disk usage and high memory consumption. We then propose a more efficient data structure, the enhanced State History Tree (eSHT), to store and query computed states, in order to limit disk usage and reduce the query time for any state. Next, we compare the use of SHT and eSHT on traces with many attributes. We finally verify the scalability of our new data structure according to trace size. As shown by our results, the proposed solution makes near optimal use of disk space, reduces the algorithm's memory usage logarithmically for previously problematic cases, and speeds up queries on traces with many attributes by an order of magnitude. The proposed solution builds upon our previous work, enabling it to easily scale up to traces containing a million threads.

Citations: 2
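
The eSHT abstract describes stateful analysis: a single pass over the trace turns events into state intervals, so that the state of any attribute at any time can be looked up without rescanning the trace. The following is a minimal in-memory sketch of that general idea only. It assumes events arrive as (timestamp, attribute, state) values in time order, and the class and method names (`StateHistory`, `record`, `query`) are hypothetical. It keeps a flat sorted list per attribute and is not the SHT or eSHT on-disk tree layout from the paper.

```python
import bisect
from collections import defaultdict


class StateHistory:
    """Illustrative per-attribute state store (not the SHT/eSHT structure)."""

    def __init__(self):
        # attribute -> parallel lists: interval start times and state values
        self._starts = defaultdict(list)
        self._values = defaultdict(list)

    def record(self, timestamp, attribute, state):
        """Record that `attribute` holds `state` from `timestamp` onward.

        Events for one attribute must arrive in non-decreasing time order.
        """
        starts = self._starts[attribute]
        if starts and timestamp < starts[-1]:
            raise ValueError("events must be time-ordered per attribute")
        starts.append(timestamp)
        self._values[attribute].append(state)

    def query(self, attribute, timestamp):
        """Return the state of `attribute` at `timestamp`, or None if unknown."""
        starts = self._starts.get(attribute)
        if not starts:
            return None
        # Binary search for the last interval that starts at or before timestamp.
        i = bisect.bisect_right(starts, timestamp) - 1
        return self._values[attribute][i] if i >= 0 else None
```

Each lookup is a binary search over one attribute's intervals instead of a scan of the whole trace, which is the property the abstract attributes to stateful analysis in general.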

GOMA: Supporting Big Data Analytics with a Goal-Oriented Approach

2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-10-06 DOI: 10.1109/BigDataCongress.2016.26

Sam Supakkul, Liping Zhao, L. Chung

Abstract: The real value of Big Data lies in its hidden insights, but the current focus of the Big Data community is on the technologies for mining insights from massive data, rather than the data itself. The biggest challenge facing industries is not how to identify the right data but how to use insights obtained from Big Data to improve the business. To address this challenge, we propose GOMA, a goal-oriented modeling approach to Big Data analytics. Powered by Big Data insights, GOMA uses a goal-oriented approach to capture business goals, reason about business situations, and guide decision-making processes. GOMA provides a systematic approach for integrating two types of insight from data analytics into goal-oriented reasoning and decision-making processes: descriptive insights, which describe the current state (e.g., the current customer retention rate), and predictive insights, which predict likely future phenomena by inference from the data (e.g., customers who are likely to defect). To aid in the description and illustration of the GOMA approach, a retail banking churning scenario is used as a running example throughout this paper.

Citations: 16

Clustering Geo-tagged Tweets for Advanced Big Data Analytics

2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-27 DOI: 10.1109/BigDataCongress.2016.78

Gloria Bordogna, Luca Frigerio, A. Cuzzocrea, G. Psaila

Citations: 26

Towards Automatic Memory Tuning for In-Memory Big Data Analytics in Clusters

2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-27 DOI: 10.1109/BigDataCongress.2016.56

Aris-Kyriakos Koliopoulos, Paraskevas Yiapanis, F. Tekiner, G. Nenadic, J. Keane

Abstract: Hadoop provides a scalable solution on traditional cluster-based Big Data platforms but imposes performance overheads because it supports only on-disk data. Data analytic algorithms usually require multiple iterations over a dataset and thus multiple slow disk accesses. In contrast, modern clusters possess increasing amounts of main memory that can provide performance benefits by efficiently using main memory caching mechanisms. Apache Spark is an innovative distributed computing framework that supports in-memory computations. Even though this type of computation is very fast, memory is a scarce resource, and this can cause execution bottlenecks or, even worse, lead to failures. Spark offers various choices for memory tuning, but this requires in-depth systems-level knowledge, and the choices will differ across workloads and cluster settings. Generally, the optimal choice is achieved by adopting a trial and error approach. This work describes a first step towards an automated selection mechanism for memory optimization that assesses workload and cluster characteristics and selects an appropriate caching scheme. The proposed caching mechanism decreases execution times by up to 25% compared to the default strategy and reduces the risk of main memory exceptions.

Citations: 10
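
The memory-tuning abstract speaks of selecting a caching scheme from workload and cluster characteristics but does not state the rule used. As a rough illustration of what such a selection could look like, the sketch below picks one of three Spark storage-level names from size estimates. The function name `choose_storage_level`, its parameters, the 0.5 default serialization ratio, and the decision thresholds are all assumptions made here for illustration; this is not the mechanism proposed in the paper.

```python
def choose_storage_level(dataset_bytes, cache_bytes_available, serialized_ratio=0.5):
    """Pick a Spark storage-level name from rough size estimates.

    Hypothetical rule of thumb, not the paper's mechanism:
    - data fits in cache as-is            -> "MEMORY_ONLY"
    - data fits only when serialized      -> "MEMORY_ONLY_SER"
    - data does not fit even serialized   -> "MEMORY_AND_DISK"

    serialized_ratio is the assumed size of serialized data relative to
    its deserialized in-memory size (0 < ratio <= 1).
    """
    if dataset_bytes < 0 or cache_bytes_available < 0:
        raise ValueError("sizes must be non-negative")
    if not 0 < serialized_ratio <= 1:
        raise ValueError("serialized_ratio must be in (0, 1]")
    if dataset_bytes <= cache_bytes_available:
        return "MEMORY_ONLY"
    if dataset_bytes * serialized_ratio <= cache_bytes_available:
        return "MEMORY_ONLY_SER"
    return "MEMORY_AND_DISK"
```

In practice the size estimates themselves are the hard part, which is consistent with the abstract's remark that tuning is usually done by trial and error.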

Software Metrics for Green Parallel Computing of Big Data Systems

2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.54

H. Gürbüz, B. Tekinerdogan

Citations: 7

Cloud-Based Core Text Processing Services for Sentiment Analysis

2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.37

Huan Chen, Xin-Nan Li, Liang-Jie Zhang, Yixuan Huang, Xiao-Sheng Cai

Abstract: In modern society, the Web has gradually become the portal and window for all kinds of information. People are more likely to express their views on the Internet, mostly in the form of text documents. In order to understand users, NLP (Natural Language Processing) methods, such as sentiment analysis, have been gaining popularity. At present, there are some classical methods to solve the text sentiment analysis problem, such as machine learning methods and the classification models NB (Naive Bayes), ME (Maximum Entropy), and SVM (Support Vector Machine). In this paper, we mainly study sentiment analysis for big data scenarios from an engineering perspective. This paper proposes core text processing services and discusses the corresponding development details. The contributions are manifold. First, a new core text processing service, Cloud-based Core Text Processing Services (CCTPS), is proposed. Second, we propose the use of KNN for regression purposes, resulting in a new algorithm, KNNR. Third, this paper formalizes the scenarios of personalized news recommendation and persona portrayal in the context of CCTPS. Experimental results from two real-world applications, one for sentiment analysis and the other for personalized news recommendation, demonstrate the wide practical usability of the CCTPS system.

Citations: 2
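
The CCTPS abstract mentions using KNN for regression (KNNR) without giving the algorithm. For readers unfamiliar with the general technique, the sketch below shows plain k-nearest-neighbour regression: the prediction is the unweighted mean of the targets of the k closest training points. The Euclidean distance, the unweighted mean, and the function name `knn_regress` are assumptions chosen for illustration; the paper's KNNR may differ in all of them.

```python
import math


def knn_regress(train, query, k=3):
    """Predict a numeric value for `query` as the mean target of its
    k nearest training points (Euclidean distance).

    train: list of (feature_vector, target) pairs with equal-length vectors.
    """
    if k <= 0 or k > len(train):
        raise ValueError("k must be between 1 and len(train)")
    # Order training points by distance to the query and keep the k closest.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    neighbours = by_distance[:k]
    return sum(target for _, target in neighbours) / k
```

For example, with training points at x = 0, 1, 2, 10 whose targets equal x, a query at x = 1 with k = 3 averages the targets 1, 0, and 2.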

Model Transformation and Data Migration from Relational Database to MongoDB

2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.16

Tianyu Jia, Xiaomeng Zhao, Zheng Wang, Dahan Gong, Guiguang Ding

Citations: 31