{"title":"Software Evolution Information Driven Service-Oriented Software Clustering","authors":"Linhui Zhong, Jing He, Nengwei Zhang, P. Zhang, Jing Xia","doi":"10.1109/BigDataCongress.2016.75","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.75","url":null,"abstract":"Service-oriented software in business is often programmed using Java language. For the purpose of making software evolvable and maintainable, the technology of software clustering is often used to make the software modularized. However, traditional software clustering methods have not considered the potential relation between software elements, which cannot be identified by using the static analysis method, so it can make the software not satisfy the principle of \"high cohesion, low coupling\" in the area of software engineering. For solving the problem, this paper proposes a method by introducing the software evolution information into the software clustering process, based on that we construct an extended software dependency model and use Agglomerative Hierarchical Clustering (AHC) algorithm to cluster software. Experiments on two open source projects show that this method can improve the accuracy of software clustering and aid the maintainer refactoring business software.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123681238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asset-centric Security-Aware Service Selection","authors":"Giannis Tziakouris, Marios Zinonos, Tom Chothia, R. Bahsoon","doi":"10.1109/BigDataCongress.2016.50","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.50","url":null,"abstract":"Catering for the runtime security of users and their assets (e.g. files, accounts, etc.) in service oriented environments is a challenging problem. We motivate the need for an adaptive framework that selects online services according to the runtime security requirements and cost constraints of assets. We report on a market-inspired approach (i.e. reversed Posted-Offer auction) that satisfies multiple, heterogeneous requests for online services in environments with shared and scarce resources. The solution is tested on the specific area of Cloud storage services.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124658819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monetizing the User's Information Asset in Internet Information Market","authors":"D. Rao, W. Ng","doi":"10.1109/BigDataCongress.2016.52","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.52","url":null,"abstract":"With the massive amount of data being collected and collated every minute today, big data is what keeps the wheels of organizations running. This Big Data is generally collected by 'data brokers' who then sell this to whomsoever is willing to pay the right price. Due to the lack of any specific mechanism to ascertain the value of this information, the contributors and owners of this big data, i.e. the users from whom this information is collected, do not realize how valuable it is and hence go largely uncompensated. In this paper, we put forth our idea of monetizing information by treating it like a tradeable asset utilizing the concept of portfolio optimization to create our information pricing model that offers a fair bargain to both the consumers and buyers of information.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130088100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GOMA: Supporting Big Data Analytics with a Goal-Oriented Approach","authors":"Sam Supakkul, Liping Zhao, L. Chung","doi":"10.1109/BigDataCongress.2016.26","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.26","url":null,"abstract":"The real value of Big Data lies in its hidden insights, but the current focus of the Big Data community is on the technologies for mining insights from massive data, rather than the data itself. The biggest challenge facing industries is not how to identify the right data, but instead, it is how to use insights obtained from Big Data to improve the business. To address this challenge, we propose GOMA, a goal-oriented modeling approach to Big Data analytics. Powered by Big Data insights, GOMA uses a goal-oriented approach to capture business goals, reason about business situations, and guide decision-making processes. GOMA provides a systematic approach for integrating two types of the resulting insight from data analytics to goal-oriented reasoning and decision-making processes: descriptive insights are the ones that describe the current state (e.g., the current customer retention rate) and predictive insights are the ones that predict likely future phenomena by inference from the data (e.g., customers who are likely to defect). To aid in the description and illustration of the GOMA approach, a retail banking churning scenario is used as a running example throughout this paper.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129620308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced State History Tree (eSHT): A Stateful Data Structure for Analysis of Highly Parallel System Traces","authors":"Loic Prieur-Drevon, R. Beamonte, Naser Ezzati-Jivan, M. Dagenais","doi":"10.1109/BigDataCongress.2016.19","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.19","url":null,"abstract":"Behaviors of distributed systems with many cores and/or many threads are difficult to understand. This is why dynamic analysis tools such as tracers are useful to collect run-time data and help programmers debug and optimize complex programs. However, manual trace analysis on very large traces with billions of events can be a difficult problem which automated trace visualizers and analyzers aim to solve. Trace analysis and visualization software needs fast access to data which it cannot achieve by searching through the entire trace for every query. A number of solutions have adopted stateful analysis which rearranges events into a more query friendly structures after a single pass through the trace. In this paper, we look into current implementations and model the behavior of previous work, the State History Tree (SHT), on traces with many thread creation and deletion. This allows us to identify which properties of the SHT are responsible for inefficient disk usage and high memory consumption. We then propose a more efficient data structure, the enhanced State History Tree (eSHT), to store and query computed states, in order to limit disk usage and reduce the query time for any state. Next, we compare the use of SHT and eSHT on traces with many attributes. We finally verify the scalability of our new data structure according to trace size. As shown by our results, the proposed solution makes near optimal use of disk space, reduces the algorithm's memory usage logarithmically for previously problematic cases, and speeds up queries on traces with many attributes by an order of magnitude. The proposed solution builds upon our previous work, enabling it to easily scale up to traces containing a million threads.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128921106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering Geo-tagged Tweets for Advanced Big Data Analytics","authors":"Gloria Bordogna, Luca Frigerio, A. Cuzzocrea, G. Psaila","doi":"10.1109/BigDataCongress.2016.78","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.78","url":null,"abstract":"In this paper, we introduce an original approach that exploits time stamped geo-tagged messages posted by Twitter users through their smartphones when they travel to trace their trips. An original clustering technique is presented that groups similar trips to define tours and analyze the popular tours in relation with local geo-located territorial resources. This objective is very relevant for emerging big data analytics tools. Tools developed to reconstruct and mine the most popular tours of tourists within a region are described which identify, track and group tourists' trips through a knowledge-based approach exploiting time stamped geo-tagged information associated with Twitter messages sent by tourists while traveling. The collected tracks are managed and shared on the Web in compliance with OGC standards so as to be able to analyze the characteristic of localities visited by the tourists by spatial overlaying with other open data, such as maps of Points Of Interest (POIs) of distinct type. The result is a novel interoperable framework, based on web-service technology.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115592933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Automatic Memory Tuning for In-Memory Big Data Analytics in Clusters","authors":"Aris-Kyriakos Koliopoulos, Paraskevas Yiapanis, F. Tekiner, G. Nenadic, J. Keane","doi":"10.1109/BigDataCongress.2016.56","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.56","url":null,"abstract":"Hadoop provides a scalable solution on traditional cluster-based Big Data platforms but imposes performance overheads due to only supporting on-disk data. Data Analytic algorithms usually require multiple iterations over a dataset and thus, multiple, slow, disk accesses. In contrast, modern clusters possess increasing amounts of main memory that can provide performance benefits by efficiently using main memory caching mechanisms. Apache Spark is an innovative distributed computing framework that supports in-memory computations. Even though this type of computations is very fast, memory is a scarce resource and this can cause bottlenecks to execution or, even worse, lead to failures. Spark offers various choices for memory tuning but this requires in-depth systems-level knowledge and the choices will be different across various workloads and cluster settings. Generally, the optimal choice is achieved by adopting a trial and error approach. This work describes a first step towards an automated selection mechanism for memory optimization that assesses workload and cluster characteristics and selects an appropriate caching scheme. The proposed caching mechanism decreases execution times by up to 25% compared to the default strategy and reduces the risk of main memory exceptions.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126810563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software Metrics for Green Parallel Computing of Big Data Systems","authors":"H. Gürbüz, B. Tekinerdogan","doi":"10.1109/BigDataCongress.2016.54","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.54","url":null,"abstract":"Big Data is typically organized around a distributed file system on top of which the parallel algorithms can be executed for realizing the Big Data analytics. In general, the parallel algorithms can be mapped in different alternative ways to the computing platform. Hereby each alternative will perform differently with respect to the environmentally relevant parameters such as energy and power consumption. Existing studies on deployment of parallel computing algorithms have mainly focused on addressing general computing metrics such as speedup with respect to serial computing and efficiency of the use of the computing nodes. In this paper, we report on the elicitation of green metrics for big data systems that are required when analyzing deployment alternatives. To this end we use the existing systematic literature reviews and identify, and discuss the important green computing metrics for big data systems.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127500676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cloud-Based Core Text Processing Services for Sentiment Analysis","authors":"Huan Chen, Xin-Nan Li, Liang-Jie Zhang, Yixuan Huang, Xiao-Sheng Cai","doi":"10.1109/BigDataCongress.2016.37","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.37","url":null,"abstract":"In modern society, Web gradually becomes the portal and window of all kinds of information. People are more likely to express their views on the Internet, mostly would be over the form of text documents. In order to understand users, NLP (Natural Language Processing) methods, such as sentiment analysis, have been gaining popularity. At present, there are some classical methods to solve the text sentiment analysis problem, such as the machine learning method, the classification models NB (Naive Bayes), ME (Maximum Entropy) and SVM (Support Vector Machine). In this paper, we mainly study sentiment analysis for big data scenarios from engineering perspective. This paper proposes core text processing services and discusses the corresponding development details. The contributions are manifolds: Firstly, a new core text processing service Cloud-based Core Text Processing Services (CCTPS) is proposed. Secondly, we propose the use of KNN for regression purposes, resulting in a new algorithm KNNR. Thirdly, this paper formalizes the scenarios of personalized news recommendation and personas portraying in the context of CCTPS. Experimental results of two real-world applications, one for sentiment analysis and the other for personalized news recommendation, to demonstrate the wide practical usability of CCTPS system.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125052385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model Transformation and Data Migration from Relational Database to MongoDB","authors":"Tianyu Jia, Xiaomeng Zhao, Zheng Wang, Dahan Gong, Guiguang Ding","doi":"10.1109/BigDataCongress.2016.16","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.16","url":null,"abstract":"With the data growing dramatically and the structure of data becoming increasingly flexible, MongoDB has replaced the relational database in many applications. However, not much work has been done to effectively transform the schema and migrate the data from relational database to MongoDB. In this paper, we propose an approach of model transformation and data migration from relational database to MongoDB. Our work i) considers the query characteristics and data characteristics of the relational database, ii) designs a model transformation algorithm based on description tags and action tags, iii) automatically migrates the data into MongoDB based on the result of model transformation, and iv) develops a useful tool. We have designed experiments to prove that using our approach can achieve a better read performance.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123958671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}