2019 IEEE International Congress on Big Data (BigDataCongress): Latest Publications

HyperSpark: A Data-Intensive Programming Environment for Parallel Metaheuristics

2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date : 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00024

M. Ciavotta, S. Krstic, D. Tamburri, W. Heuvel

Abstract: Metaheuristics are search procedures used to solve complex, often intractable problems for which other approaches are unsuitable or unable to provide solutions in reasonable time. Although computing power has grown exponentially with the onset of Cloud Computing and Big Data platforms, the domain of metaheuristics has not yet taken full advantage of this new potential. In this paper, we address this gap by proposing HyperSpark, an optimization framework for the scalable execution of user-defined, computationally intensive heuristics. We designed HyperSpark as a flexible tool meant to harness the benefits (e.g., scalability by design) and features (e.g., a simple programming model or ad-hoc infrastructure tuning) of state-of-the-art big data technology for the benefit of optimization methods. We elaborate on HyperSpark and assess its validity and generality on a library implementing several metaheuristics for the Permutation Flow-Shop Problem (PFSP). We observe that HyperSpark results are comparable with the best tools and solutions from the literature. We conclude that our proof-of-concept shows great potential for further research and practical use.

Citations: 2

Big Data Analytics and Predictive Modeling Approaches for the Energy Sector

2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date : 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00020

Roberto Corizzo, Michelangelo Ceci, D. Malerba

Citations: 3

A New Unsupervised Predictive-Model Self-Assessment Approach That SCALEs

2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date : 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00033

F. Ventura, Stefano Proto, D. Apiletti, T. Cerquitelli, S. Panicucci, Elena Baralis, E. Macii, A. Macii

Citations: 8

Dynamic Resource Shaping for Compute Clusters

2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date : 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00019

Francesco Pace, D. Milios, D. Carra, P. Michiardi

Citations: 1

Context-Aware Enforcement of Privacy Policies in Edge Computing

2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date : 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00014

Clemens Lachner, T. Rausch, S. Dustdar

Citations: 4

PREMISES, a Scalable Data-Driven Service to Predict Alarms in Slowly-Degrading Multi-Cycle Industrial Processes

2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date : 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00032

Stefano Proto, F. Ventura, D. Apiletti, T. Cerquitelli, Elena Baralis, E. Macii, A. Macii

Abstract: In recent years, the number of Industry 4.0-enabled manufacturing sites has been continuously growing, and both the quantity and variety of signals and data collected in plants are increasing at an unprecedented rate. At the same time, the demand for Big Data processing platforms and analytical tools tailored to manufacturing environments has become more and more prominent. Manufacturing companies are collecting huge amounts of information during the production process through a plethora of sensors and networks. To extract value and actionable knowledge from such precious repositories, suitable data-driven approaches are required. They are expected to improve the production processes by reducing maintenance costs, reliably predicting equipment failures, and avoiding quality degradation. To this end, Machine Learning techniques tailored for predictive maintenance analysis have been adopted in PREMISES (PREdictive Maintenance service for Industrial procesSES), an innovative framework providing a scalable Big Data service able to predict alarming conditions in slowly-degrading processes characterized by cyclic procedures. PREMISES has been experimentally tested and validated on a real industrial use case, proving efficient and effective in predicting alarms. The framework has been designed to address the main Big Data and industrial requirements: it is built on a solid and scalable processing framework, Apache Spark, and supports deployment in modularized containers, specifically on the Docker technology stack.

Citations: 8

Mobility Prediction with Missing Locations Based on Modified Markov Model for Wireless Users

2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date : 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00031

Junyao Guo, Lu Liu, Sihai Zhang, Jinkang Zhu

Abstract: Mobility prediction is an interesting topic that attracts many researchers, and both prediction theory and models are explored in the existing literature. The entropy metric used to evaluate the mobility predictability of individuals gives theoretical upper and lower bounds on the prediction probability, although the achieved accuracies of users with the same predictability vary. In this work, we investigate the missing-locations phenomenon, in which users visit new locations in the testing set. A major difference is found between the theoretical bounds with and without missing locations, which shows that users without missing locations are easier to predict. After discussing the impact of missing locations on the prediction accuracy, a modified Markov chain prediction model is proposed to deal with the presence of missing locations. Finally, the correlation between accuracy and predictability can be modeled as a Gaussian distribution; the standard deviation can be modeled as a double Gaussian function with missing locations and as a third-order polynomial function without them.

Citations: 1

Distributed, Numerically Stable Distance and Covariance Computation with MPI for Extremely Large Datasets

2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date : 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00023

Daniel Peralta, Y. Saeys

Abstract: The current explosion of data, which is impacting many different areas, is especially noticeable in biomedical research thanks to the development of new technologies that are able to capture high-dimensional and high-resolution data at the single-cell scale. Processing such data in an interpretable way often requires the computation of pairwise dissimilarity measures between the multiple features of the data, a task that can be very difficult to tackle when the dataset is large enough, and which is prone to numerical instability. In this paper we propose a distributed framework to efficiently compute dissimilarity matrices in arbitrarily large datasets in a numerically robust way. It implements a combination of the pairwise and two-pass algorithms for computing the variance, in order to maintain the numerical robustness of the former while reducing its overhead. The proposal is parallelizable both across multiple computers and multiple cores, maximizing the performance while maintaining the benefits of memory locality. The proposal is tested on a real use case: a dataset generated from high-content screening images composed of a billion individual cells and 786 features. The results showed linear scalability with respect to the size of the dataset and close to linear speedup.
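
The abstract above mentions combining the pairwise and two-pass algorithms for the variance. As a rough illustration of that general idea only (this is not the authors' MPI implementation, which is not shown on this page), the sketch below runs a two-pass computation inside each chunk and merges the per-chunk results with the standard pairwise update of Chan, Golub and LeVeque. It runs in a single process; the names `Moments`, `two_pass`, `merge`, and `variance` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class Moments:
    """Count, mean, and sum of squared deviations from the mean (m2)."""
    n: int
    mean: float
    m2: float


def two_pass(chunk: Sequence[float]) -> Moments:
    """Two-pass step: first compute the mean, then sum squared deviations."""
    n = len(chunk)
    if n == 0:
        return Moments(0, 0.0, 0.0)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return Moments(n, mean, m2)


def merge(a: Moments, b: Moments) -> Moments:
    """Pairwise step: combine two partial results without revisiting the data."""
    n = a.n + b.n
    if n == 0:
        return Moments(0, 0.0, 0.0)
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def variance(chunks: Iterable[Sequence[float]]) -> float:
    """Sample variance over all chunks (assumption: each chunk fits in memory)."""
    total = Moments(0, 0.0, 0.0)
    for chunk in chunks:
        total = merge(total, two_pass(chunk))
    if total.n < 2:
        raise ValueError("need at least two values")
    return total.m2 / (total.n - 1)
```

In a distributed setting each worker could, under the same assumptions, call `two_pass` on its local partition and the partial `Moments` could then be reduced with `merge`, since the merge only needs the three summary values and not the raw data.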

Citations: 2

DLBench: An Experimental Evaluation of Deep Learning Frameworks

2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date : 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00034

Nesma Mahmoud, Youssef Essam, Radwa El Shawi, S. Sakr

Abstract: Recently, deep learning has become one of the most disruptive trends in the technology world. Deep learning techniques are increasingly achieving significant results in different domains such as speech recognition, image recognition and natural language processing. There are various reasons behind the increasing popularity of deep learning techniques, including increasing data availability, the increasing availability of powerful hardware and computing resources, and the increasing availability of deep learning frameworks. In practice, the increasing popularity of deep learning frameworks calls for benchmarking studies that can effectively evaluate the performance characteristics of these systems. In this paper, we present an extensive experimental study of six popular deep learning frameworks, namely TensorFlow, MXNet, PyTorch, Theano, Chainer, and Keras. Our experimental evaluation compares them on several aspects, including accuracy, speed and resource consumption. Our experiments have been conducted on both CPU and GPU environments and using different datasets. We report and analyze the performance characteristics of the studied frameworks. In addition, we report a set of insights and important lessons that we have learned from conducting our experiments.

Citations: 9

Efficient Re-Computation of Big Data Analytics Processes in the Presence of Changes: Computational Framework, Reference Architecture, and Applications

2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date : 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00017

P. Missier, J. Cala

Abstract: Insights generated from Big Data through analytics processes are often unstable over time and thus lose their value, as the analysis typically depends on elements that change and evolve dynamically. However, the cost of having to periodically "redo" computationally expensive data analytics is not normally taken into account when assessing the benefits of the outcomes. The ReComp project addresses the problem of efficiently re-computing, all or in part, outcomes from complex analytical processes in response to some of the changes that occur to process dependencies. While such dependencies may include application and system libraries, as well as the deployment environment, ReComp is focused exclusively on changes to reference datasets as well as to the original inputs. Our hypothesis is that an efficient re-computation strategy requires the ability to (i) observe and quantify data changes, (ii) estimate the impact of those changes on a population of prior outcomes, (iii) identify the minimal process fragments that can restore the currency of the impacted outcomes, and (iv) selectively drive their refresh. In this paper we present a generic framework that addresses these requirements, and show how it can be customised to operate on two case studies of very diverse domains, namely genomics and geosciences. We discuss lessons learnt and outline the next steps towards the ReComp vision.

Citations: 1