2019 IEEE International Congress on Big Data (BigDataCongress): Latest Papers

HyperSpark: A Data-Intensive Programming Environment for Parallel Metaheuristics
2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date: 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00024
M. Ciavotta, S. Krstic, D. Tamburri, W. Heuvel
Abstract: Metaheuristics are search procedures used to solve complex, often intractable problems for which other approaches are unsuitable or unable to provide solutions in reasonable times. Although computing power has grown exponentially with the onset of Cloud Computing and Big Data platforms, the domain of metaheuristics has not yet taken full advantage of this new potential. In this paper, we address this gap by proposing HyperSpark, an optimization framework for the scalable execution of user-defined, computationally-intensive heuristics. We designed HyperSpark as a flexible tool meant to harness the benefits (e.g., scalability by design) and features (e.g., a simple programming model or ad-hoc infrastructure tuning) of state-of-the-art big data technology for the benefit of optimization methods. We elaborate on HyperSpark and assess its validity and generality on a library implementing several metaheuristics for the Permutation Flow-Shop Problem (PFSP). We observe that HyperSpark results are comparable with the best tools and solutions from the literature. We conclude that our proof-of-concept shows great potential for further research and practical use.
Citations: 2
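HyperSpark is evaluated on metaheuristics for the Permutation Flow-Shop Problem (PFSP), in which every job visits the machines in the same order and the objective is typically to minimize the makespan. As a minimal single-process sketch, assuming the makespan objective, the completion-time recurrence and a swap-neighbourhood local search of the kind such metaheuristics build on could look like this. `makespan` and `swap_local_search` are hypothetical names, and nothing here uses HyperSpark's or Spark's API.

```python
from typing import List, Sequence, Tuple


def makespan(perm: Sequence[int], p: Sequence[Sequence[int]]) -> int:
    """Completion time of the last job on the last machine.

    p[j][m] is the processing time of job j on machine m; all jobs visit
    machines 0..M-1 in the same order (permutation flow shop).
    """
    num_machines = len(p[0])
    # c[m] = completion time of the most recently scheduled job on machine m
    c = [0] * num_machines
    for job in perm:
        for m in range(num_machines):
            # The job starts on machine m once the machine is free (old c[m])
            # and the job has finished on machine m-1 (already updated c[m-1]).
            ready = c[m - 1] if m > 0 else 0
            c[m] = max(c[m], ready) + p[job][m]
    return c[-1]


def swap_local_search(perm: Sequence[int], p: Sequence[Sequence[int]]) -> Tuple[List[int], int]:
    """Hypothetical local search: accept any improving pairwise swap until none is left."""
    best = list(perm)
    best_cost = makespan(best, p)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for k in range(i + 1, len(best)):
                cand = best[:]
                cand[i], cand[k] = cand[k], cand[i]
                cost = makespan(cand, p)
                if cost < best_cost:
                    best, best_cost = cand, cost
                    improved = True
    return best, best_cost
```

In a framework like HyperSpark, many such searches (with different seeds or starting permutations) would presumably run in parallel and the best result would be kept; that orchestration is not shown here.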
Big Data Analytics and Predictive Modeling Approaches for the Energy Sector
2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date: 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00020
Roberto Corizzo, Michelangelo Ceci, D. Malerba
Abstract: This paper describes recent results achieved in the analysis of geo-distributed sensor data generated in the context of the energy sector. The approaches described have roots in the Big Data Analytics and Predictive Modeling research fields and are based on distributed architectures. They tackle the energy forecasting task for a network of energy production plants, while also taking into consideration the detection and treatment of anomalies in the data. This research is motivated by and consistent with the objectives of research projects funded by the European Commission and by many national governments.
Citations: 3
A New Unsupervised Predictive-Model Self-Assessment Approach That SCALEs
2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date: 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00033
F. Ventura, Stefano Proto, D. Apiletti, T. Cerquitelli, S. Panicucci, Elena Baralis, E. Macii, A. Macii
Abstract: Evaluating the degradation of predictive models over time has always been a difficult task, especially since new, unseen data might not fit the training distribution. This is a well-known problem in real-world use cases, where collecting the historical training set for all possible prediction labels may be very hard, too expensive, or completely infeasible. To solve this issue, we present a new unsupervised approach to detect and evaluate the degradation of classification and prediction models, based on a scalable variant of the Silhouette index, named Descriptor Silhouette, specifically designed to advance current Big Data state-of-the-art solutions. The newly proposed strategy has been tested and validated on both synthetic and real-world industrial use cases. To this end, it has been included in a framework named SCALE, where it proved efficient and more effective at assessing the degradation of prediction performance than the current best state-of-the-art solutions.
Citations: 8
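The abstract describes Descriptor Silhouette as a scalable variant of the Silhouette index. For reference, the standard Silhouette coefficient of a point i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other points of its own cluster and b(i) is the smallest mean distance from i to the points of any other cluster. The sketch below computes the exact standard index with Euclidean distance; it is not the Descriptor Silhouette variant, and its O(n^2) pairwise distances are precisely the cost a scalable variant has to avoid.

```python
import math
from typing import Hashable, Sequence, Tuple


def silhouette(points: Sequence[Tuple[float, ...]], labels: Sequence[Hashable]) -> float:
    """Mean standard Silhouette coefficient over all points (exact, O(n^2) distances)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the Silhouette index needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        # Sum and count of distances from point i to every other point, per cluster.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] += math.dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: s(i) = 0 by convention, adds nothing
        a = sums[own] / counts[own]  # cohesion: mean distance within own cluster
        # Separation: smallest mean distance to any other cluster.
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)
```

Values near 1 indicate compact, well-separated clusters and negative values indicate points closer to another cluster than to their own; a drift in this score on new data is the kind of label-free signal the abstract relies on.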
Dynamic Resource Shaping for Compute Clusters
2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date: 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00019
Francesco Pace, D. Milios, D. Carra, P. Michiardi
Abstract: Data centers today are largely under-utilized because resource allocation is based on reservation mechanisms that ignore actual resource utilization. Indeed, it is common to reserve resources for peak demand, which may occur only for a small portion of the application lifetime. As a consequence, cluster resources often go under-utilized. In this work, we propose a mechanism that improves the utilization and responsiveness of compute clusters, while preventing application failures due to contention for finite resources such as RAM. Our method monitors resource utilization and employs a data-driven approach to resource demand forecasting, featuring quantification of uncertainty in the predictions. Using the demand forecast and its confidence, our mechanism modulates the cluster resources assigned to running applications, and reduces turnaround time by more than one order of magnitude while keeping application failures under control. Thus, tenants enjoy a responsive system and providers benefit from efficient cluster utilization.
Citations: 1
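The abstract describes assigning resources from a demand forecast together with its confidence. As a minimal sketch, assuming a naive forecast (the sample mean of recent usage) with the sample standard deviation as its uncertainty, a per-application allocation could be computed as below. `shaped_allocation`, the parameter `z`, and the clamping rule are hypothetical and are not the authors' forecasting model.

```python
import statistics
from typing import Sequence


def shaped_allocation(usage: Sequence[float], reserved: float, z: float = 2.0) -> float:
    """Hypothetical policy: amount of a resource (e.g. RAM) to keep assigned.

    Forecast = sample mean of recent usage; uncertainty = sample stdev.
    Allocate forecast + z * stdev, never more than the original reservation
    and never less than the most recent observed usage.
    """
    if len(usage) < 2:
        return reserved  # too little history to forecast: keep the reservation
    mu = statistics.fmean(usage)
    sigma = statistics.stdev(usage)
    upper = mu + z * sigma
    return min(reserved, max(upper, usage[-1]))
```

Raising `z` trades utilization for safety: a wider margin above the forecast leaves less capacity to reassign but makes out-of-memory failures from under-allocation less likely.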
Context-Aware Enforcement of Privacy Policies in Edge Computing
2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date: 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00014
Clemens Lachner, T. Rausch, S. Dustdar
Abstract: Privacy is a fundamental concern that confronts systems dealing with sensitive data. The lack of robust solutions for defining and enforcing privacy measures continues to hinder the general acceptance and adoption of these systems. Edge computing has been recognized as a key enabler for privacy-enhanced applications and has opened new opportunities. In this paper, we propose a novel privacy model based on context-aware edge computing. Our model leverages the context of data to make decisions about how these data need to be processed and managed to achieve privacy. Based on a scenario from the eHealth domain, we show how our generalized model can be used to implement and enact complex domain-specific privacy policies. We illustrate our approach by constructing real-world use cases involving a mobile Electronic Health Record that interacts with, and in different environments.
Citations: 4
PREMISES, a Scalable Data-Driven Service to Predict Alarms in Slowly-Degrading Multi-Cycle Industrial Processes
2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date: 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00032
Stefano Proto, F. Ventura, D. Apiletti, T. Cerquitelli, Elena Baralis, E. Macii, A. Macii
Abstract: In recent years, the number of Industry 4.0-enabled manufacturing sites has been continuously growing, and both the quantity and variety of signals and data collected in plants are increasing at an unprecedented rate. At the same time, the demand for Big Data processing platforms and analytical tools tailored to manufacturing environments has become more and more prominent. Manufacturing companies are collecting huge amounts of information during the production process through a plethora of sensors and networks. To extract value and actionable knowledge from such precious repositories, suitable data-driven approaches are required. They are expected to improve the production processes by reducing maintenance costs, reliably predicting equipment failures, and avoiding quality degradation. To this end, Machine Learning techniques tailored for predictive maintenance analysis have been adopted in PREMISES (PREdictive Maintenance service for Industrial procesSES), an innovative framework providing a scalable Big Data service able to predict alarming conditions in slowly-degrading processes characterized by cyclic procedures. PREMISES has been experimentally tested and validated on a real industrial use case, proving efficient and effective in predicting alarms. The framework has been designed to address the main Big Data and industrial requirements: it is developed on a solid and scalable processing framework, Apache Spark, and supports deployment on modularized containers, specifically on the Docker technology stack.
Citations: 8
Mobility Prediction with Missing Locations Based on Modified Markov Model for Wireless Users
2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date: 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00031
Junyao Guo, Lu Liu, Sihai Zhang, Jinkang Zhu
Abstract: Mobility prediction is an interesting topic that attracts many researchers, and both prediction theory and prediction models have been explored in the existing literature. The entropy metric used to evaluate the mobility predictability of individuals gives theoretical upper and lower bounds on the prediction probability, although the accuracies actually achieved for users with the same predictability vary. In this work, we investigate the missing-locations phenomenon, in which users visit new locations in the testing set. We find a major difference in the theoretical bounds between users with and without missing locations, which shows that users without missing locations are easier to predict. After discussing the impact of missing locations on prediction accuracy, we propose a modified Markov chain prediction model to deal with the presence of missing locations. Finally, the correlation between accuracy and predictability can be modeled as a Gaussian distribution; its standard deviation can be modeled as a double Gaussian function for users with missing locations and as a third-order polynomial function for users without them.
Citations: 1
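A plain first-order Markov chain predicts the next location from transition counts observed in training, so it has nothing to offer when the current location never appeared there. As a minimal sketch, assuming a fallback to the user's most-visited training location, such a predictor could look like this. The fallback rule and the names `MarkovPredictor` and `accuracy` are hypothetical and are not the modification proposed in the paper.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


class MarkovPredictor:
    """First-order Markov next-location predictor with a hypothetical fallback."""

    def __init__(self, train: Sequence[Hashable]):
        # transitions[a][b] = number of times location b directly followed a
        self.transitions = defaultdict(Counter)
        for cur, nxt in zip(train, train[1:]):
            self.transitions[cur][nxt] += 1
        self.popularity = Counter(train)

    def predict(self, current: Hashable) -> Hashable:
        counts = self.transitions.get(current)  # .get does not insert a default
        if counts:
            return counts.most_common(1)[0][0]
        # Location unseen in training (or never followed by another visit):
        # fall back to the most frequently visited training location.
        return self.popularity.most_common(1)[0][0]


def accuracy(model: MarkovPredictor, test: Sequence[Hashable]) -> float:
    """Fraction of consecutive test pairs whose next location is predicted correctly."""
    pairs = list(zip(test, test[1:]))
    if not pairs:
        raise ValueError("need at least two test locations")
    hits = sum(model.predict(cur) == nxt for cur, nxt in pairs)
    return hits / len(pairs)
```

Note that this model can never output a location absent from training, so every transition into a missing location is counted as a miss.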
Distributed, Numerically Stable Distance and Covariance Computation with MPI for Extremely Large Datasets
2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date: 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00023
Daniel Peralta, Y. Saeys
Abstract: The current explosion of data, which is impacting many different areas, is especially noticeable in biomedical research thanks to the development of new technologies that are able to capture high-dimensional and high-resolution data at the single-cell scale. Processing such data in an interpretable way often requires the computation of pairwise dissimilarity measures between the multiple features of the data, a task that can be very difficult to tackle when the dataset is large enough, and which is prone to numerical instability. In this paper we propose a distributed framework to efficiently compute dissimilarity matrices in arbitrarily large datasets in a numerically robust way. It implements a combination of the pairwise and two-pass algorithms for computing the variance, in order to maintain the numerical robustness of the former while reducing its overhead. The proposal is parallelizable both across multiple computers and multiple cores, maximizing performance while maintaining the benefits of memory locality. The proposal is tested on a real use case: a dataset generated from high-content screening images composed of a billion individual cells and 786 features. The results showed linear scalability with respect to the size of the dataset and close-to-linear speedup.
Citations: 2
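The abstract says the framework combines the pairwise and two-pass algorithms for computing the variance. One plausible reading, sketched here under that assumption: each data block (standing in for an MPI rank) runs the two-pass algorithm locally to get its count, means, and co-moment, and the partial results are then merged in a balanced pairwise tree using the standard update formula of Chan, Golub and LeVeque. Variance is the special case cov(x, x). The names `Partial`, `two_pass`, `merge`, `pairwise_reduce`, and `covariance` are hypothetical, and no MPI calls are shown.

```python
from typing import List, NamedTuple, Sequence


class Partial(NamedTuple):
    n: int
    mean_x: float
    mean_y: float
    comoment: float  # sum of (x - mean_x) * (y - mean_y) over the block


def two_pass(xs: Sequence[float], ys: Sequence[float]) -> Partial:
    """Local two-pass statistics: means first, then centred products."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    c = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return Partial(n, mx, my, c)


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial results without revisiting the raw data."""
    n = a.n + b.n
    dx = b.mean_x - a.mean_x
    dy = b.mean_y - a.mean_y
    return Partial(
        n,
        a.mean_x + dx * b.n / n,
        a.mean_y + dy * b.n / n,
        a.comoment + b.comoment + dx * dy * a.n * b.n / n,
    )


def pairwise_reduce(parts: List[Partial]) -> Partial:
    """Merge neighbours level by level, like pairwise summation."""
    while len(parts) > 1:
        merged = [merge(parts[i], parts[i + 1]) for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:
            merged.append(parts[-1])  # odd element carried to the next level
        parts = merged
    return parts[0]


def covariance(xs: Sequence[float], ys: Sequence[float], block: int = 4) -> float:
    """Population covariance computed block-wise, then merged pairwise."""
    parts = [two_pass(xs[i:i + block], ys[i:i + block]) for i in range(0, len(xs), block)]
    total = pairwise_reduce(parts)
    return total.comoment / total.n
```

Centring each block on its own mean avoids the catastrophic cancellation of the one-pass sum-of-squares formula, which matters for data with a large offset such as values near 1e9.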
DLBench: An Experimental Evaluation of Deep Learning Frameworks
2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date: 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00034
Nesma Mahmoud, Youssef Essam, Radwa El Shawi, S. Sakr
Abstract: Recently, deep learning has become one of the most disruptive trends in the technology world. Deep learning techniques are increasingly achieving significant results in different domains such as speech recognition, image recognition and natural language processing. In general, there are various reasons behind the increasing popularity of deep learning techniques. These reasons include increasing data availability and the increasing availability of powerful hardware and computing resources, in addition to the increasing availability of deep learning frameworks. In practice, the increasing popularity of deep learning frameworks calls for benchmarking studies that can effectively evaluate the performance characteristics of these systems. In this paper, we present an extensive experimental study of six popular deep learning frameworks, namely TensorFlow, MXNet, PyTorch, Theano, Chainer, and Keras. Our experimental evaluation covers different aspects of comparison, including accuracy, speed and resource consumption. Our experiments have been conducted in both CPU and GPU environments using different datasets. We report and analyze the performance characteristics of the studied frameworks. In addition, we report a set of insights and important lessons that we have learned from conducting our experiments.
Citations: 9
Efficient Re-Computation of Big Data Analytics Processes in the Presence of Changes: Computational Framework, Reference Architecture, and Applications
2019 IEEE International Congress on Big Data (BigDataCongress) Pub Date: 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00017
P. Missier, J. Cala
Abstract: Insights generated from Big Data through analytics processes are often unstable over time and thus lose their value, as the analysis typically depends on elements that change and evolve dynamically. However, the cost of having to periodically "redo" computationally expensive data analytics is not normally taken into account when assessing the benefits of the outcomes. The ReComp project addresses the problem of efficiently re-computing, all or in part, outcomes from complex analytical processes in response to some of the changes that occur to process dependencies. While such dependencies may include application and system libraries, as well as the deployment environment, ReComp is focused exclusively on changes to reference datasets as well as to the original inputs. Our hypothesis is that an efficient re-computation strategy requires the ability to (i) observe and quantify data changes, (ii) estimate the impact of those changes on a population of prior outcomes, (iii) identify the minimal process fragments that can restore the currency of the impacted outcomes, and (iv) selectively drive their refresh. In this paper we present a generic framework that addresses these requirements, and show how it can be customised to operate on two case studies of very diverse domains, namely genomics and geosciences. We discuss lessons learnt and outline the next steps towards the ReComp vision.
Citations: 1
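The first two ReComp requirements, observing data changes and estimating their impact on prior outcomes, can be illustrated with a toy sketch: diff two versions of a reference dataset keyed by record ID, then flag every prior outcome whose recorded dependencies include a changed record. `diff_reference` and `impacted_outcomes` are hypothetical names, and this set-intersection rule is an assumption, not the ReComp framework's impact estimator.

```python
from typing import Mapping, Set


def diff_reference(old: Mapping[str, str], new: Mapping[str, str]) -> Set[str]:
    """Record IDs added, removed, or modified between two dataset versions."""
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    modified = {k for k in old.keys() & new.keys() if old[k] != new[k]}
    return set(added) | set(removed) | modified


def impacted_outcomes(dependencies: Mapping[str, Set[str]], changed: Set[str]) -> Set[str]:
    """Prior outcomes whose recorded dependencies intersect the changed records."""
    return {outcome for outcome, deps in dependencies.items() if deps & changed}
```

Only the flagged outcomes would be candidates for re-computation; outcomes that depend solely on unchanged records keep their previous results.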