2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Latest Publications

SMOTEBoost for Regression: Improving the Prediction of Extreme Values

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00025

Nuno Moniz, Rita P. Ribeiro, Vítor Cerqueira, N. Chawla

Citations: 18

Multivariate Time Series Early Classification Using Multi-Domain Deep Neural Network

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00019

Huai-Shuo Huang, Chien-Liang Liu, V. Tseng

Abstract: Early classification of multivariate time series is an important research topic in data mining, with wide applications in domains such as medical diagnosis, motion detection, and financial prediction. Shapelets are probably one of the most commonly used approaches to the early classification problem, but one drawback of shapelets is their inefficiency. More importantly, the extracted shapelets may not be applicable to every test case at every time point. This work focuses on early classification of multivariate time series and proposes a novel framework named Multi-Domain Deep Neural Network (MDDNN), in which a convolutional neural network (CNN) and long short-term memory (LSTM) are incorporated to learn feature representations and relationship embeddings in long sequences with long time lags. With the help of a truncation process, the proposed model can make predictions at any time point of a multivariate time series. We conducted experiments on four real datasets and compared the method with state-of-the-art algorithms. The experimental results indicate that the proposed method significantly outperforms the alternatives in both earliness and accuracy. A detailed analysis of the proposed model is also provided.
To the best of our knowledge, this is the first work that incorporates deep neural network methods (CNN and LSTM) and a multi-domain approach to address the problem of early classification on multivariate time series.

Citations: 18
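The MDDNN abstract says a truncation process lets the model predict at any time point, but it does not spell out the procedure. As an illustration only, the Python sketch below shows one common way to prepare data for that setting: expanding each labeled series into all of its prefixes and padding them to a fixed length. The function names `truncate_prefixes` and `pad_to_length`, the zero padding, and the list-of-lists layout (one inner list of channel values per time step) are assumptions, not the authors' implementation.

```python
from typing import List, Tuple

# Assumed layout: one inner list per time step, one value per variable (channel).
Series = List[List[float]]


def truncate_prefixes(series: Series, label: int, min_len: int = 1) -> List[Tuple[Series, int]]:
    """Return every prefix series[:t] for t in min_len..len(series), each paired
    with the label of the full series, so a classifier learns from partial data."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    return [(series[:t], label) for t in range(min_len, len(series) + 1)]


def pad_to_length(prefix: Series, length: int, fill: float = 0.0) -> Series:
    """Right-pad a prefix with `fill` rows so prefixes of different lengths can be
    batched for a fixed-input model. Prefixes already at or beyond `length`
    are returned unchanged. (Hypothetical choice; masking is an alternative.)"""
    n_vars = len(prefix[0]) if prefix else 0
    return prefix + [[fill] * n_vars for _ in range(length - len(prefix))]


if __name__ == "__main__":
    series = [[0.1, 1.0], [0.2, 1.1], [0.3, 0.9]]
    samples = truncate_prefixes(series, label=1, min_len=2)
    print(len(samples))  # 2
    print(pad_to_length(samples[0][0], len(series)))
```

With `min_len=2`, a three-step series yields two training samples (lengths 2 and 3) that share the series label; at prediction time the same model can then be applied to whatever prefix has been observed so far.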

DSAA 2018 Special Session: Data Science for Social Good

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00060

D. Paolotti, M. Tizzoni

Citations: 4

Opportunities and Risks for Data Science in Organizations: Banking, Finance, and Policy - Special Session Overview

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00078

A. Azzini, S. Marrara, Amir Topalovic, M. P. Bach, Matthew J. Rattigan

Citations: 1

Data Fusion to Describe and Quantify Search and Rescue Operations in the Mediterranean Sea

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00066

K. H. Pham, Jeremy Boy, M. Luengo-Oroz

Abstract: The Mediterranean Sea is the stage of one of the biggest humanitarian crises to affect Europe. Since 2014, thousands of migrants and refugees have died or gone missing in dangerous attempts to cross into the continent. However, relatively little structured information is available on how they attempt the crossing. Such information could be used to better target maritime rescue efforts or to anticipate smuggling patterns, which could potentially save lives. In this article, we provide an overview of data sources available for the study of migration in the Central Mediterranean. We describe how these data can be structured, combined, and analyzed to provide quantitative insights into the situation in the region. We define a quantified rescue framework for fusing different data sources around individual rescue operations, and we explore the potential of machine learning to perform automated rescue detection based on vessel trajectory information. We conclude with technical research questions, and with potential policy and operational implications related to the use of these data sources.

Citations: 6

DSAA 2018 Keynotes

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/dsaa.2018.00009

Citations: 0

DeepClean: Data Cleaning via Question Asking

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00039

Xinyang Zhang, Yujie Ji, Chanh Nguyen, Ting Wang

Abstract: As one critical task in the data analysis pipeline, data cleaning is notoriously labor-intensive and error-prone. Knowledge base-assisted data cleaning has proved a powerful tool for finding and fixing data defects; however, its applicability is inevitably bounded by the natural limitations of knowledge bases. Meanwhile, although a vast number of knowledge sources exist in the form of free-text corpora (e.g., Wikipedia), transforming them into formats usable by existing data cleaning tools can be prohibitively costly and error-prone, if not outright impossible. Here, we present DeepClean, the first end-to-end data cleaning framework powered by free-text knowledge sources. At a high level, DeepClean leverages a knowledge source through its question-answering (QA) interface and achieves high-quality cleaning via iterative question asking. Specifically, DeepClean detects and repairs data defects in three stages: (i) pattern extraction, in which it automatically discovers the semantic types of the data attributes as well as their correlations; (ii) question generation, in which it translates each data tuple into a minimal set of validation questions; and (iii) completion and repair, in which it checks the answers returned by the knowledge source against the data values, identifies erroneous cases, and suggests possible fixes.
Through extensive empirical studies, we demonstrate that DeepClean is applicable to a range of domains and can effectively repair a variety of data defects, highlighting data cleaning powered by free-text knowledge sources as a promising direction for future research.

Citations: 1
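To make stages (ii) and (iii) of the DeepClean abstract concrete, here is a toy Python sketch: each tuple is turned into validation questions, a knowledge source answers them, and mismatches become suggested fixes. It assumes a hand-written template table and a stub `answer` callable standing in for the QA interface; `TEMPLATES`, `generate_questions`, and `check_and_repair` are hypothetical names. Stage (i), pattern extraction, and the minimality of the question set are not modeled, so this is not the authors' system.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical templates keyed by (subject attribute, attribute to validate).
# In DeepClean such relations would come from pattern extraction; here they are fixed.
TEMPLATES: Dict[Tuple[str, str], str] = {
    ("country", "capital"): "What is the capital of {country}?",
}


def generate_questions(row: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn one tuple into (question, attribute being validated) pairs,
    one per template whose two attributes are both present in the row."""
    questions = []
    for (subj, obj), template in TEMPLATES.items():
        if subj in row and obj in row:
            questions.append((template.format(**row), obj))
    return questions


def check_and_repair(row: Dict[str, str],
                     answer: Callable[[str], str]) -> Dict[str, Tuple[str, str]]:
    """Ask each question, compare the answer with the stored value (case- and
    whitespace-insensitive), and return {attribute: (old value, suggested fix)}.
    Empty answers are treated as 'unknown' and produce no fix."""
    fixes = {}
    for question, attr in generate_questions(row):
        reply = answer(question)
        if reply and reply.strip().lower() != row[attr].strip().lower():
            fixes[attr] = (row[attr], reply)
    return fixes
```

For example, with a stub knowledge source `{"What is the capital of France?": "Paris"}` wrapped as `lambda q: kb.get(q, "")`, the row `{"country": "France", "capital": "Lyon"}` yields the suggestion `{"capital": ("Lyon", "Paris")}`, while a row that already says `"Paris"` yields no fixes.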

Pattern-Based Automatic Parallelization of Representative-Based Clustering Algorithms

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00020

Saiyedul Islam, S. Balasubramaniam, Shruti Gupta, Shikhar Brajesh, Rohan Badlani, Nitin Labhishetty, Abhinav Baid, Poonam Goyal, Navneet Goyal

Abstract: Ease of programming and optimal parallel performance have historically been on opposite sides of a tradeoff, forcing the user to choose. With the advent of the Big Data era and the rapid evolution of sequential algorithms, the data analytics community can no longer afford this tradeoff. We observed that several clustering algorithms often share common traits; in particular, algorithms belonging to the same class of clustering exhibit significant overlap in their processing steps. Here, we present our observations on domain patterns in representative-based clustering algorithms and how they manifest as clearly identifiable programming patterns when mapped to a Domain Specific Language (DSL). We have integrated the signatures of these patterns into the DSL compiler for parallelism identification and automatic parallel code generation.
Our experiments on different state-of-the-art parallelization frameworks show that our system is able to achieve near-optimal speedup while requiring a fraction of the programming effort, making it an ideal choice for the data analytics community.

Citations: 3
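The clustering abstract refers to processing steps shared by representative-based algorithms without listing them. As an illustration only, and not the authors' DSL or compiler, the Python sketch below uses k-means, a standard representative-based algorithm, to show the kind of structure a parallelizing compiler can exploit: the assignment step runs independently on each data chunk (a map), and the representative update combines per-chunk partial sums (a reduction). The function names and the chunked-list input are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[List[float]], List[int]]  # per-cluster sums and counts


def nearest(point: Point, centers: Sequence[Point]) -> int:
    """Index of the closest center by squared Euclidean distance (ties -> lowest index)."""
    return min(range(len(centers)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])))


def map_assign(chunk: Sequence[Point], centers: Sequence[Point]) -> Partial:
    """Map step: depends only on its own chunk and the current centers,
    so different chunks can be processed in parallel."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in chunk:
        k = nearest(p, centers)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def reduce_update(partials: Sequence[Partial], centers: Sequence[Point]) -> List[Point]:
    """Reduce step: add up partial sums and counts across chunks, then recompute
    each center as a mean. A cluster with no points keeps its previous center."""
    dim = len(centers[0])
    total_sums = [[0.0] * dim for _ in centers]
    total_counts = [0] * len(centers)
    for sums, counts in partials:
        for k in range(len(centers)):
            total_counts[k] += counts[k]
            for d in range(dim):
                total_sums[k][d] += sums[k][d]
    return [tuple(s / total_counts[k] for s in total_sums[k]) if total_counts[k] else centers[k]
            for k in range(len(centers))]


def kmeans_step(chunks: Sequence[Sequence[Point]], centers: Sequence[Point]) -> List[Point]:
    """One k-means iteration; the comprehension over chunks is the part a
    parallel framework could distribute across workers."""
    return reduce_update([map_assign(c, centers) for c in chunks], centers)
```

Because the reduction only adds sums and counts, splitting the data into more chunks does not change the resulting centers (up to floating-point summation order), which is what makes this pattern safe to parallelize automatically.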

Towards Simulation-Data Science – A Case Study on Material Failures

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00058

Holger Trittenbach, M. Gauch, Klemens Böhm, K. Schulz

Abstract: Simulations let scientists study properties of complex systems. At first sight, data mining is a good choice when evaluating large numbers of simulations. But it is currently unclear whether there are general principles that might guide the deployment of such methods on simulation data. In other words, is it worthwhile to pursue simulation-data science as a distinct subdiscipline of data science? To identify a corresponding research agenda and to structure the research questions, we conduct a case study from the domain of materials science. One insight is that simulation data may differ from other data in its structure and quality, which entails focal points different from those of conventional data-analysis projects. It also turns out that interpretability and usability are important notions in our context as well. More attention is needed to gather the various meanings of these terms and to align them with the needs and priorities of domain scientists.
Finally, we propose extensions to our case study which we deem necessary to generalize our insights towards the guidelines envisioned for simulation-data science.

Citations: 3

Crowdsourcing Landforms for Open GIS Enrichment

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00077

Rocio Nahime Torres, Darian Frajberg, P. Fraternali, Sergio Luis Herrera Gonzales

Citations: 1