2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Latest Papers

SMOTEBoost for Regression: Improving the Prediction of Extreme Values
Nuno Moniz, Rita P. Ribeiro, Vítor Cerqueira, N. Chawla
DOI: https://doi.org/10.1109/DSAA.2018.00025
Published: 2018-10-01
Abstract: Supervised learning with imbalanced domains is one of the biggest challenges in machine learning. Such tasks differ from standard learning tasks by assuming a skewed distribution of target variables, and user domain preference towards under-represented cases. Most research has focused on imbalanced classification tasks, where a wide range of solutions has been tested. Still, little work has been done concerning imbalanced regression tasks. In this paper, we propose an adaptation of the SMOTEBoost approach for the problem of imbalanced regression. Originally designed for classification tasks, it combines boosting methods and the SMOTE resampling strategy. We present four variants of SMOTEBoost and provide an experimental evaluation using 30 datasets with an extensive analysis of results in order to assess the ability of SMOTEBoost methods in predicting extreme target values, and their predictive trade-off concerning baseline boosting methods. SMOTEBoost is publicly available in a software package.
Citations: 18
Multivariate Time Series Early Classification Using Multi-Domain Deep Neural Network
Huai-Shuo Huang, Chien-Liang Liu, V. Tseng
DOI: https://doi.org/10.1109/DSAA.2018.00019
Published: 2018-10-01
Abstract: Early classification on multivariate time series is an important research topic in data mining with wide applications to various domains like medical diagnosis, motion detection and financial prediction. Shapelets are probably one of the most commonly used approaches to tackle the early classification problem, but one drawback of shapelets is their inefficiency. More importantly, the extracted shapelets may not be applicable to every test case at any time point. This work focuses on early classification of multivariate time series and proposes a novel framework named Multi-Domain Deep Neural Network (MDDNN), in which a convolutional neural network (CNN) and long short-term memory (LSTM) are incorporated to learn feature representation and relationship embedding in long sequences with long time lags. The proposed model can make predictions at any time point of a multivariate time series with the help of a truncation process. We conducted experiments on four real datasets and compared with state-of-the-art algorithms. The experimental results indicate that the proposed method outperforms the alternatives significantly on both earliness and accuracy. Detailed analysis of the proposed model is also provided in this work. To the best of our knowledge, this is the first work that incorporates deep neural network methods (CNN and LSTM) and a multi-domain approach to address the problem of early classification on multivariate time series.
Citations: 18
DSAA 2018 Special Session: Data Science for Social Good
D. Paolotti, M. Tizzoni
DOI: https://doi.org/10.1109/DSAA.2018.00060
Published: 2018-10-01
Abstract: We provide an overview of the DSAA 2018 Data Science for Social Good special session, its aims and contributions.
Citations: 4
Opportunities and Risks for Data Science in Organizations: Banking, Finance, and Policy - Special Session Overview
A. Azzini, S. Marrara, Amir Topalovic, M. P. Bach, Matthew J. Rattigan
DOI: https://doi.org/10.1109/DSAA.2018.00078
Published: 2018-10-01
Abstract: In this paper, the DSAA 2018 special session "Opportunities and Risks for Data Science in Organizations: Banking, Finance, and Policy" is presented. This session is focused on discussing how banking and finance organizations can benefit from the huge amount of data they own and continue to gather. Moreover, the session aims at identifying and exploring the challenges of applying data science to financial policy questions. It is also planned to promote a special issue of the ACM Journal of Data and Information Quality on the workshop topics.
Citations: 1
Data Fusion to Describe and Quantify Search and Rescue Operations in the Mediterranean Sea
K. H. Pham, Jeremy Boy, M. Luengo-Oroz
DOI: https://doi.org/10.1109/DSAA.2018.00066
Published: 2018-10-01
Abstract: The Mediterranean Sea is the stage of one of the biggest humanitarian crises to affect Europe. Since 2014, thousands of migrants and refugees have died or gone missing in dangerous attempts to cross into the continent. However, there is relatively little structured information available on how they attempt the crossing. Such information could be used to better target maritime rescue efforts or to anticipate smuggling patterns, which could potentially save lives. In this article, we provide an overview of data sources available for the study of migration in the Central Mediterranean. We describe how these data can be structured, combined, and analyzed to provide quantitative insights on the situation in the region. We define a quantified rescue framework for fusing different data sources around individual rescue operations, and we explore the potential of machine learning to perform automated rescue detection based on vessel trajectory information. We conclude with technical research questions, and potential policy and operational implications related to the use of these data sources.
Citations: 6
DSAA 2018 Keynotes
DOI: https://doi.org/10.1109/dsaa.2018.00009
Published: 2018-10-01
Citations: 0
DeepClean: Data Cleaning via Question Asking
Xinyang Zhang, Yujie Ji, Chanh Nguyen, Ting Wang
DOI: https://doi.org/10.1109/DSAA.2018.00039
Published: 2018-10-01
Abstract: As one critical task in the data analysis pipeline, data cleaning is notoriously human labor-intensive and error-prone. Knowledge base-assisted data cleaning has proved a powerful tool for finding and fixing data defects; however, its applicability is inevitably bounded by the natural limitations of knowledge bases. Meanwhile, although a vast number of knowledge sources exist in the form of free-text corpora (e.g., Wikipedia), transforming them into formats usable by existing data cleaning tools can be prohibitively costly and error-prone, if not at all impossible. Here, we present DeepClean, the first end-to-end data cleaning framework powered by free-text knowledge sources. At a high level, DeepClean leverages a knowledge source through its question-answering (QA) interface and achieves high-quality cleaning via iterative question asking. Specifically, DeepClean detects and repairs data defects in three stages: (i) Pattern extraction - it automatically discovers the semantic types of the data attributes as well as their correlations; (ii) Question generation - it translates each data tuple into a minimal set of validation questions; (iii) Completion and repair - by checking the answers returned by the knowledge source against the data values, it identifies erroneous cases and suggests possible fixes. Through extensive empirical studies, we demonstrate that DeepClean is applicable to a range of domains, and can effectively repair a variety of data defects, highlighting data cleaning powered by free-text knowledge sources as a promising direction for future research.
Citations: 1
Pattern-Based Automatic Parallelization of Representative-Based Clustering Algorithms
Saiyedul Islam, S. Balasubramaniam, Shruti Gupta, Shikhar Brajesh, Rohan Badlani, Nitin Labhishetty, Abhinav Baid, Poonam Goyal, Navneet Goyal
DOI: https://doi.org/10.1109/DSAA.2018.00020
Published: 2018-10-01
Abstract: Ease of programming and optimal parallel performance have historically been on opposite sides of a tradeoff, forcing the user to choose. With the advent of the Big Data era and rapid evolution of sequential algorithms, the data analytics community can no longer afford the tradeoff. We observed that several clustering algorithms often share common traits - particularly, algorithms belonging to the same class of clustering exhibit significant overlap in processing steps. Here, we present our observation on domain patterns in Representative-based clustering algorithms and how they manifest as clearly identifiable programming patterns when mapped to a Domain Specific Language (DSL). We have integrated the signatures of these patterns in the DSL compiler for parallelism identification and automatic parallel code generation. Our experiments on different state-of-the-art parallelization frameworks show that our system is able to achieve near-optimal speedup while requiring a fraction of the programming effort, making it an ideal choice for the data analytics community.
Citations: 3
Towards Simulation-Data Science – A Case Study on Material Failures
Holger Trittenbach, M. Gauch, Klemens Böhm, K. Schulz
DOI: https://doi.org/10.1109/DSAA.2018.00058
Published: 2018-10-01
Abstract: Simulations let scientists study properties of complex systems. At first sight, data mining is a good choice when evaluating large numbers of simulations. But it is currently unclear whether there are general principles that might guide the deployment of respective methods to simulation data. In other words, is it worthwhile to target simulation-data science as a distinct subdiscipline of data science? To identify a respective research agenda and to structure the research questions, we conduct a case study from the domain of materials science. One insight is that simulation data may be different from other data regarding its structure and quality, which entails focal points different from the ones of conventional data-analysis projects. It also turns out that interpretability and usability are important notions in our context as well. More attention is needed to gather the various meanings of these terms to align them with the needs and priorities of domain scientists. Finally, we propose extensions to our case study which we deem necessary to generalize our insights towards the guidelines envisioned for simulation-data science.
Citations: 3
Crowdsourcing Landforms for Open GIS Enrichment
Rocio Nahime Torres, Darian Frajberg, P. Fraternali, Sergio Luis Herrera Gonzales
DOI: https://doi.org/10.1109/DSAA.2018.00077
Published: 2018-10-01
Abstract: Open Source Geographical Information Systems, such as OpenStreetMap (OSM), offer a valuable alternative to proprietary solutions for the development of voluntary environment monitoring systems. However, the quantity and quality of information stored in such systems must be carefully evaluated and the contributions of volunteers must be boosted by means of effective engagement methods. This paper reports the results of the assessment of the quality and quantity of OpenStreetMap mountain information: different types of information and world regions have different gaps and improvement requirements. To address this issue, we propose a hybrid approach, in which an open Digital Elevation Model data set is processed with a heuristic algorithm to find candidate mountain information and uncertainty in the automatically extracted candidates is reduced by means of voluntary expert crowdsourcing. The improvement of landform information (not only about mountains, but also about orography and hydrography in general) can support the development of environment monitoring applications.
Citations: 1