2020 International Conference on Data Mining Workshops (ICDMW): Latest Papers

Kennard-Stone Balance Algorithm for Time-series Big Data Stream Mining
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00122
Tengyue Li, S. Fong, Yaoyang Wu, A. J. Tallón-Ballesteros
Abstract: Nowadays time series are generated relatively more easily and in larger quantity than ever, by the advances of IoT and sensor applications. Training a prediction model effectively using such big data streams poses certain challenges in machine learning. Data sampling has been an important technique in handling over-sized data in pre-processing which converts the huge data streams into a manageable and representative subset before loading them into a model induction process. In this paper a novel data conversion method, namely Kennard-Stone Balance (KSB) Algorithm is proposed. In the past decades, KS has been used by researchers for partitioning a bounded dataset into appropriate portions of training and testing data in cross-validation. In this new proposal, we extend KS into balancing the sub-sampled data in consideration of the class distribution by round-robin. It is also the first time KS is applied on time-series for the purpose of extracting a meaningful representation of big data streams, for improving the performance of a machine learning model. Preliminary simulation results show the advantages of KBS. Analysis, discussion and future works are reported in this short paper. It is anticipated that KBS brings a new alternative of data sampling to data stream mining with lots of potentials.
Citations: 1
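To make the sampling idea in the Kennard-Stone Balance abstract more concrete, here is a minimal Python sketch. It implements the classical Kennard-Stone max-min selection and then, as an assumption about what "balancing by round-robin" could look like, interleaves the per-class selections one class at a time. The function names `kennard_stone_order` and `balanced_ks_sample` are hypothetical, and this is not the authors' KSB implementation, which the abstract does not specify.

```python
import math
from itertools import zip_longest


def kennard_stone_order(points):
    """Return indices of `points` in classical Kennard-Stone selection order."""
    n = len(points)
    if n <= 2:
        return list(range(n))
    # Seed with the two points that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: math.dist(points[ij[0]], points[ij[1]]),
    )
    selected = [i0, j0]
    # min_dist[k] is the distance from point k to its nearest selected point.
    min_dist = [
        min(math.dist(p, points[i0]), math.dist(p, points[j0])) for p in points
    ]
    remaining = set(range(n)) - {i0, j0}
    while remaining:
        # Max-min rule: take the point farthest from everything already selected.
        nxt = max(sorted(remaining), key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_dist[k] = min(min_dist[k], math.dist(points[k], points[nxt]))
    return selected


def balanced_ks_sample(points, labels, n_samples):
    """Assumed balancing step: draw from per-class KS orderings in round-robin."""
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    queues = []
    for lab in sorted(by_class):
        members = by_class[lab]
        order = kennard_stone_order([points[i] for i in members])
        queues.append([members[k] for k in order])  # back to global indices
    picked = []
    for one_round in zip_longest(*queues):
        if len(picked) >= n_samples:
            break
        for idx in one_round:
            if idx is not None and len(picked) < n_samples:
                picked.append(idx)
    return picked


# Toy data: four points of class "a", two of class "b".
points = [(0.0,), (1.0,), (2.0,), (10.0,), (0.5,), (0.6,)]
labels = ["a", "a", "a", "a", "b", "b"]
print(balanced_ks_sample(points, labels, 4))  # -> [0, 4, 3, 5]
```

On the toy data the sample alternates between the two classes even though class "a" is twice as large, which is the balancing effect the abstract describes. The pairwise seed search is quadratic in the number of points, so a streaming setting would presumably apply this per window rather than to the whole stream.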
Data analysis and processing for spatio-temporal forecasting
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00106
Hyoungwoo Lee, J. Choo
Abstract: Spatio-temporal forecasting is a research area applicable to many industrial fields, such as forecasting power consumption in real-life and predicting traffic conditions of roads. For example, in the traffic forecasting, it is important to analyze spatial relations and temporal trends in order to predict traffic changes in roads over time. In the spatio-temporal forecasting task, previous studies applied graph modeling to capture spatial relations. However, existing models use only the recently available data to predict traffic conditions, leading to the degraded performance of the model. Further research is necessary for predicting the speed in the far future. As a study to tackle this issue, we aim to improve the performance of the model by providing the model with additional data through time-series segmentation. In order to verify whether the additional data could be meaningful to the model, an experiment was conducted to compare the performance of the model trained with existing data and the model trained with our data and analyze the distribution of the additional data.
Citations: 1
COAL: Convolutional Online Adaptation Learning for Opinion Mining
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00012
I. Chaturvedi, E. Ragusa, P. Gastaldo, E. Cambria
Abstract: Thanks to recent advances in machine learning, some say AI is the new engine and data is the new coal. Mining this ‘coal’ from the ever-growing Social Web, however, can be a formidable task. In this work, we address this problem in the context of sentiment analysis using convolutional online adaptation learning (COAL). In particular, we consider semi-supervised learning of convolutional features, which we use to train an online model. Such a model, which can be trained in one domain but also used to predict sentiment in other domains, outperforms the baseline in the range of 5-20%.
Citations: 2
Persistent Homology on Streaming Data
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00090
Anindya Moitra, Nicholas O. Malott, P. Wilsey
Abstract: This paper introduces a framework to compute persistent homology, a principal tool in Topological Data Analysis, on potentially unbounded and evolving data streams. The framework is organized into online and offline components. The online element maintains a summary of the data that preserves the topological structure of the stream. The offline component computes the persistence intervals from the data captured by the summary. The framework is applied to the detection of horizontal or reticulate genomic exchanges during the evolution of species that cannot be identified by phylogenetic inference or traditional data mining. The method effectively detects reticulate evolution that occurs through reassortment and recombination in large streams of genomic sequences of Influenza and HIV viruses.
Citations: 4
Predictive Nonlinear Modeling by Koopman Mode Decomposition
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00118
Akira Kusaba, Kilho Shin, D. Shepard, T. Kuboyama
Abstract: Machine learning has countless applications in time series analysis: controlling smart grids, detecting mechanical failures, and analyzing stock prices. Fourier mode decomposition (FMD) is the most common method of analysis because it decomposes time series into finite waveform components, or modes, but its principal shortcoming is that FMD assumes every mode has a constant amplitude, an assumption that rarely holds in real-world data. In contrast, Koopman mode decomposition (KMD) can detect modes with exponentially-increasing or -decreasing amplitudes, although it has mostly been applied to diagnosing data errors, not to prediction. What has kept KMD from being applied to prediction is partly a shortcoming in a mathematical formulation. This paper seeks to remedy that shortcoming: it provides a mathematically-precise formulation of KMD as a practical tool. This formulation, in turn, allows us to develop a novel practical method for prediction of future data. We further demonstrate our method's effectiveness using both synthetic data and real plasma flow data.
Citations: 0
Interactive Knowledge Graph Attention Network for Recommender Systems
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00038
Li Yang, E. Shijia, Shiyao Xu, Yang Xiang
Abstract: Recent progress in personalized recommendation has shown great potential in exploiting structure information provided by a knowledge graph (KG). As a heterogeneous information network, KG contains rich semantic relatedness among entities, which contributes to addressing notorious issues such as data sparsity and cold start. State-of-the-art KG-based recommendation approaches try to propagate information along KG links to encode long-range connectivities into hidden representations. However, most of them only model the user or item representation independently, lacking a focus on user-item interaction. To this end, we propose the Interactive Knowledge Graph Attention Network (IKGAT), which directly models user-item interaction and high-order structure information within KG. For the user representation, following an interactive attention mechanism, we use the item to attend over the user's neighbors and then propagate their information to update the representation. Such a process is extended to multi-hops away to obtain richer neighborhood information. Similarly, the item representation is updated under the supervision of the user. With that design, IKGAT can capture collaborative signals and user preferences effectively. Experiment results on three public datasets show that IKGAT consistently outperforms the state-of-the-art approaches, especially when the dataset is sparse.
Citations: 1
Batch Mode Active Learning for Individual Treatment Effect Estimation
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00123
Zoltán Puha, M. Kaptein, A. Lemmens
Abstract: Field experimentation has become a well-established practice to estimate individual treatment effects. In recent years, the Active Learning (AL) literature has developed methods to optimize the design of field experiments and reduce their cost. In this paper, we propose a novel AL algorithm for individual treatment effect estimation that works in batch mode for cases where the outcomes of an intervention are not immediate. It uniquely combines Expected Model Change Maximization and Bayesian Additive Regression Trees. Our approach (B-EMCMITE) uses the predictive uncertainty around the individual treatment effects to actively sample new units for experimentation and decide which treatment they will receive. We perform extensive simulations and test our approach on semi-synthetic, real-life data. B-EMCMITE outperforms alternative approaches and substantially reduces the number of observations needed to estimate individual treatment effects compared to A/B tests.
Citations: 3
Explainable Anomaly Detection for District Heating Based on Shapley Additive Explanations
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00111
Sungwoo Park, Jihoon Moon, Eenjun Hwang
Abstract: One key component in the heat-using facility of district heating systems is the differential pressure control valve. This valve ensures a stable flow of water to the heat exchanger and the temperature control valve. It also makes a stable pressure difference between the supply and return lines. Hence, its malfunctioning could cause significant heat losses and, consequently, economic losses. To avoid this, it is necessary to monitor the abnormal operation of the valve in real-time. Despite various machine learning-based anomaly detection models, their decision is limited in practical use unless the rationale for the decision is appropriately explained. In this paper, we propose a Shapley additive explanation-based explainable anomaly detection scheme that can present the degree of contribution of input variables to the derived result. We report some of the experimental results.
Citations: 13
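The "degree of contribution of input variables" mentioned in the district heating abstract refers to Shapley values. As a minimal illustration, the Python sketch below computes exact Shapley values by enumerating every feature subset, with absent features replaced by a baseline value. The function name `shapley_values`, the toy scoring function, and the baseline-replacement convention are all assumptions made for this example; the abstract does not describe the authors' detector or their explanation pipeline, and practical SHAP tooling relies on approximations because exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, relative to `baseline`."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their observed value; the rest stay at baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(z):
    # Hypothetical anomaly score with an interaction between the first two inputs.
    return z[0] * z[1] + z[2]


x = [2.0, 3.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, x, baseline)
print(phi)
# The contributions sum to toy_score(x) - toy_score(baseline).
print(sum(phi), toy_score(x) - toy_score(baseline))
```

In this toy case the two interacting inputs split the product term equally and the third input receives exactly its additive share, so the per-feature values add up to the difference between the score at `x` and the score at the baseline.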
Nonlinear Tensor Completion Using Domain Knowledge: An Application in Analysts' Earnings Forecast
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00059
Ajim Uddin, Xinyuan Tao, Chia-Ching Chou, Dantong Yu
Abstract: Financial analysts' earnings forecast is one of the most critical inputs for security valuation and investment decisions. However, it is challenging to utilize such information for two main reasons: missing values and heterogeneity among analysts. In this paper, we show that one recent breakthrough in nonlinear tensor completion algorithm, CoSTCo [1], overcomes the difficulty by imputing missing values and significantly improves the forecast accuracy in earnings. Compared with conventional imputation approaches, CoSTCo effectively captures latent information and reduces the tensor completion errors by 50%, even with 98% missing values. Furthermore, we show that using firm characteristics as auxiliary information we can improve firms' earnings prediction accuracy by 6%. Results are consistent using different performance metrics and across various industry sectors. Notably, the performance improvement is more salient for the sectors with high heterogeneity. Our findings imply the successful application of advanced ML techniques in a real financial problem.
Citations: 5
One Belt, One Road, One Sentiment? A Hybrid Approach to Gauging Public Opinions on the New Silk Road Initiative
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00011
Jonathan Kevin Chandra, E. Cambria
Abstract: With the rapid adoption of the Internet, fast-moving social media platforms have been able to extract and encapsulate real-time public sentiments on different entities. Real-time sentiment analysis on current dynamic events such as elections, global affairs and sports are essential in the understanding the public's reaction to the states and trajectories of these events. In this paper, we aim to extract the sentiments of the Belt and Road Initiative from Twitter. Using aspect-based sentiment analysis, we were able to obtain the tweet's sentiment polarity on the related aspect category to better understand the topics that were discussed. We have developed an end-to-end sentiment analysis system that collects relevant data from Twitter, processes it and visualizes it on an intuitive display. We employed a hybrid approach of symbolic and sub-symbolic techniques using gated convolutional networks, aspect embeddings and the SenticNet framework to solve the subtasks of aspect category detection and aspect category polarity. A confidence score threshold was used to decide on the results provided by the models from the differing approaches.
Citations: 8