2020 International Conference on Data Mining Workshops (ICDMW): Latest Publications

Deal Closure Prediction based on User's Browsing Behaviour of Sales Content
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00021
Diana Nurbakova, Timothée Saumet
Abstract: We present PrediTilk, a data-driven win prediction service based on users' browsing of sales content, such as quotes, competitor comparisons, or product sheets. It is part of our GDPR-compliant electronic document tracking system designed for marketing and sales, and it addresses the win prediction problem (also known as deal closure prediction), which consists of estimating the probability that a given opportunity closes and becomes a customer. Given information from our tracking system about a user's consultation of documents, our service predicts the win probability of the opportunity using machine learning models. Our evaluation shows that PrediTilk provides accurate predictions while relying purely on automatically collected data about users' browsing behaviour. It can also provide objective signals to a CRM system, where most prospect data are entered manually. Combining these sources can become a highly valuable asset for win prediction.
Citations: 1
Human-in-the-loop Language-agnostic Extraction of Medication Data from Highly Unstructured Electronic Health Records
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00091
Frank Ruis, Shreyasi Pathak, Jeroen Geerdink, J. H. Hegeman, C. Seifert, M. V. Keulen
Abstract: Electronic health records contain important information written in free-form text. They are often highly unstructured and ungrammatical and contain misspellings and abbreviations, making it difficult to apply traditional natural language processing techniques. Annotated data is hard to come by due to restricted access, and supervised models often do not generalize well to other datasets. We propose a language-agnostic human-in-the-loop approach for extracting medication names from a large set of highly unstructured electronic health records, reaching almost 97% recall on our test set after the second iteration while maintaining 100% precision. Starting with a bootstrap lexicon, we perform a context-based dictionary expansion curated by a human reviewer. The method can handle ambiguous lexicon entries and efficiently find fuzzy matches without producing false positives. The human review step ensures high precision, which is especially important in healthcare, and is not subject to disagreements with annotations from an external source. The code is available online at https://github.com/FrankRuis/medical_concept_extraction.
Citations: 1
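The entry on medication extraction describes fuzzy lexicon matching gated by a human reviewer. The sketch below illustrates only that loop, under loud assumptions: the similarity score is the `SequenceMatcher` ratio from Python's standard `difflib` module rather than the paper's context-based expansion, and the names `propose_candidates`, `review`, the 0.85 cutoff, and the sample note are all hypothetical.

```python
from difflib import SequenceMatcher


def propose_candidates(tokens, lexicon, cutoff=0.85):
    """Return (token, lexicon_entry, score) triples for a human reviewer.

    The score is difflib's SequenceMatcher ratio, an assumption standing in
    for whatever matcher the paper actually uses.
    """
    candidates = []
    for token in tokens:
        lowered = token.lower()
        if lowered in lexicon:
            continue  # exact hits need no review
        best_entry, best_score = None, 0.0
        for entry in lexicon:
            score = SequenceMatcher(None, lowered, entry).ratio()
            if score > best_score:
                best_entry, best_score = entry, score
        if best_score >= cutoff:
            candidates.append((token, best_entry, round(best_score, 3)))
    return candidates


def review(candidates, accept):
    """Keep only the candidates that the reviewer callback accepts."""
    return {token.lower() for token, entry, _ in candidates if accept(token, entry)}


lexicon = {"paracetamol", "ibuprofen"}
note = "pt given paracetmol and ibuprofen after pain".split()
candidates = propose_candidates(note, lexicon)
# Stand-in reviewer that approves everything; in practice this is a person.
expanded = lexicon | review(candidates, accept=lambda token, entry: True)
```

Routing every fuzzy match through `review` is what keeps precision in the reviewer's hands: nothing enters the lexicon without explicit acceptance.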
Individualized Context-Aware Tensor Factorization for Online Games Predictions
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00048
Julie Jiang, Kristina Lerman, Emilio Ferrara
Abstract: Individual behavior and decisions are substantially influenced by their contexts, such as location, environment, and time. Changes along these dimensions can be readily observed in Multiplayer Online Battle Arena (MOBA) games, where players face different in-game settings for each match and are subject to frequent game patches. Existing methods that use contextual information generalize the effect of a context over the entire population, but contextual information tailored to each individual can be more effective. To achieve this, we present the Neural Individualized Context-aware Embeddings (NICE) model for predicting user performance and game outcomes. Our method identifies individual behavioral differences across contexts by learning latent representations of users and contexts through non-negative tensor factorization. Using a dataset from the MOBA game League of Legends, we demonstrate that our model substantially improves the prediction of winning outcome, individual user performance, and user engagement.
Citations: 1
A Recommender Algorithm: Gradient Recurrent Neural Network Applied to Yang-Baxter-Like Equation
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00031
Ying Liufu, Long Jin, Mei Liu, Shuai Li
Abstract: This article introduces a traditional recommender algorithm termed the gradient recurrent neural network (GRNN) model. Because numerous practical problems, such as those arising in recommender systems or multi-agent systems, can be recast as matrix equation problems, the GRNN model plays an increasingly critical and promising role. The GRNN model, designed with the aid of a square-norm-based energy function, is well suited to recommender systems and is shown to be highly efficient in solving linear and nonlinear convex optimization problems. Theoretical analysis and numerical simulations validate the inherent exponential and stable convergence of the GRNN model. With its aid, a theoretical nontrivial solution of the Yang-Baxter-like matrix equation $XAX=AXA$ can be obtained.
Citations: 0
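The GRNN entry mentions a square-norm-based energy function for the equation $XAX=AXA$ without stating it. One plausible reading, offered as an assumption rather than the paper's exact formulation, is a gradient flow on the squared Frobenius norm of the residual. The symbols $R$, $E$, and the gain $\gamma$ are introduced here for illustration, and real matrices are assumed.

```latex
\begin{aligned}
R(X) &= XAX - AXA, \\
E(X) &= \tfrac{1}{2}\,\lVert R(X) \rVert_F^2, \\
\frac{\partial E}{\partial X} &= R\,(AX)^{\top} + (XA)^{\top} R - A^{\top} R\, A^{\top}, \\
\dot{X}(t) &= -\gamma\,\frac{\partial E}{\partial X}, \qquad \gamma > 0.
\end{aligned}
```

Along this flow $\dot{E} = -\gamma\,\lVert \partial E / \partial X \rVert_F^2 \le 0$, so the residual energy never increases. Note that $X = 0$ and $X = A$ are trivial solutions, so the initial state matters when a nontrivial solution is sought.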
Precipitation Nowcasting Using Grid-based Data in South Korea Region
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00099
Changhwan Kim, Seyoung Yun
Abstract: Recently, precipitation nowcasting has gained significant attention. For instance, demand for precise precipitation nowcasting is increasing significantly in South Korea, where frequent and unexpected heavy rainfall has recently caused severe economic damage. In this paper, we propose an approach that takes predictions from a numerical model and then corrects them with a U-Net based deep learning model to improve the accuracy of the final prediction. We use two data sets: reanalysis data and LDAPS (Local Data Assimilation and Prediction System) prediction data. Both data sets are grid-based and cover the whole South Korea region. We first experiment with reanalysis data to verify that our U-Net model can find atmospheric dynamics patterns even though the input is not image data. Next, we apply the U-Net model to LDAPS prediction data. Because LDAPS prediction data is itself a prediction, we are essentially performing a correction task on it. To this end, a learnable layer is added at the front of the U-Net model and concatenated with the input batch to learn location-specific information. The experiment shows that the U-Net based model can find patterns using reanalysis data. Further, it has the potential to improve the accuracy of LDAPS prediction data. We also find that the learnable layer enhances test accuracy.
Citations: 1
Multi-class imbalanced semi-supervised learning from streams through online ensembles
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00124
P. Vafaie, H. Viktor, W. Michalowski
Abstract: Multi-class imbalance, in which the rates of instances in the various classes differ substantially, poses a major challenge when learning from evolving streams. In this setting, minority class instances may arrive infrequently and in bursts, making accurate model construction problematic. Further, skewed streams are not only susceptible to concept drift; class labels may also be absent, expensive to obtain, or arrive only after some delay. The combined effects of multi-class skew, concept drift, and semi-supervised learning have received limited attention in the online learning community. In this paper, we introduce a multi-class online ensemble algorithm suited to learning in such settings. Specifically, our algorithm uses sampling with replacement while dynamically increasing the weights of underrepresented classes based on recall, in order to produce models that benefit all classes. Our approach addresses the potential lack of labels by incorporating a self-training semi-supervised learning method for labeling instances. Our experimental results show that our online ensemble performs well on multi-class imbalanced data containing concept drifts. In addition, our algorithm produces accurate predictions even in the presence of unlabeled data.
Citations: 5
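The online ensemble entry combines sampling with replacement and recall-based class weights. A minimal sketch of that idea follows, assuming the online bagging convention in which each base learner trains on an instance k times with k drawn from a Poisson distribution. The rate formula, the `base` and `boost` values, and all names (`RecallTracker`, `sampling_rate`, `poisson`) are hypothetical and not taken from the paper.

```python
import math
import random
from collections import defaultdict


class RecallTracker:
    """Running per-class recall: correct predictions / instances seen."""

    def __init__(self):
        self.seen = defaultdict(int)
        self.hits = defaultdict(int)

    def update(self, true_label, predicted_label):
        self.seen[true_label] += 1
        if true_label == predicted_label:
            self.hits[true_label] += 1

    def recall(self, label):
        # Unseen classes get recall 0 so they receive the largest boost.
        return self.hits[label] / self.seen[label] if self.seen[label] else 0.0


def sampling_rate(tracker, label, base=1.0, boost=4.0):
    """Poisson rate for this label; base and boost are illustrative values."""
    return base + boost * (1.0 - tracker.recall(label))


def poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's multiplication method."""
    threshold, k, product = math.exp(-rate), 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


tracker = RecallTracker()
history = (
    [("major", "major")] * 9 + [("major", "minor")]
    + [("minor", "major")] * 3 + [("minor", "minor")]
)
for true_label, predicted in history:
    tracker.update(true_label, predicted)

rng = random.Random(0)
rate_major = sampling_rate(tracker, "major")  # recall 0.9 gives a low rate
rate_minor = sampling_rate(tracker, "minor")  # recall 0.25 gives a high rate
copies = poisson(rate_minor, rng)  # times one base learner trains on the instance
```

Tying the rate to recall rather than to class frequency means a class is boosted only while the ensemble is actually missing it, so the weights relax on their own once recall recovers.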
Uncertain Time Series Classification with Shapelet Transform
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00044
Michael Franklin Mbouopda, E. Nguifo
Abstract: Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine, and physics. In the last decade, many algorithms have been built to perform this task with very appreciable accuracy. However, applications where time series have uncertainty have been under-explored. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on Euclidean distance. We then propose the uncertain shapelet transform algorithm for the classification of uncertain time series. The extensive experiments we conducted on state-of-the-art datasets show the effectiveness of our contribution. The source code of our contribution and the datasets we used are all available on a public repository.
Citations: 2
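The uncertain time series entry builds a dissimilarity measure from Euclidean distance and uncertainty propagation. The sketch below shows one way this can work, assuming worst-case first-order propagation rules (a difference carries the sum of the input uncertainties, and a square q² carries 2·|q|·δq). The paper's exact measure may differ; the function name and the use of the squared distance are choices made here for illustration.

```python
def uncertain_squared_distance(x, dx, y, dy):
    """Squared Euclidean distance between x +/- dx and y +/- dy.

    Returns (best_guess, uncertainty) using worst-case first-order
    propagation: a difference carries dx + dy, and a square q**2
    carries 2 * |q| * dq.
    """
    if not (len(x) == len(dx) == len(y) == len(dy)):
        raise ValueError("all four sequences must have the same length")
    best_guess = 0.0
    uncertainty = 0.0
    for xi, dxi, yi, dyi in zip(x, dx, y, dy):
        diff = xi - yi
        best_guess += diff * diff
        uncertainty += 2.0 * abs(diff) * (dxi + dyi)
    return best_guess, uncertainty


distance, spread = uncertain_squared_distance(
    [1.0, 2.0], [0.1, 0.1], [0.0, 0.0], [0.1, 0.1]
)
```

Returning the pair instead of a single number lets a downstream step, such as shapelet selection, prefer candidates whose distances are both small and well determined.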
Graph-based Topic Extraction Using Centroid Distance of Phrase Embeddings on Healthy Aging Open-ended Survey Questions
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00088
D. Kosmajac, Kirstie Smith, Vlado Keselj, S. Kirkland
Abstract: Open-ended questions are a very important part of research surveys. However, they are challenging to process, since manual processing is labour-intensive. Automating the task requires NLP methods, since free text has no standardized structure. To tackle this problem, we present a solution for topic discovery and analysis of open-ended survey items. We use a graph-based representation of the text that adds structure and enables easier manipulation and keyphrase retrieval. Additionally, we use pre-trained fastText aligned word vectors to cluster similar phrases even if they are written in different languages. The goal is to produce topic word and phrase representatives that are easy for a domain expert to interpret. We compare the method with traditional LDA and two state-of-the-art algorithms: BTM and WNTM. The resulting keyphrases representing topics are more intuitive to domain experts than those obtained by the reference topic models in similar experimental settings.
Citations: 0
Linking Personally Identifiable Information from the Dark Web to the Surface Web: A Deep Entity Resolution Approach
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00072
Fangyu Lin, Yizhi Liu, Mohammadreza Ebrahimi, Zara Ahmad-Post, J. Hu, Jingyu Xin, S. Samtani, Weifeng Li, Hsinchun Chen
Abstract: The information privacy of Internet users has become a major societal concern. The rapid growth of online services increases the risk of unauthorized access to the Personally Identifiable Information (PII) of at-risk populations, who are unaware of their PII exposure. To proactively identify online at-risk populations and increase their privacy awareness, it is crucial to conduct a holistic privacy risk assessment across the internet. Current privacy risk assessment studies are limited to a single platform within either the surface web or the dark web. A comprehensive privacy risk assessment requires matching exposed PII on heterogeneous online platforms across the surface web and the dark web. However, because PII records on each platform are incomplete and inaccurate, linking exposed PII to users is a non-trivial task. While Entity Resolution (ER) techniques can facilitate this task, they often require ad hoc, manual rule development and feature engineering. Recently, Deep Learning (DL)-based ER has outperformed manual entity matching rules by automatically extracting prominent features from incomplete or inaccurate records. In this study, we enhance existing privacy risk assessment with a DL-based ER method, namely Multi-Context Attention (MCA), to comprehensively evaluate individuals' PII exposure across different online platforms in the dark web and surface web. Evaluation against benchmark ER models indicates the efficacy of MCA. Using MCA on a random sample of data breach victims in the dark web, we are able to identify 4.3% of the victims on surface web platforms and calculate their privacy risk scores.
Citations: 4
Partially Shared Semi-supervised Deep Matrix Factorization with Multi-view Data
2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00081
Haonan Huang, Naiyao Liang, Wei Yan, Zuyuan Yang, Weijun Sun
Abstract: Since many real-world data can be described from multiple views, multi-view learning has attracted considerable attention. Various methods have been proposed and successfully applied to multi-view learning, typically based on matrix factorization models. Recently, matrix factorization has been extended to deep structures to exploit the hierarchical information of multi-view data, but view-specific features and label information are seldom considered. To address these concerns, we present a partially shared semi-supervised deep matrix factorization model (PSDMF). By integrating a partially shared deep decomposition structure, graph regularization, and a semi-supervised regression model, PSDMF can learn a compact and discriminative representation by eliminating the effects of uncorrelated information. In addition, we develop an efficient iterative updating algorithm for PSDMF. Extensive experiments on five benchmark datasets demonstrate that PSDMF can achieve better performance than state-of-the-art multi-view learning approaches. The MATLAB source code is available at https://github.com/libertyhhn/PartiallySharedDMF.
Citations: 4