2020 International Conference on Data Mining Workshops (ICDMW)最新文献_第3页

Deal Closure Prediction based on User's Browsing Behaviour of Sales Content 基于用户销售内容浏览行为的交易成交预测

2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00021

Diana Nurbakova, Timothée Saumet

引用次数: 1

Individualized Context-Aware Tensor Factorization for Online Games Predictions 个性化上下文感知张量分解在线游戏预测

2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00048

Julie Jiang, Kristina Lerman, Emilio Ferrara

引用次数: 1

Uncertain Time Series Classification with Shapelet Transform 基于Shapelet变换的不确定时间序列分类

2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00044

Michael Franklin Mbouopda, E. Nguifo

引用次数: 2

Precipitation Nowcasting Using Grid-based Data in South Korea Region 基于网格数据的韩国地区降水临近预报

2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00099

Changhwan Kim, Seyoung Yun

{"title":"Precipitation Nowcasting Using Grid-based Data in South Korea Region","authors":"Changhwan Kim, Seyoung Yun","doi":"10.1109/ICDMW51313.2020.00099","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00099","url":null,"abstract":"Recently, precipitation nowcasting has gained significant attention. For instance, the demand for precise precipitation nowcasting is significantly increasing in South Korea since the economic damage has been severe in recent days because of frequent and unexpected heavy rainfall. In this paper, we propose a U-Net based deep learning model that predicts from a numerical model and then corrects the data using the U-Net based deep learning model so that it can improve the accuracy of the final prediction. We use two data sets: reanalysis data and LDAPS(Local Data Assimilation and Prediction System) prediction data. Both data sets are grid-based data that covers the whole South Korea region. We first experiment with reanalysis data to identify that our U-Net model can find atmospheric dynamics patterns, even if it is not image data. Next, we use LDAPS prediction data and apply it to the U-Net model. Because LDAPS prediction data is also a prediction, we essentially conduct correcting task for this data. To this aim, a learnable layer is added at the front of the U-Net model and concatenated with the input batch to learn location-specific information. The experiment shows that the U-Net based model can find patterns using reanalysis data. Further, it has the potential to improve the accuracy of LDAPS prediction data. We also find that the learnable layer enhances test accuracy.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130386452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Human-in-the-loop Language-agnostic Extraction of Medication Data from Highly Unstructured Electronic Health Records 从高度非结构化的电子健康记录中提取药物数据的人在循环中的语言不可知

2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00091

Frank Ruis, Shreyasi Pathak, Jeroen Geerdink, J. H. Hegeman, C. Seifert, M. V. Keulen

{"title":"Human-in-the-loop Language-agnostic Extraction of Medication Data from Highly Unstructured Electronic Health Records","authors":"Frank Ruis, Shreyasi Pathak, Jeroen Geerdink, J. H. Hegeman, C. Seifert, M. V. Keulen","doi":"10.1109/ICDMW51313.2020.00091","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00091","url":null,"abstract":"Electronic health records contain important information written in free-form text. They are often highly unstructured and ungrammatical and contain misspellings and abbreviations, making it difficult to apply traditional natural language processing techniques. Annotated data is hard to come by due to restricted access, and supervised models often don't generalize well to other datasets. We propose a language-agnostic human-in-the-loop approach for extracting medication names from a large set of highly unstructured electronic health records, where we reach almost 97% recall on our test set after the second iteration while maintaining 100% precision. Starting with a bootstrap lexicon we perform a context based dictionary expansion curated by a human reviewer. The method can handle ambiguous lexicon entries and efficiently find fuzzy matches without producing false positives. The human review step ensures a high precision, which is especially important in healthcare, and is not subject to disagreements with annotations from an external source. The code is available online 11https://github.com/FrankRuis/medical_concept_extraction.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130103608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Synthetic Data by Principal Component Analysis 主成分分析合成数据

2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00023

Natsuki Sano

引用次数: 1

A Recommender Algorithm: Gradient Recurrent Neural Network Applied to Yang-Baxter-Like Equation 一种推荐算法:应用于类yang - baxter方程的梯度递归神经网络

2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00031

Ying Liufu, Long Jin, Mei Liu, Shuai Li

引用次数: 0

Graph-based Topic Extraction Using Centroid Distance of Phrase Embeddings on Healthy Aging Open-ended Survey Questions 基于短语嵌入质心距离的健康老龄化开放式调查问题图主题提取

2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00088

D. Kosmajac, Kirstie Smith, Vlado Keselj, S. Kirkland

引用次数: 0

Linking Personally Identifiable Information from the Dark Web to the Surface Web: A Deep Entity Resolution Approach 链接个人身份信息从暗网到表面网:一种深度实体解析方法

2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00072

Fangyu Lin, Yizhi Liu, Mohammadreza Ebrahimi, Zara Ahmad-Post, J. Hu, Jingyu Xin, S. Samtani, Weifeng Li, Hsinchun Chen

{"title":"Linking Personally Identifiable Information from the Dark Web to the Surface Web: A Deep Entity Resolution Approach","authors":"Fangyu Lin, Yizhi Liu, Mohammadreza Ebrahimi, Zara Ahmad-Post, J. Hu, Jingyu Xin, S. Samtani, Weifeng Li, Hsinchun Chen","doi":"10.1109/ICDMW51313.2020.00072","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00072","url":null,"abstract":"The information privacy of the Internet users has become a major societal concern. The rapid growth of online services increases the risk of unauthorized access to Personally Identifiable Information (PII) of at-risk populations, who are unaware of their PII exposure. To proactively identify online at-risk populations and increase their privacy awareness, it is crucial to conduct a holistic privacy risk assessment across the internet. Current privacy risk assessment studies are limited to a single platform within either the surface web or the dark web. A comprehensive privacy risk assessment requires matching exposed PII on heterogeneous online platforms across the surface web and the dark web. However, due to the incompleteness and inaccuracy of PII records in each platform, linking the exposed PII to users is a non-trivial task. While Entity Resolution (ER) techniques can be used to facilitate this task, they often require ad-hoc, manual rule development and feature engineering. Recently, Deep Learning (DL)-based ER has outperformed manual entity matching rules by automatically extracting prominent features from incomplete or inaccurate records. In this study, we enhance the existing privacy risk assessment with a DL-based ER method, namely Multi-Context Attention (MCA), to comprehensively evaluate individuals' PII exposure across the different online platforms in the dark web and surface web. Evaluation against benchmark ER models indicates the efficacy of MCA. Using MCA on a random sample of data breach victims in the dark web, we are able to identify 4.3% of the victims on the surface web platforms and calculate their privacy risk scores.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116406384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Partially Shared Semi-supervised Deep Matrix Factorization with Multi-view Data 多视图数据的部分共享半监督深度矩阵分解

2020 International Conference on Data Mining Workshops (ICDMW) Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00081

Haonan Huang, Naiyao Liang, Wei Yan, Zuyuan Yang, Weijun Sun

{"title":"Partially Shared Semi-supervised Deep Matrix Factorization with Multi-view Data","authors":"Haonan Huang, Naiyao Liang, Wei Yan, Zuyuan Yang, Weijun Sun","doi":"10.1109/ICDMW51313.2020.00081","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00081","url":null,"abstract":"Since many real-world data can be described from multiple views, multi-view learning has attracted considerable attention. Various methods have been proposed and successfully applied to multi-view learning, typically based on matrix factorization models. Recently, it is extended to the deep structure to exploit the hierarchical information of multi-view data, but the view-specific features and the label information are seldom considered. To address these concerns, we present a partially shared semi-supervised deep matrix factorization model (PSDMF). By integrating the partially shared deep decomposition structure, graph regularization and the semi-supervised regression model, PSDMF can learn a compact and discriminative representation through eliminating the effects of uncorrelated information. In addition, we develop an efficient iterative updating algorithm for PSDMF. Extensive experiments on five benchmark datasets demonstrate that PSDMF can achieve better performance than the state-of-the-art multi-view learning approaches. The MATLAB source code is available at https://github.com/libertyhhn/PartiallySharedDMF.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115365130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4