2018 IEEE International Conference on Data Mining (ICDM)最新文献_第5页

Clustering on Sparse Data in Non-overlapping Feature Space with Applications to Cancer Subtyping 非重叠特征空间稀疏数据聚类及其在癌症亚型分型中的应用

2018 IEEE International Conference on Data Mining (ICDM) Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00138

Tianyu Kang, Kourosh Zarringhalam, M. Kuijjer, Ping Chen, John Quackenbush, W. Ding

{"title":"Clustering on Sparse Data in Non-overlapping Feature Space with Applications to Cancer Subtyping","authors":"Tianyu Kang, Kourosh Zarringhalam, M. Kuijjer, Ping Chen, John Quackenbush, W. Ding","doi":"10.1109/ICDM.2018.00138","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00138","url":null,"abstract":"This paper presents a new algorithm, Reinforced and Informed Network-based Clustering(RINC), for finding unknown groups of similar data objects in sparse and largely non-overlapping feature space where a network structure among features can be observed. Sparse and non-overlapping unlabeled data become increasingly common and available especially in text mining and biomedical data mining. RINC inserts a domain informed model into a modelless neural network. In particular, our approach integrates physically meaningful feature dependencies into the neural network architecture and soft computational constraint. Our learning algorithm efficiently clusters sparse data through integrated smoothing and sparse auto-encoder learning. The informed design requires fewer samples for training and at least part of the model becomes explainable. The architecture of the reinforced network layers smooths sparse data over the network dependency in the feature space. Most importantly, through back-propagation, the weights of the reinforced smoothing layers are simultaneously constrained by the remaining sparse auto-encoder layers that set the target values to be equal to the raw inputs. Empirical results demonstrate that RINC achieves improved accuracy and renders physically meaningful clustering results.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132650420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Interpretable Word Embeddings for Medical Domain 医学领域的可解释词嵌入

2018 IEEE International Conference on Data Mining (ICDM) Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00135

Kishlay Jha, Yaqing Wang, Guangxu Xun, Aidong Zhang

{"title":"Interpretable Word Embeddings for Medical Domain","authors":"Kishlay Jha, Yaqing Wang, Guangxu Xun, Aidong Zhang","doi":"10.1109/ICDM.2018.00135","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00135","url":null,"abstract":"Word embeddings are finding their increasing application in a variety of biomedical Natural Language Processing (bioNLP) tasks, ranging from drug discovery to automated disease diagnosis. While these word embeddings in their entirety have shown meaningful syntactic and semantic regularities, however, the meaning of individual dimensions remains elusive. This becomes problematic both in general and particularly in sensitive domains such as bio-medicine, wherein, the interpretability of results is crucial to its widespread adoption. To address this issue, in this study, we aim to improve the interpretability of pre-trained word embeddings generated from a text corpora, and in doing so provide a systematic approach to formalize the problem. More specifically, we exploit the rich categorical knowledge present in the biomedical domain, and propose to learn a transformation matrix that transforms the input embeddings to a new space where they are both interpretable and retain their original expressive features. Experiments conducted on the largest available biomedical corpus suggests that the model is capable of performing interpretability that resembles closely to the human-level intuition.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122244329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 22

Robust Distributed Anomaly Detection Using Optimal Weighted One-Class Random Forests 基于最优加权一类随机森林的鲁棒分布式异常检测

2018 IEEE International Conference on Data Mining (ICDM) Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00171

Yu-Lin Tsou, Hong-Min Chu, Cong Li, Shao-Wen Yang

引用次数: 10

Deep Headline Generation for Clickbait Detection 深度标题生成点击党检测

2018 IEEE International Conference on Data Mining (ICDM) Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00062

Kai Shu, Suhang Wang, Thai Le, Dongwon Lee, Huan Liu

{"title":"Deep Headline Generation for Clickbait Detection","authors":"Kai Shu, Suhang Wang, Thai Le, Dongwon Lee, Huan Liu","doi":"10.1109/ICDM.2018.00062","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00062","url":null,"abstract":"Clickbaits are catchy social posts or sensational headlines that attempt to lure readers to click. Clickbaits are pervasive on social media and can have significant negative impacts on both users and media ecosystems. For example, users may be misled to receive inaccurate information or fall into click-jacking attacks. Similarly, media platforms could lose readers' trust and revenues due to the prevalence of clickbaits. To computationally detect such clickbaits on social media using a supervised learning framework, one of the major obstacles is the lack of large-scale labeled training data, due to the high cost of labeling. With the recent advancements of deep generative models, to address this challenge, we propose to generate synthetic headlines with specific styles and explore their utilities to help improve clickbait detection. In particular, we propose to generate stylized headlines from original documents with style transfer. Furthermore, as it is non-trivial to generate stylized headlines due to several challenges such as the discrete nature of texts and the requirements of preserving semantic meaning of document while achieving style transfer, we propose a novel solution, named as Stylized Headline Generation (SHG), that can not only generate readable and realistic headlines to enlarge original training data, but also help improve the classification capacity of supervised learning. The experimental results on real-world datasets demonstrate the effectiveness of SHG in generating high-quality and high-utility headlines for clickbait detection.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116533765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 49

Spatial Contextualization for Closed Itemset Mining 封闭项集挖掘的空间语境化

2018 IEEE International Conference on Data Mining (ICDM) Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00155

Altobelli B. Mantuan, L. Fernandes

引用次数: 1

Distribution Preserving Multi-task Regression for Spatio-Temporal Data 时空数据的保分布多任务回归

2018 IEEE International Conference on Data Mining (ICDM) Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00148

Xi Liu, P. Tan, Zubin Abraham, L. Luo, P. Hatami

引用次数: 4

Volatility Drift Prediction for Transactional Data Streams 交易数据流的波动漂移预测

2018 IEEE International Conference on Data Mining (ICDM) Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00140

Yun Sing Koh, David Tse Jung Huang, C. Pearce, G. Dobbie

{"title":"Volatility Drift Prediction for Transactional Data Streams","authors":"Yun Sing Koh, David Tse Jung Huang, C. Pearce, G. Dobbie","doi":"10.1109/ICDM.2018.00140","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00140","url":null,"abstract":"The reasons for concept drift in a data stream can vary widely, from deterioration of a machine to a change in peoples' buying patterns. In order to effectively detect concept drifts, most predictive stream mining systems contain a drift detector that monitors and signals concept drifts. However, few of these systems are designed to find drifts in transactional datasets, which have unlabelled data. Transactional datasets describe events, such as orders or payments, which are traditionally analysed using association rules. In this paper, we propose a novel drift detection technique, ProChange, that has two parts. The first part is a drift detector, VR-Change, that finds both real and virtual drifts in unlabelled transactional data streams using the Hellinger distance. The second part is a drift predictor, which models the volatility of drifts using a probabilistic network to predict the location of future drifts. Using the predictor, we can dynamically adapt the confidence threshold, enabling VR-Change to be more sensitive around potential future drift points. We evaluated the performance of ProChange by comparing it against traditional detectors showing that it detects both real and virtual drifts effectively and efficiently in terms of accuracy.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129049232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Outlier Detection in Urban Traffic Flow Distributions 城市交通流分布中的离群值检测

2018 IEEE International Conference on Data Mining (ICDM) Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00114

Y. Djenouri, A. Zimek, Marco Chiarandini

{"title":"Outlier Detection in Urban Traffic Flow Distributions","authors":"Y. Djenouri, A. Zimek, Marco Chiarandini","doi":"10.1109/ICDM.2018.00114","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00114","url":null,"abstract":"Urban traffic data consists of observations like number and speed of cars or other vehicles at certain locations as measured by deployed sensors. These numbers can be interpreted as traffic flow which in turn relates to the capacity of streets and the demand of the traffic system. City planners are interested in studying the impact of various conditions on the traffic flow, leading to unusual patterns, i.e., outliers. Existing approaches to outlier detection in urban traffic data take into account only individual flow values (i.e., an individual observation). This can be interesting for real time detection of sudden changes. Here, we face a different scenario: The city planners want to learn from historical data, how special circumstances (e.g., events or festivals) relate to unusual patterns in the traffic flow, in order to support improved planing of both, events and the layout of the traffic system. Therefore, we propose to consider the sequence of traffic flow values observed within some time interval. Such flow sequences can be modeled as probability distributions of flows. We adapt an established outlier detection method, the local outlier factor (LOF), to handling flow distributions rather than individual observations. We apply the outlier detection online to extend the database with new flow distributions that are considered inliers. For the validation we consider a special case of our framework for comparison with state-of-the-art outlier detection on flows. In addition, a real case study on urban traffic flow data showcases that our method finds meaningful outliers in the traffic flow data.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129531981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 26

GINA: Group Gender Identification Using Privacy-Sensitive Audio Data 使用隐私敏感音频数据进行群体性别识别

2018 IEEE International Conference on Data Mining (ICDM) Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00061

Jiaxing Shen, Oren Lederman, Jiannong Cao, Florian Berg, Shaojie Tang, A. Pentland

{"title":"GINA: Group Gender Identification Using Privacy-Sensitive Audio Data","authors":"Jiaxing Shen, Oren Lederman, Jiannong Cao, Florian Berg, Shaojie Tang, A. Pentland","doi":"10.1109/ICDM.2018.00061","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00061","url":null,"abstract":"Group gender is essential in understanding social interaction and group dynamics. With the increasing privacy concerns of studying face-to-face communication in natural settings, many participants are not open to raw audio recording. Existing voice-based gender identification methods rely on acoustic characteristics caused by physiological differences and phonetic differences. However, these methods might become ineffective with privacy-sensitive audio for two main reasons. First, compared to raw audio, privacy-sensitive audio contains significantly fewer acoustic features. Moreover, natural settings generate various uncertainties in the audio data. In this paper, we make the first attempt to identify group gender using privacy-sensitive audio. Instead of extracting acoustic features from privacy-sensitive audio, we focus on conversational features including turn-taking behaviors and interruption patterns. However, conversational behaviors are unstable in gender identification as human behaviors are affected by many factors like emotion and environment. We utilize ensemble feature selection and a two-stage classification to improve the effectiveness and robustness of our approach. Ensemble feature selection could reduce the risk of choosing an unstable subset of features by aggregating the outputs of multiple feature selectors. In the first stage, we infer the gender composition (mixed-gender or same-gender) of a group which is used as an additional input feature for identifying group gender in the second stage. The estimated gender composition significantly improves the performance as it could partially account for the dynamics in conversational behaviors. According to the experimental evaluation of 100 people in 273 meetings, the proposed method outperforms baseline approaches and achieves an F1-score of 0.77 using linear SVM.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130562966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 10

A Machine Reading Comprehension-Based Approach for Featured Snippet Extraction 基于机器阅读理解的特征片段提取方法

2018 IEEE International Conference on Data Mining (ICDM) Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00195

Chen Zhang, Xuanyu Zhang, Hao Wang

引用次数: 6