2017 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Papers

IP2Vec: Learning Similarities Between IP Addresses
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.93
Markus Ring, Alexander Dallmann, D. Landes, A. Hotho
Abstract: IP Addresses are a central part of packet- and flow-based network data. However, visualization and similarity computation of IP Addresses are challenging due to the missing natural order. This paper presents a novel similarity measure, IP2Vec, for IP Addresses that builds on ideas from Word2Vec, a popular approach in text mining. The key idea is to learn similarities by extracting available context information from network data. IP Addresses are similar if they appear in similar contexts. Thus, IP2Vec is automatically derived from the given network data set. The proposed approach is evaluated experimentally on two public flow-based data sets. In particular, we demonstrate the effectiveness of clustering IP Addresses within a botnet data set. In addition, we use visualization methods to analyse the learned similarities in more detail. These experiments indicate that IP2Vec is well suited to capture the similarity of IP Addresses based on their network communications.
Citations: 50
Exploring Uncertainty Methods for Centrality Analysis in Social Networks
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.27
Xianglin Zuo, Bo Yang, Wanli Zuo
Abstract: Network centrality reflects node importance in networks, and measuring it is a challenging problem in social network analysis. Based on fuzzy set and MYCIN theory, this paper proposes a novel node centrality measuring method and models an n-monkeys dataset, where n is 20. Initially, we created a monkey relationship graph and generated a relationship matrix based on the number of the monkeys' encounters in a specific time period and location, and calculated degree and average distance for each individual. Then, we performed fuzzy processing on degree, average distance, age and sex, and defined authority in the different domains. Finally, we modeled centrality for the social network nodes using MYCIN combination. On the standard dataset WOLFE PRIMATES with 20 monkeys, we evaluated our algorithm and compared it with the original rankings in terms of precision, which reached 82.5% for the fuzzy set based approach and 76.6% for the MYCIN based approach, with 10.9% and 5% improvements over current best practices respectively, indicating that the fuzzy set and MYCIN models are reasonable and effective in social network analysis.
Citations: 2
Semi-Supervised Prediction of Comorbid Rare Conditions Using Medical Claims Data
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.68
Chirag Nagpal, K. Miller, Tiffany Pellathy, M. Hravnak, G. Clermont, M. Pinsky, A. Dubrawski
Abstract: Medical insurance claims data offer a coarse view of a patient's medical profile, including information about previous diagnoses and procedures performed. These data have been exploited in the past to predict the presence of unmanifested conditions. Rarer conditions, however, provide an extremely limited amount of ground truth to train supervised models, but predicting relevant comorbidities can help reduce failure to rescue from a treatable, yet potentially life-threatening condition. In this paper, we take on the formidable task of improving models built to predict comorbidity of rare conditions that emerge during hospitalization, and present PreCoRC, a novel approach that leverages hierarchical structures of diagnosis and procedure codes to alleviate the relatively low prevalence of specific types of Failure to Rescue (FTR) incidents. It can be applied post hoc over previously learnt predictive models, and used to discover parts of the underlying hierarchies that contribute to the task. Our experimental results demonstrate that PreCoRC carries promise for operational utility in clinical settings, and offer insights into potential leading indicators of life-threatening complications.
Citations: 0
Failure Prediction with Adaptive Multi-scale Sampling and Activation Pattern Regularization
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.17
Yujin Tang, Shinya Wada, K. Yoshihara
Abstract: We treat failure prediction in a supervised learning framework using a convolutional neural network (CNN). Due to the nature of the problem, learning a CNN model on this kind of dataset is generally associated with three primary problems: 1) negative samples (indicating a healthy system) outnumber positives (indicating system failures) by a great margin; 2) implementation design often requires chopping an original time series into sub-sequences, and defining a segmentation window size that provides sufficient data augmentation while avoiding serious multiple-instance learning issues is non-trivial; 3) positive samples may have a common underlying cause and thus present similar features, whereas negative samples can have various latent characteristics which can "distract" the CNN in the learning process. While the first problem has been extensively discussed in the literature, the last two issues are less explored in the context of deep learning using CNNs. We mitigate the second problem by introducing a random variable on sample scaling parameters, whose distribution's parameters are jointly learnt with the CNN, which leads to what we call adaptive multi-scale sampling (AMS). To address the third problem, we propose activation pattern regularization (APR) on only positive samples such that the CNN focuses on learning representations pertaining to the underlying common cause. We demonstrate the effectiveness of our proposals on a past Kaggle contest dataset for predicting seizures from EEG data. Compared to the baseline method with a CNN trained in the traditional scheme, we observe significant performance improvement for both proposed methods. When combined, our model, without any sophisticated hyper-parameter tuning or ensemble methods, shows a nearly 10% relative improvement in AUROC and reaches 14th place on the contest's leaderboard, while the highest rank the baseline can reach is 77th.
Citations: 2
Semantic Visualization Support for Innovators Marketplace on Data Jackets
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.85
Qi Wang
Abstract: Following the trend of big data, the business value of data has become a hot research field in recent years. The novel concept of the Data Jacket, introduced by Ohsawa et al., addressed the difficult problem of data transactions that arises from a particular characteristic of data, i.e. the need to safeguard privacy. To clarify the mechanism of the data market, some researchers proposed a gamified workshop that simulates real data transactions, termed Innovators Marketplace on Data Jackets. The problem is that, in the workshop, participants can hardly combine useful data jackets to devise valuable solutions. In order to motivate participants to propose reasonable solutions that help data transactions, this paper proposes a new visualization method that clusters data jackets by semantic similarity, applying word mover's distance (WMD) and multidimensional scaling (MDS), and tests the hypothesis that solutions combining data jackets from different domains are more valuable. The results of a questionnaire show the feasibility of this visualization method, which can help participants provide valuable solutions.
Citations: 0
Dataset Selection for Controlling Swarms by Visual Demonstration
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.128
K. K. Budhraja, T. Oates
Abstract: Agent-based modeling is a paradigm for modeling dynamic systems of interacting agents that are individually governed by specified behavioral rules. Training a model of such agents to produce an emergent behavior by specification of the emergent (as opposed to agent) behavior is easier from a demonstration perspective. Without the involvement of manual behavior specification via code or reliance on a defined taxonomy of possible behaviors, the demonstrator specifies the spatial motion of the agents over time, and retrieves the agent-level parameters required to execute that motion. A framework for reproducing emergent behavior, given an abstract demonstration, is discussed in existing work. Our work extends that framework by addressing the variation in reproduced behavior over several executions of the framework. The cause of such variation is identified as the capacity of the training data to represent the demonstration. Addressing this problem produces more favorable (more similar to the demonstration) replicated emergent behaviors. Our work is evaluated using demonstrations and visual features as in the aforementioned work. Experimental results show an improvement in the coherence between the demonstrated behavior and the corresponding replicated behavior produced by the framework.
Citations: 3
Distributed Representations of Subgraphs
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.20
B. Adhikari, Yao Zhang, Naren Ramakrishnan, B. Prakash
Abstract: There has been a surge in research interest in learning feature representations of networks in recent times. Researchers, motivated by the recent successes of embeddings in natural language processing and advances in deep learning, have explored various means for network embedding. Network embedding is useful as it can exploit off-the-shelf machine learning algorithms for network mining tasks like node classification and link prediction. However, most recent works focus on learning feature representations of nodes, which are ill-suited to tasks such as community detection that are intuitively dependent on subgraphs. In this work, we formulate a novel subgraph embedding problem based on an intuitive property of subgraphs and propose SubVec, an unsupervised scalable algorithm to learn feature representations of arbitrary subgraphs. We demonstrate the usability of features learned by SubVec by leveraging them for the community detection problem, where it significantly outperforms non-trivial baselines. We also conduct case studies in two distinct domains to demonstrate the wide applicability of SubVec.
Citations: 22
Apollo: Near-Duplicate Detection for Job Ads in the Online Recruitment Domain
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.29
Hunter Burk, F. Javed, Janani Balaji
Abstract: Job ad data has become an essential part of the recruiting world, helping recruiters to construct views of the labor market to determine emerging skills, closest competitors, and where to get the most value for each recruiting dollar spent. Collecting this data, however, can be problematic, as job ads are posted redundantly at numerous online locations. In this paper, we detail a domain-specific near-duplicate detection methodology aimed at tackling this problem. More specifically, we discuss Apollo, a near-duplicate detection system for job ads. Apollo is in production at CareerBuilder, a large online recruitment company, and powers many downstream analytics applications. Its effectiveness, measured by precision, recall, F-score, and run time, is then compared against other industry-standard deduplication methods to demonstrate its viability over existing paradigms.
Citations: 4
Extracting Field Oversees’ Features in Risk Recognition from Data of Eyes and Utterances
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.83
N. Kushiro, Yuji Fujita, Yusuke Aoyama
Abstract: In this study, we developed a video-based risk recognition training tool with an eye tracking device and a motion sensor. We applied the tool to risk recognition training in a construction company and extracted the features of expert field overseers' risk recognition from their eye movements and utterances during the training. As a result of the examinations, typical risk recognition processes for the experts (meta-knowledge) and the experts' knowledge used for individual risk recognitions (domain knowledge) were elicited.
Citations: 5
Discovery of Action Rules at Lowest Cost in Spark
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.173
A. Tzacheva, A. Bagavathi, Lavanya Ayila
Abstract: Action rules, or actionable patterns, are a type of rule-based approach in data mining that recommends specific actions to a user in order to achieve a desired result or goal. The amount of data in the world is growing at an exponential rate, doubling almost every two years. Distributed computing platforms like Hadoop and Spark have eased the computation of this high-velocity data. Leveraging these cutting-edge technologies in the field of data mining to process huge volumes of data can improve performance and allow users to gain insights from large datasets with quick turnaround time. In this paper, we present an approach for discovering low-cost actionable patterns and providing actionable recommendations. We adapt this algorithm to a distributed environment using the Apache Spark framework. We evaluate the performance of the algorithm with two datasets in the transportation and medical domains.
Citations: 9