2013 IEEE 13th International Conference on Data Mining: Latest Papers

An Unsupervised Algorithm for Learning Blocking Schemes
2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.60
M. Kejriwal, Daniel P. Miranker
Abstract: A pairwise comparison of data objects is a requisite step in many data mining applications, but has quadratic complexity. In applications such as record linkage, blocking methods may be applied to reduce the cost. That is, the data is first partitioned into a set of blocks, and pairwise comparisons are computed only for pairs within each block. To date, blocking methods have required either that the blocking scheme be given or that training data be provided so that supervised learning algorithms can determine a blocking scheme. In either case, a domain expert is required. This paper develops an unsupervised method for learning a blocking scheme for tabular data sets. The method is divided into two phases. First, a weakly labeled training set is generated automatically in time linear in the number of records of the entire dataset. The second phase casts blocking key discovery as a Fisher feature selection problem. The approach is compared to a state-of-the-art supervised blocking key discovery algorithm on three real-world databases and achieves favorable results.
Citations: 65
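The blocking step described in this abstract (partition first, then compare only within blocks) can be illustrated with a minimal sketch. The toy records, the `blocking_key` function (first letter of the surname), and the helper names are hypothetical choices made for illustration; this does not reproduce the blocking schemes learned in the paper.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical blocking key: first letter of the surname, lowercased.
    return record["surname"][0].lower()


def candidate_pairs(records, key_fn):
    """Return index pairs of records that share a block under key_fn."""
    blocks = defaultdict(list)
    for idx, record in enumerate(records):
        blocks[key_fn(record)].append(idx)
    pairs = []
    for members in blocks.values():
        # Pairwise comparisons happen only inside a block.
        pairs.extend(combinations(members, 2))
    return pairs


# Made-up records, for illustration only.
records = [
    {"surname": "Smith"},
    {"surname": "Smyth"},
    {"surname": "Jones"},
    {"surname": "Johnson"},
    {"surname": "Miller"},
]

all_pairs = list(combinations(range(len(records)), 2))  # quadratic baseline
blocked_pairs = candidate_pairs(records, blocking_key)
print(len(all_pairs), len(blocked_pairs))
```

With this toy key, far fewer pairs need to be compared than in the exhaustive baseline, at the risk of missing true matches whose keys differ. That trade-off is why the choice of blocking scheme matters.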
Binary Time-Series Query Framework for Efficient Quantitative Trait Association Study
2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.42
Hongfei Wang, Xiang Zhang
Abstract: Quantitative trait association study examines the association between quantitative traits and genetic variants. As a promising tool, it has been widely applied to dissect the genetic basis of complex diseases. However, such a study usually involves testing trillions of variant-trait pairs and demands intensive computational resources. Recently, several algorithms have been developed to improve its efficiency. In this paper, we propose a framework, Fabrique, which models quantitative trait association study as querying binary time-series and bridges the two seemingly different problems. Specifically, in the proposed framework, genetic variants are treated as a database consisting of binary time-series. Finding trait-associated variants is equivalent to finding the nearest neighbors of the trait. For efficient query processing, Fabrique partitions and normalizes the binary time-series, and estimates a tight upper bound for each group of time-series to prune the search space. Extensive experimental results demonstrate that Fabrique only needs to search a very small portion of the database to locate the target variants and significantly outperforms the state-of-the-art method. We also show that Fabrique can be applied to other binary time-series query problems in addition to the genetic association study.
Citations: 0
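One standard way to see why an association search can be phrased as a nearest-neighbor query is the identity d^2 = 2n(1 - r), which links the squared Euclidean distance d^2 between two z-normalized series of length n to their Pearson correlation r. The sketch below checks this identity on made-up data. The abstract does not state which distance or normalization Fabrique uses, so treating it as z-normalized Euclidean distance is an assumption, and all names and values here are hypothetical.

```python
import math


def z_normalize(series):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return [(x - mean) / std for x in series]


def pearson(a, b):
    """Pearson correlation, computed from the z-normalized series."""
    za, zb = z_normalize(a), z_normalize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)


def squared_distance(a, b):
    """Squared Euclidean distance between the z-normalized series."""
    za, zb = z_normalize(a), z_normalize(b)
    return sum((x - y) ** 2 for x, y in zip(za, zb))


# Made-up quantitative trait and binary variants, for illustration only.
trait = [1.2, 0.4, 2.5, 0.1, 1.9, 0.3]
variants = {
    "v1": [1, 0, 1, 0, 1, 0],
    "v2": [0, 1, 0, 1, 0, 1],
    "v3": [1, 1, 0, 0, 1, 0],
}

n = len(trait)
for name, v in variants.items():
    r = pearson(trait, v)
    d2 = squared_distance(trait, v)
    # For z-normalized series of length n: d^2 = 2n(1 - r).
    assert math.isclose(d2, 2 * n * (1 - r))

nearest = min(variants, key=lambda name: squared_distance(trait, variants[name]))
most_correlated = max(variants, key=lambda name: pearson(trait, variants[name]))
print(nearest, most_correlated)
```

Under this identity, the nearest neighbor is the most positively correlated series. Catching strong negative correlations as well would require also querying with the negated trait.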
Discriminatively Enhanced Topic Models
2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.107
Snigdha Chaturvedi, Hal Daumé, Taesun Moon
Abstract: This paper proposes a space-efficient, discriminatively enhanced topic model: a V-structured topic model with an embedded log-linear component. The discriminative log-linear component reduces the number of parameters to be learnt while outperforming baseline generative models. At the same time, the explanatory power of the generative component is not compromised. We establish its superiority over a purely generative model by applying it to two different ranking tasks: (a) In the first task, we look at the problem of proposing alternative citations given textual and bibliographic evidence. We solve it as a ranking problem in itself and as a platform for further qualitative analysis of convergence of scientific phenomena. (b) In the second task, we address the problem of ranking potential email recipients based on email content and sender information.
Citations: 0
Context-Aware MIML Instance Annotation
2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.115
Forrest Briggs, Xiaoli Z. Fern, R. Raich
Abstract: In multi-instance multi-label (MIML) instance annotation, the goal is to learn an instance classifier while training on a MIML dataset, which consists of bags of instances paired with label sets; instance labels are not provided in the training data. The MIML formulation can be applied in many domains. For example, in an image domain, bags are images, instances are feature vectors representing segments in the images, and the label sets are lists of objects or categories present in each image. Although many MIML algorithms have been developed for predicting the label set of a new bag, only a few have been specifically designed to predict instance labels. We propose MIML-ECC (ensemble of classifier chains), which exploits bag-level context through label correlations to improve instance-level prediction accuracy. The proposed method is scalable in all dimensions of a problem (bags, instances, classes, and feature dimension), and has no parameters that require tuning (which is a problem for prior methods). In experiments on two image datasets, a bioacoustics dataset, and two artificial datasets, MIML-ECC achieves higher or comparable accuracy in comparison to several recent methods and baselines.
Citations: 5
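The MIML data layout described in this abstract (bags of instances paired with bag-level label sets, with no per-instance labels) might be represented as in the minimal sketch below. The `Bag` class, the feature values, and the label names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Bag:
    """One MIML training example: instances plus a bag-level label set."""
    instances: List[List[float]]  # e.g. one feature vector per image segment
    labels: Set[str]              # e.g. objects present somewhere in the image


# Hypothetical toy dataset; feature values and label names are made up.
dataset = [
    Bag(instances=[[0.9, 0.1], [0.2, 0.8]], labels={"sky", "tree"}),
    Bag(instances=[[0.85, 0.2]], labels={"sky"}),
]

# Only bag-level supervision is available: no instance carries its own label.
num_instances = sum(len(bag.instances) for bag in dataset)
label_vocabulary = set().union(*(bag.labels for bag in dataset))
print(num_instances, sorted(label_vocabulary))
```

An instance annotation method has to assign a label to each of the three instances here while seeing only the two bag-level label sets during training.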
Markov Blanket Feature Selection with Non-faithful Data Distributions
2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.154
Kui Yu, Xindong Wu, Zan Zhang, Yang Mu, Hao Wang, W. Ding
Abstract: In faithful Bayesian networks, the Markov blanket of the class attribute is a unique and minimal feature subset for optimal feature selection. However, little attention has been paid to Markov blanket feature selection in non-faithful environments, which are common in the real world. To tackle this issue, we deal with non-faithful data distributions and propose the concept of representative sets instead of Markov blankets. With a standard sparse group lasso for selection of features from the representative sets, we design an effective algorithm, SRS, for Markov blanket feature Selection via Representative Sets with non-faithful data distributions. Empirical studies demonstrate that SRS outperforms the state-of-the-art Markov blanket feature selectors and other well-established feature selection methods.
Citations: 11
Spatio-Temporal Topic Modeling in Mobile Social Media for Location Recommendation
2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.139
Bo Hu, Mohsen Jamali, M. Ester
Abstract: Mobile networks enable users to post on social media services (e.g., Twitter) from anywhere at any time. This new phenomenon led to the emergence of a new line of work on mining the behavior of mobile users, taking into account the spatio-temporal aspects of their engagement with online social media. In this paper, we address the problem of recommending the right locations to users at the right time. We claim to propose the first comprehensive model, called STT (Spatio-Temporal Topic), to capture the spatio-temporal aspects of user check-ins in a single probabilistic model for location recommendation. Our proposed generative model not only captures spatio-temporal aspects of check-ins, but also profiles users. We conduct experiments on real-life data sets from Twitter, Gowalla, and Brightkite. We evaluate the effectiveness of STT by evaluating the accuracy of location recommendation. The experimental results show that STT achieves better performance than the state-of-the-art models in the areas of recommender systems as well as topic modeling.
Citations: 68
Co-ClusterD: A Distributed Framework for Data Co-Clustering with Sequential Updates
2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.76
Sen Su, Xiang Cheng, Lixin Gao, Jiangtao Yin
Abstract: Co-clustering is a powerful data mining tool for co-occurrence and dyadic data. As data sets become increasingly large, the scalability of co-clustering becomes more and more important. In this paper, we propose two approaches to parallelize co-clustering with sequential updates in a distributed environment. Based on these two approaches, we present a new distributed framework, Co-ClusterD, that supports efficient implementations of co-clustering algorithms with sequential updates. We design and implement Co-ClusterD, and show its efficiency through two co-clustering algorithms: fast nonnegative matrix tri-factorization (FNMTF) and information theoretic co-clustering (ITCC). We evaluate our framework on both a local cluster of machines and the Amazon EC2 cloud. Our evaluation shows that co-clustering algorithms implemented in Co-ClusterD can achieve better results and run faster than their traditional concurrent counterparts.
Citations: 4
Reconstructing Individual Mobility from Smart Card Transactions: A Space Alignment Approach
2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.37
Nicholas Jing Yuan, Yingzi Wang, Fuzheng Zhang, Xing Xie, Guangzhong Sun
Abstract: Smart card transactions capture rich information about human mobility and urban dynamics, and are therefore of particular interest to urban planners and location-based service providers. However, since most transaction systems are designed only for billing purposes, fine-grained location information, such as the exact boarding and alighting stops of a bus trip, is typically only partially available or not available at all, which blocks deep exploitation of this rich and valuable data at the individual level. This paper presents a "space alignment" framework to reconstruct individual mobility history from a large-scale smart card transaction dataset pertaining to a metropolitan city. Specifically, we show that by delicately aligning the monetary space and geospatial space with the temporal space, we are able to extrapolate a series of critical domain-specific constraints. Later, these constraints are naturally incorporated into a semi-supervised conditional random field to infer the exact boarding and alighting stops of all transit routes with a surprisingly high accuracy, e.g., given only 10% of trips with known alighting/boarding stops, we successfully inferred more than 78% of alighting and boarding stops from all unlabeled trips. In addition, we demonstrated that the smart card data enriched by the proposed approach dramatically improved the performance of a conventional method for identifying users' home and work places (with 88% improvement on home detection and 35% improvement on work place detection). The proposed method offers the possibility to mine individual mobility from common public transit transactions, and showcases how uncertain data can be leveraged with domain knowledge and constraints to support cross-application data mining tasks.
Citations: 56
Hibernating Process: Modelling Mobile Calls at Multiple Scales
2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.82
Siyuan Liu, Lei Li, Rammaya Krishnan
Abstract: Do mobile phone calls at larger granularities behave in the same pattern as at smaller ones? How can we forecast the distribution of a whole month's phone calls with only one day's observation? Many models have been developed to interpret large-scale social graphs. However, all of the existing models focus on graphs at one time scale. Many dynamical behaviors were either ignored or handled at one scale. In particular, new users might join or current users quit social networks at any time. In this paper, we propose HiP, a novel model to capture longitudinal behaviors in modeling the degree distribution of evolving social graphs. We analyze a large-scale phone call dataset using HiP, and compare it with several previous models in the literature. Our model is able to fit the phone call distribution at multiple scales with 30% to 75% improvement over the best existing method on each scale.
Citations: 0
Efficient Proper Length Time Series Motif Discovery
2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.111
Sorrachai Yingchareonthawornchai, Haemwaan Sivaraks, T. Rakthanmanon, C. Ratanamahatana
Abstract: As one of the most essential data mining tasks, finding frequently occurring patterns, i.e., motif discovery, has drawn a lot of attention in the past decade. Despite successes in speeding up motif discovery algorithms, most of the existing algorithms still require predefined parameters. The most critical and cumbersome one is the time series motif length, since it is difficult to manually determine the proper length of the motifs, even for domain experts. In addition, with variability in the motif lengths, ranking among these motifs becomes another major problem. In this work, we propose a novel algorithm using compression ratio as a heuristic to discover meaningful motifs of proper lengths. The ranking of these various-length motifs relies on the ability to compress a time series using its own motif as a hypothesis. Furthermore, other than being an anytime algorithm, our experimental evaluation also demonstrates that our proposed method outperforms existing works in various domains both in terms of speed and accuracy.
Citations: 28