2011 IEEE 11th International Conference on Data Mining最新文献

筛选
英文 中文
Learning with Minimum Supervision: A General Framework for Transductive Transfer Learning 最小监督下的学习:转导迁移学习的一般框架
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.92
M. T. Bahadori, Yan Liu, Dan Zhang
{"title":"Learning with Minimum Supervision: A General Framework for Transductive Transfer Learning","authors":"M. T. Bahadori, Yan Liu, Dan Zhang","doi":"10.1109/ICDM.2011.92","DOIUrl":"https://doi.org/10.1109/ICDM.2011.92","url":null,"abstract":"Transductive transfer learning is one special type of transfer learning problem, in which abundant labeled examples are available in the source domain and only textit{unlabeled} examples are available in the target domain. It easily finds applications in spam filtering, microblogging mining and so on. In this paper, we propose a general framework to solve the problem by mapping the input features in both the source domain and target domain into a shared latent space and simultaneously minimizing the feature reconstruction loss and prediction loss. We develop one specific example of the framework, namely latent large-margin transductive transfer learning (LATTL) algorithm, and analyze its theoretic bound of classification loss via Rademacher complexity. We also provide a unified view of several popular transfer learning algorithms under our framework. Experiment results on one synthetic dataset and three application datasets demonstrate the advantages of the proposed algorithm over the other state-of-the-art ones.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116357727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 30
Learning Dirichlet Processes from Partially Observed Groups 从部分观察群中学习狄利克雷过程
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.85
Kumar Avinava Dubey, Indrajit Bhattacharya, M. Das, T. Faruquie, C. Bhattacharyya
{"title":"Learning Dirichlet Processes from Partially Observed Groups","authors":"Kumar Avinava Dubey, Indrajit Bhattacharya, M. Das, T. Faruquie, C. Bhattacharyya","doi":"10.1109/ICDM.2011.85","DOIUrl":"https://doi.org/10.1109/ICDM.2011.85","url":null,"abstract":"Motivated by the task of vernacular news analysis using known news topics from national news-papers, we study the task of topic analysis, where given source datasets with observed topics, data items from a target dataset need to be assigned either to observed source topics or to new ones. Using Hierarchical Dirichlet Processes for addressing this task imposes unnecessary and often inappropriate generative assumptions on the observed source topics. In this paper, we explore Dirichlet Processes with partially observed groups (POG-DP). POG-DP avoids modeling the given source topics. Instead, it directly models the conditional distribution of the target data as a mixture of a Dirichlet Process and the posterior distribution of a Hierarchical Dirichlet Process with known groups and topics. This introduces coupling between selection probabilities of all topics within a source, leading to effective identification of source topics. We further improve on this with a Combinatorial Dirichlet Process with partially observed groups (POG-CDP) that captures finer grained coupling between related topics by choosing intersections between sources. We evaluate our models in three different real-world applications. Using extensive experimentation, we compare against several baselines to show that our model performs significantly better in all three applications.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123766506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Efficient Mining of Closed Sequential Patterns on Stream Sliding Window 流滑动窗口上封闭序列模式的高效挖掘
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.61
Chuancong Gao, Jianyong Wang, Qingyan Yang
{"title":"Efficient Mining of Closed Sequential Patterns on Stream Sliding Window","authors":"Chuancong Gao, Jianyong Wang, Qingyan Yang","doi":"10.1109/ICDM.2011.61","DOIUrl":"https://doi.org/10.1109/ICDM.2011.61","url":null,"abstract":"As a typical data mining research topic, sequential pattern mining has been studied extensively for the past decade. Recently, mining various sequential patterns incrementally over stream data has raised great interest. Due to the challenges of mining stream data, many difficulties not so obvious in static data mining have to be reconsidered carefully. In this paper, we propose a novel algorithm which stores only frequent closed prefixes in its enumeration tree structure, used for mining and maintaining patterns in the current sliding window, to solve the frequent closed sequential pattern mining problem efficiently over stream data. Some effective search space pruning and pattern closure checking strategies have been also devised to accelerate the algorithm. Experimental results show that our algorithm outperforms other state-of-the-art algorithm significantly in both running time and memory use.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125710455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Personalized Travel Package Recommendation 个性化旅游套餐推荐
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.118
Qi Liu, Yong Ge, Zhongmou Li, Enhong Chen, Hui Xiong
{"title":"Personalized Travel Package Recommendation","authors":"Qi Liu, Yong Ge, Zhongmou Li, Enhong Chen, Hui Xiong","doi":"10.1109/ICDM.2011.118","DOIUrl":"https://doi.org/10.1109/ICDM.2011.118","url":null,"abstract":"As the worlds of commerce, entertainment, travel, and Internet technology become more inextricably linked, new types of business data become available for creative use and formal analysis. Indeed, this paper provides a study of exploiting online travel information for personalized travel package recommendation. A critical challenge along this line is to address the unique characteristics of travel data, which distinguish travel packages from traditional items for recommendation. To this end, we first analyze the characteristics of the travel packages and develop a Tourist-Area-Season Topic (TAST) model, which can extract the topics conditioned on both the tourists and the intrinsic features (i.e. locations, travel seasons) of the landscapes. Based on this TAST model, we propose a cocktail approach on personalized travel package recommendation. Finally, we evaluate the TAST model and the cocktail approach on real-world travel package data. The experimental results show that the TAST model can effectively capture the unique characteristics of the travel data and the cocktail approach is thus much more effective than traditional recommendation methods for travel package recommendation.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125185742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 217
Simple Multiple Noisy Label Utilization Strategies 简单的多噪声标签利用策略
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.133
V. Sheng
{"title":"Simple Multiple Noisy Label Utilization Strategies","authors":"V. Sheng","doi":"10.1109/ICDM.2011.133","DOIUrl":"https://doi.org/10.1109/ICDM.2011.133","url":null,"abstract":"With the outsourcing of small tasks becoming easier, it is possible to obtain non-expert/imperfect labels at low cost. With low-cost imperfect labeling, it is straightforward to collect multiple labels for the same data items. This paper addresses the strategies of utilizing these multiple labels for improving the performance of supervised learning, based on two basic ideas: majority voting and pair wise solutions. We show several interesting results based on our experiments. The soft majority voting strategies can reduce the bias and roughness, and improve the performance of the directed hard majority voting strategy. Pair wise strategies can completely avoid the bias by having both sides (potential correct and incorrect/noisy information) considered (for binary classification). They have very good performance whenever there are a few or many labels available. However, it could also keep the noise. The improved variation that reduces the impact of the noisy information is recommended. All five strategies investigated are labeling quality agnostic strategies, and can be applied to real world applications directly. The experimental results show some of them perform better than or at least very close to the gnostic strategies.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126979224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 22
Detecting Community Kernels in Large Social Networks 在大型社会网络中检测社区核
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.48
Liaoruo Wang, Tiancheng Lou, Jie Tang, J. Hopcroft
{"title":"Detecting Community Kernels in Large Social Networks","authors":"Liaoruo Wang, Tiancheng Lou, Jie Tang, J. Hopcroft","doi":"10.1109/ICDM.2011.48","DOIUrl":"https://doi.org/10.1109/ICDM.2011.48","url":null,"abstract":"In many social networks, there exist two types of users that exhibit different influence and different behavior. For instance, statistics have shown that less than 1% of the Twitter users (e.g. entertainers, politicians, writers) produce 50% of its content, while the others (e.g. fans, followers, readers) have much less influence and completely different social behavior. In this paper, we define and explore a novel problem called community kernel detection in order to uncover the hidden community structure in large social networks. We discover that influential users pay closer attention to those who are more similar to them, which leads to a natural partition into different community kernels. We propose Greedy and We BA, two efficient algorithms for finding community kernels in large social networks. Greedy is based on maximum cardinality search, while We BA formalizes the problem in an optimization framework. We conduct experiments on three large social networks: Twitter, Wikipedia, and Coauthor, which show that We BA achieves an average 15%-50% performance improvement over the other state-of-the-art algorithms, and We BA is on average 6-2,000 times faster in detecting community kernels.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127039592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 75
Modeling High-Level Behavior Patterns for Precise Similarity Analysis of Software 为精确的软件相似度分析建模高级行为模式
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.104
Taeho Kwon, Z. Su
{"title":"Modeling High-Level Behavior Patterns for Precise Similarity Analysis of Software","authors":"Taeho Kwon, Z. Su","doi":"10.1109/ICDM.2011.104","DOIUrl":"https://doi.org/10.1109/ICDM.2011.104","url":null,"abstract":"The analysis of software similarity has many applications such as detecting code clones, software plagiarism, code theft, and polymorphic malware. Because often source code is unavailable and code obfuscation is used to avoid detection, there has been much research on developing effective models to capture runtime behavior to aid detection. Existing models focus on low-level information such as dependency or purely occurrence of function calls, and suffer from poor precision, poor scalability, or both. To overcome limitations of existing models, this paper introduces a precise and succinct behavior representation that characterizes high-level object-accessing patterns as regular expressions. We first distill a set of high-level patterns (the alphabet S of the regular language) based on two pieces of information: function call patterns to access objects and type state information of the objects. Then we abstract a runtime trace of a program P into a regular expression e over the pattern alphabet S to produce P's behavior signature. We show that software instances derived from the same code exhibit similar behavior signatures and develop effective algorithms to cluster and match behavior signatures. To evaluate the effectiveness of our behavior model, we have applied it to the similarity analysis of polymorphic malware. Our results on a large malware collection demonstrate that our model is both precise and succinct for effective and scalable matching and detection of polymorphic malware.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"03 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127256686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
BibClus: A Clustering Algorithm of Bibliographic Networks by Message Passing on Center Linkage Structure 基于中心链接结构信息传递的书目网络聚类算法
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.27
Xiaoran Xu, Zhihong Deng
{"title":"BibClus: A Clustering Algorithm of Bibliographic Networks by Message Passing on Center Linkage Structure","authors":"Xiaoran Xu, Zhihong Deng","doi":"10.1109/ICDM.2011.27","DOIUrl":"https://doi.org/10.1109/ICDM.2011.27","url":null,"abstract":"Multi-type objects with multi-type relations are ubiquitous in real-world networks, e.g. bibliographic networks. Such networks are also called heterogeneous information networks. However, the research on clustering for heterogeneous information networks is little. A new algorithm, called NetClus, has been proposed in recent two years. Although NetClus is applied on a heterogeneous information network with a star network schema, considering the relations between center objects and all attribute objects linking to them, it ignores the relations between center objects such as citation relations, which also contain rich information. Hence, we think the star network schema cannot be used to characterize all possible relations without integrating the linkage structure among center objects, which we call the Center Linkage Structure, and there has been no practical way good enough to solve it. In this paper, we present a novel algorithm, BibClus, for clustering heterogeneous objects with center linkage structure by taking a bibliographic information network as an example. In BibClus, we build a probabilistic model of pair wise hidden Markov random field (P-HMRF) to characterize the center linkage structure, and convert it to a factor graph. We further combine EM algorithm with factor graph theory, and design an efficient way based on message passing algorithm to inference marginal probabilities and estimate parameters at each iteration of EM. We also study how factor functions affect clustering performance with different function forms and constraints. For evaluating our proposed method, we have conducted thorough experiments on a real dataset that we had crawled from ACM Digital Library. The experimental results show that BibClus is effective and has a much higher quantity than the recently proposed algorithm, NetClus, in both recall and precision.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131821336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Context-Aware Multi-instance Learning Based on Hierarchical Sparse Representation 基于层次稀疏表示的上下文感知多实例学习
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.43
Bing Li, Weihua Xiong, Weiming Hu
{"title":"Context-Aware Multi-instance Learning Based on Hierarchical Sparse Representation","authors":"Bing Li, Weihua Xiong, Weiming Hu","doi":"10.1109/ICDM.2011.43","DOIUrl":"https://doi.org/10.1109/ICDM.2011.43","url":null,"abstract":"Multi-instance learning (MIL), a variant of supervised learning framework, has been applied in many applications. More recently, researchers focus on two important issues for MIL: Instances' contextual structures representation in the same bag and online MIL schemes. In this paper, we present an effective context-aware multi-instance learning technique using a hierarchical sparse representation (HSR-MIL) that addresses the two challenges simultaneously. We firstly construct the inner contextual structure among instances in the same bag based on a novel sparse $varepsilon$-graph. We then propose a graph kernel based sparse bag classifier through a modified kernel sparse coding in higher-dimension feature space. At last, the HSR-MIL approach is extended to achieve online learning manner with an incremental kernel matrix update scheme. The experiments on several data sets demonstrate that our method has better performances and online learning ability.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133422828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Multi-Class L2,1-Norm Support Vector Machine 多类L2,1范数支持向量机
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.105
Xiao Cai, F. Nie, Heng Huang, C. Ding
{"title":"Multi-Class L2,1-Norm Support Vector Machine","authors":"Xiao Cai, F. Nie, Heng Huang, C. Ding","doi":"10.1109/ICDM.2011.105","DOIUrl":"https://doi.org/10.1109/ICDM.2011.105","url":null,"abstract":"Feature selection is an essential component of data mining. In many data analysis tasks where the number of data point is much less than the number of features, efficient feature selection approaches are desired to extract meaningful features and to eliminate redundant ones. In the previous study, many data mining techniques have been applied to tackle the above challenging problem. In this paper, we propose a new $ell_{2,1}$-norm SVM, that is, multi-class hinge loss with a structured regularization term for all the classes to naturally select features for multi-class without bothering further heuristic strategy. Rather than directly solving the multi-class hinge loss with $ell_{2,1}$-norm regularization minimization, which has not been solved before due to its optimization difficulty, we are the first to give an efficient algorithm bridging the new problem with a previous solvable optimization problem to do multi-class feature selection. A global convergence proof for our method is also presented. Via the proposed efficient algorithm, we select features across multiple classes with jointly sparsity, emph{i.e.}, each feature has either small or large score over all classes. Comprehensive experiments have been performed on six bioinformatics data sets to show that our method can obtain better or competitive performance compared with exiting state-of-art multi-class feature selection approaches.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129384803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 57
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信