Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining最新文献_第7页

Session details: Research track 12: algorithms for recommendations 研究专题12:推荐算法

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/3248792

Y. Koren

引用次数: 0

Session details: Research track 3: feature selection 研究专题3:特征选择

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/3248783

Alexandru Niculescu-Mizil

引用次数: 0

New perspectives and methods in link prediction 链接预测的新视角和新方法

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835837

Ryan Lichtenwalter, Jake T. Lussier, N. Chawla

{"title":"New perspectives and methods in link prediction","authors":"Ryan Lichtenwalter, Jake T. Lussier, N. Chawla","doi":"10.1145/1835804.1835837","DOIUrl":"https://doi.org/10.1145/1835804.1835837","url":null,"abstract":"This paper examines important factors for link prediction in networks and provides a general, high-performance framework for the prediction task. Link prediction in sparse networks presents a significant challenge due to the inherent disproportion of links that can form to links that do form. Previous research has typically approached this as an unsupervised problem. While this is not the first work to explore supervised learning, many factors significant in influencing and guiding classification remain unexplored. In this paper, we consider these factors by first motivating the use of a supervised framework through a careful investigation of issues such as network observational period, generality of existing methods, variance reduction, topological causes and degrees of imbalance, and sampling approaches. We also present an effective flow-based predicting algorithm, offer formal bounds on imbalance in sparse network link prediction, and employ an evaluation method appropriate for the observed imbalance. Our careful consideration of the above issues ultimately leads to a completely general framework that outperforms unsupervised link prediction methods by more than 30% AUC.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"134 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85612290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 686

Frequent regular itemset mining 频繁的定期项目集挖掘

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835840

S. Ruggieri

引用次数: 22

Boosting with structure information in the functional space: an application to graph classification 功能空间结构信息增强:在图分类中的应用

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835886

Hongliang Fei, Jun Huan

{"title":"Boosting with structure information in the functional space: an application to graph classification","authors":"Hongliang Fei, Jun Huan","doi":"10.1145/1835804.1835886","DOIUrl":"https://doi.org/10.1145/1835804.1835886","url":null,"abstract":"Boosting is a very successful classification algorithm that produces a linear combination of \"weak\" classifiers (a.k.a. base learners) to obtain high quality classification models. In this paper we propose a new boosting algorithm where base learners have structure relationships in the functional space. Though such relationships are generic, our work is particularly motivated by the emerging topic of pattern based classification for semi-structured data including graphs. Towards an efficient incorporation of the structure information, we have designed a general model where we use an undirected graph to capture the relationship of subgraph-based base learners. In our method, we combine both L1 norm and Laplacian based L2 norm penalty with Logit loss function of Logit Boost. In this approach, we enforce model sparsity and smoothness in the functional space spanned by the basis functions. We have derived efficient optimization algorithms based on coordinate decent for the new boosting formulation and theoretically prove that it exhibits a natural grouping effect for nearby spatial or overlapping features. Using comprehensive experimental study, we have demonstrated the effectiveness of the proposed learning methods.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"49 1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76369819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 43

User browsing models: relevance versus examination 用户浏览模型:相关性vs .检查

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835835

R. Srikant, Sugato Basu, Ni Wang, D. Pregibon

{"title":"User browsing models: relevance versus examination","authors":"R. Srikant, Sugato Basu, Ni Wang, D. Pregibon","doi":"10.1145/1835804.1835835","DOIUrl":"https://doi.org/10.1145/1835804.1835835","url":null,"abstract":"There has been considerable work on user browsing models for search engine results, both organic and sponsored. The click-through rate (CTR) of a result is the product of the probability of examination (will the user look at the result) times the perceived relevance of the result (probability of a click given examination). Past papers have assumed that when the CTR of a result varies based on the pattern of clicks in prior positions, this variation is solely due to changes in the probability of examination. We show that, for sponsored search results, a substantial portion of the change in CTR when conditioned on prior clicks is in fact due to a change in the relevance of results for that query instance, not just due to a change in the probability of examination. We then propose three new user browsing models, which attribute CTR changes solely to changes in relevance, solely to changes in examination (with an enhanced model of user behavior), or to both changes in relevance and examination. The model that attributes all the CTR change to relevance yields substantially better predictors of CTR than models that attribute all the change to examination, and does only slightly worse than the model that attributes CTR change to both relevance and examination. For predicting relevance, the model that attributes all the CTR change to relevance again does better than the model that attributes the change to examination. Surprisingly, we also find that one model might do better than another in predicting CTR, but worse in predicting relevance. Thus it is essential to evaluate user browsing models with respect to accuracy in predicting relevance, not just CTR.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75553042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 53

On community outliers and their efficient detection in information networks 信息网络中社区异常值及其有效检测

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835907

Jing Gao, Feng Liang, Wei Fan, Chi Wang, Yizhou Sun, Jiawei Han

{"title":"On community outliers and their efficient detection in information networks","authors":"Jing Gao, Feng Liang, Wei Fan, Chi Wang, Yizhou Sun, Jiawei Han","doi":"10.1145/1835804.1835907","DOIUrl":"https://doi.org/10.1145/1835804.1835907","url":null,"abstract":"Linked or networked data are ubiquitous in many applications. Examples include web data or hypertext documents connected via hyperlinks, social networks or user profiles connected via friend links, co-authorship and citation information, blog data, movie reviews and so on. In these datasets (called \"information networks\"), closely related objects that share the same properties or interests form a community. For example, a community in blogsphere could be users mostly interested in cell phone reviews and news. Outlier detection in information networks can reveal important anomalous and interesting behaviors that are not obvious if community information is ignored. An example could be a low-income person being friends with many rich people even though his income is not anomalously low when considered over the entire population. This paper first introduces the concept of community outliers (interesting points or rising stars for a more positive sense), and then shows that well-known baseline approaches without considering links or community information cannot find these community outliers. We propose an efficient solution by modeling networked data as a mixture model composed of multiple normal communities and a set of randomly generated outliers. The probabilistic model characterizes both data and links simultaneously by defining their joint distribution based on hidden Markov random fields (HMRF). Maximizing the data likelihood and the posterior of the model gives the solution to the outlier inference problem. We apply the model on both synthetic data and DBLP data sets, and the results demonstrate importance of this concept, as well as the effectiveness and efficiency of the proposed approach.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77540249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 267

Discriminative topic modeling based on manifold learning 基于流形学习的判别主题建模

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835888

Seungil Huh, S. Fienberg

{"title":"Discriminative topic modeling based on manifold learning","authors":"Seungil Huh, S. Fienberg","doi":"10.1145/1835804.1835888","DOIUrl":"https://doi.org/10.1145/1835804.1835888","url":null,"abstract":"Topic modeling has been popularly used for data analysis in various domains including text documents. Previous topic models, such as probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA), have shown impressive success in discovering low-rank hidden structures for modeling text documents. These models, however, do not take into account the manifold structure of data, which is generally informative for the non-linear dimensionality reduction mapping. More recent models, namely Laplacian PLSI (LapPLSI) and Locally-consistent Topic Model (LTM), have incorporated the local manifold structure into topic models and have shown the resulting benefits. But these approaches fall short of the full discriminating power of manifold learning as they only enhance the proximity between the low-rank representations of neighboring pairs without any consideration for non-neighboring pairs. In this paper, we propose Discriminative Topic Model (DTM) that separates non-neighboring pairs from each other in addition to bringing neighboring pairs closer together, thereby preserving the global manifold structure as well as improving the local consistency. We also present a novel model fitting algorithm based on the generalized EM and the concept of Pareto improvement. As a result, DTM achieves higher classification performance in a semi-supervised setting by effectively exposing the manifold structure of data. We provide empirical evidence on text corpora to demonstrate the success of DTM in terms of classification accuracy and robustness to parameters compared to state-of-the-art techniques.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76022022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 14

Topic models with power-law using Pitman-Yor process 使用Pitman-Yor过程的幂律主题模型

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835890

Issei Sato, Hiroshi Nakagawa

{"title":"Topic models with power-law using Pitman-Yor process","authors":"Issei Sato, Hiroshi Nakagawa","doi":"10.1145/1835804.1835890","DOIUrl":"https://doi.org/10.1145/1835804.1835890","url":null,"abstract":"One important approach for knowledge discovery and data mining is to estimate unobserved variables because latent variables can indicate hidden specific properties of observed data. The latent factor model assumes that each item in a record has a latent factor; the co-occurrence of items can then be modeled by latent factors. In document modeling, a record indicates a document represented as a \"bag of words,\" meaning that the order of words is ignored, an item indicates a word and a latent factor indicates a topic. Latent Dirichlet allocation (LDA) is a widely used Bayesian topic model applying the Dirichlet distribution over the latent topic distribution of a document having multiple topics. LDA assumes that latent topics, i.e., discrete latent variables, are distributed according to a multinomial distribution whose parameters are generated from the Dirichlet distribution. LDA also models a word distribution by using a multinomial distribution whose parameters follows the Dirichlet distribution. This Dirichlet-multinomial setting, however, cannot capture the power-law phenomenon of a word distribution, which is known as Zipf's law in linguistics. We therefore propose a novel topic model using the Pitman-Yor(PY) process, called the PY topic model. The PY topic model captures two properties of a document; a power-law word distribution and the presence of multiple topics. In an experiment using real data, this model outperformed LDA in document modeling in terms of perplexity.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75395117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 75

Session details: IG track 1: advertising, transportation 会议详情:IG专场1:广告、运输

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/3248777

Y. Li

引用次数: 0