Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining最新文献_第4页

A data mining driven risk profiling method for road asset management 基于数据挖掘的道路资产管理风险分析方法

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2488204

D. Emerson, J. Weligamage, R. Nayak

引用次数: 1

LAICOS: an open source platform for personalized social web search LAICOS:一个用于个性化社交网络搜索的开源平台

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487705

Mohamed Reda Bouadjenek, Hakim Hacid, M. Bouzeghoub

引用次数: 19

Mining evidences for named entity disambiguation 命名实体消歧的证据挖掘

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487681

Yang Li, Chi Wang, Fangqiu Han, Jiawei Han, D. Roth, Xifeng Yan

{"title":"Mining evidences for named entity disambiguation","authors":"Yang Li, Chi Wang, Fangqiu Han, Jiawei Han, D. Roth, Xifeng Yan","doi":"10.1145/2487575.2487681","DOIUrl":"https://doi.org/10.1145/2487575.2487681","url":null,"abstract":"Named entity disambiguation is the task of disambiguating named entity mentions in natural language text and link them to their corresponding entries in a knowledge base such as Wikipedia. Such disambiguation can help enhance readability and add semantics to plain text. It is also a central step in constructing high-quality information network or knowledge graph from unstructured text. Previous research has tackled this problem by making use of various textual and structural features from a knowledge base. Most of the proposed algorithms assume that a knowledge base can provide enough explicit and useful information to help disambiguate a mention to the right entity. However, the existing knowledge bases are rarely complete (likely will never be), thus leading to poor performance on short queries with not well-known contexts. In such cases, we need to collect additional evidences scattered in internal and external corpus to augment the knowledge bases and enhance their disambiguation power. In this work, we propose a generative model and an incremental algorithm to automatically mine useful evidences across documents. With a specific modeling of \"background topic\" and \"unknown entities\", our model is able to harvest useful evidences out of noisy information. Experimental results show that our proposed method outperforms the state-of-the-art approaches significantly: boosting the disambiguation accuracy from 43% (baseline) to 86% on short queries derived from tweets.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90199149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 101

Extracting social events for learning better information diffusion models 提取社会事件以学习更好的信息扩散模型

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487584

Shuyang Lin, Fengjiao Wang, Qingbo Hu, Philip S. Yu

引用次数: 30

A transfer learning based framework of crowd-selection on twitter 基于迁移学习的twitter人群选择框架

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487708

Zhou Zhao, D. Yan, Wilfred Ng, Shi Gao

引用次数: 32

Learning to question: leveraging user preferences for shopping advice 学会质疑:利用用户偏好来获得购物建议

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487653

Mahashweta Das, G. D. F. Morales, A. Gionis, Ingmar Weber

{"title":"Learning to question: leveraging user preferences for shopping advice","authors":"Mahashweta Das, G. D. F. Morales, A. Gionis, Ingmar Weber","doi":"10.1145/2487575.2487653","DOIUrl":"https://doi.org/10.1145/2487575.2487653","url":null,"abstract":"We present ShoppingAdvisor, a novel recommender system that helps users in shopping for technical products. ShoppingAdvisor leverages both user preferences and technical product attributes in order to generate its suggestions. The system elicits user preferences via a tree-shaped flowchart, where each node is a question to the user. At each node, ShoppingAdvisor suggests a ranking of products matching the preferences of the user, and that gets progressively refined along the path from the tree's root to one of its leafs. In this paper we show (i) how to learn the structure of the tree, i.e., which questions to ask at each node, and (ii) how to produce a suitable ranking at each node. First, we adapt the classical top-down strategy for building decision trees in order to find the best user attribute to ask at each node. Differently from decision trees, ShoppingAdvisor partitions the user space rather than the product space. Second, we show how to employ a learning-to-rank approach in order to learn, for each node of the tree, a ranking of products appropriate to the users who reach that node. We experiment with two real-world datasets for cars and cameras, and a synthetic one. We use mean reciprocal rank to evaluate ShoppingAdvisor, and show how the performance increases by more than 50% along the path from root to leaf. We also show how collaborative recommendation algorithms such as k-nearest neighbor benefits from feature selection done by the ShoppingAdvisor tree. Our experiments show that ShoppingAdvisor produces good quality interpretable recommendations, while requiring less input from users and being able to handle the cold-start problem.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"78 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88180149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 23

Automatic selection of social media responses to news 自动选择社交媒体对新闻的反应

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487659

Tadej Štajner, B. Thomee, Ana-Maria Popescu, M. Pennacchiotti, A. Jaimes

{"title":"Automatic selection of social media responses to news","authors":"Tadej Štajner, B. Thomee, Ana-Maria Popescu, M. Pennacchiotti, A. Jaimes","doi":"10.1145/2487575.2487659","DOIUrl":"https://doi.org/10.1145/2487575.2487659","url":null,"abstract":"Social media responses to news have increasingly gained in importance as they can enhance a consumer's news reading experience, promote information sharing and aid journalists in assessing their readership's response to a story. Given that the number of responses to an online news article may be huge, a common challenge is that of selecting only the most interesting responses for display. This paper addresses this challenge by casting message selection as an optimization problem. We define an objective function which jointly models the messages' utility scores and their entropy. We propose a near-optimal solution to the underlying optimization problem, which leverages the submodularity property of the objective function. Our solution first learns the utility of individual messages in isolation and then produces a diverse selection of interesting messages by maximizing the defined objective function. The intuitions behind our work are that an interesting selection of messages contains diverse, informative, opinionated and popular messages referring to the news article, written mostly by users that have authority on the topic. Our intuitions are embodied by a rich set of content, social and user features capturing the aforementioned aspects. We evaluate our approach through both human and automatic experiments, and demonstrate it outperforms the state of the art. Additionally, we perform an in-depth analysis of the annotated ``interesting'' responses, shedding light on the subjectivity around the selection process and the perception of interestingness.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78607046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 38

An online system with end-user services: mining novelty concepts from tv broadcast subtitles 具有终端用户服务的在线系统:从电视广播字幕中挖掘新概念

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487715

Mika Rautiainen, Jouni Sarvanko, A. Heikkinen, M. Ylianttila, V. Kostakos

{"title":"An online system with end-user services: mining novelty concepts from tv broadcast subtitles","authors":"Mika Rautiainen, Jouni Sarvanko, A. Heikkinen, M. Ylianttila, V. Kostakos","doi":"10.1145/2487575.2487715","DOIUrl":"https://doi.org/10.1145/2487575.2487715","url":null,"abstract":"Better tools for content-based access of video are needed to improve access to time-continuous video data. Particularly information about linear TV broadcast programs has been available in a form limited to program guides that provide short manually described overviews of the program content. Recent development in digitalization of TV broadcasting and emergence of web-based services for catch-up and on-demand viewing bring out new possibilities to access data. In this paper we introduce our data mining system and accompanying services for summarizing Finnish DVB broadcast streams from seven national channels. We describe how data mining of novelty concepts can be extracted from DVB subtitles to augment web-based \"Catch-Up TV Guide\" and \"Novelty Cloud\" TV services. Furthermore, our system allows accessing media fragments as Picture Quotes via generated word lists and provides content-based recommendations to find new programs that have content similar to the user selected programs. Our index consists of over 180 000 programs that are used to recommend relevant programs. The service has been under development and available online since 2010. It has registered over 5000 user sessions.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"102 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77466265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Multi-label classification by mining label and instance correlations from heterogeneous information networks 从异构信息网络中挖掘标签和实例关联的多标签分类

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487577

Xiangnan Kong, Bokai Cao, Philip S. Yu

{"title":"Multi-label classification by mining label and instance correlations from heterogeneous information networks","authors":"Xiangnan Kong, Bokai Cao, Philip S. Yu","doi":"10.1145/2487575.2487577","DOIUrl":"https://doi.org/10.1145/2487575.2487577","url":null,"abstract":"Multi-label classification is prevalent in many real-world applications, where each example can be associated with a set of multiple labels simultaneously. The key challenge of multi-label classification comes from the large space of all possible label sets, which is exponential to the number of candidate labels. Most previous work focuses on exploiting correlations among different labels to facilitate the learning process. It is usually assumed that the label correlations are given beforehand or can be derived directly from data samples by counting their label co-occurrences. However, in many real-world multi-label classification tasks, the label correlations are not given and can be hard to learn directly from data samples within a moderate-sized training set. Heterogeneous information networks can provide abundant knowledge about relationships among different types of entities including data samples and class labels. In this paper, we propose to use heterogeneous information networks to facilitate the multi-label classification process. By mining the linkage structure of heterogeneous information networks, multiple types of relationships among different class labels and data samples can be extracted. Then we can use these relationships to effectively infer the correlations among different class labels in general, as well as the dependencies among the label sets of data examples inter-connected in the network. Empirical studies on real-world tasks demonstrate that the performance of multi-label classification can be effectively boosted using heterogeneous information net- works.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77547555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 78

Big data analytics for healthcare 医疗保健大数据分析

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2506178

Jimeng Sun, C. Reddy

引用次数: 92