{"title":"A data mining driven risk profiling method for road asset management","authors":"D. Emerson, J. Weligamage, R. Nayak","doi":"10.1145/2487575.2488204","DOIUrl":"https://doi.org/10.1145/2487575.2488204","url":null,"abstract":"Road surface skid resistance has been shown to have a strong relationship to road crash risk, however, applying the current method of using investigatory levels to identify crash prone roads is problematic as they may fail in identifying risky roads outside of the norm. The proposed method analyses a complex and formerly impenetrable volume of data from roads and crashes using data mining. This method rapidly identifies roads with elevated crash-rate, potentially due to skid resistance deficit, for investigation. A hypothetical skid resistance/crash risk curve is developed for each road segment, driven by the model deployed in a novel regression tree extrapolation method. The method potentially solves the problem of missing skid resistance values which occurs during network-wide crash analysis, and allows risk assessment of the major proportion of roads without skid resistance values.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82455779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohamed Reda Bouadjenek, Hakim Hacid, M. Bouzeghoub
{"title":"LAICOS: an open source platform for personalized social web search","authors":"Mohamed Reda Bouadjenek, Hakim Hacid, M. Bouzeghoub","doi":"10.1145/2487575.2487705","DOIUrl":"https://doi.org/10.1145/2487575.2487705","url":null,"abstract":"In this paper, we introduce LAICOS, a social Web search engine as a contribution to the growing area of Social Information Retrieval (SIR). Social information and personalization are at the heart of LAICOS. On the one hand, the social context of documents is added as a layer to their textual content traditionally used for indexing to provide Personalized Social Document Representations. On the other hand, the social context of users is used for the query expansion process using the Personalized Social Query Expansion framework (PSQE) proposed in our earlier works. We describe the different components of the system while relying on social bookmarking systems as a source of social information for personalizing and enhancing the IR process. We show how the internal structure of indexes as well as the query expansion process operated using social information.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82720719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yang Li, Chi Wang, Fangqiu Han, Jiawei Han, D. Roth, Xifeng Yan
{"title":"Mining evidences for named entity disambiguation","authors":"Yang Li, Chi Wang, Fangqiu Han, Jiawei Han, D. Roth, Xifeng Yan","doi":"10.1145/2487575.2487681","DOIUrl":"https://doi.org/10.1145/2487575.2487681","url":null,"abstract":"Named entity disambiguation is the task of disambiguating named entity mentions in natural language text and link them to their corresponding entries in a knowledge base such as Wikipedia. Such disambiguation can help enhance readability and add semantics to plain text. It is also a central step in constructing high-quality information network or knowledge graph from unstructured text. Previous research has tackled this problem by making use of various textual and structural features from a knowledge base. Most of the proposed algorithms assume that a knowledge base can provide enough explicit and useful information to help disambiguate a mention to the right entity. However, the existing knowledge bases are rarely complete (likely will never be), thus leading to poor performance on short queries with not well-known contexts. In such cases, we need to collect additional evidences scattered in internal and external corpus to augment the knowledge bases and enhance their disambiguation power. In this work, we propose a generative model and an incremental algorithm to automatically mine useful evidences across documents. With a specific modeling of \"background topic\" and \"unknown entities\", our model is able to harvest useful evidences out of noisy information. Experimental results show that our proposed method outperforms the state-of-the-art approaches significantly: boosting the disambiguation accuracy from 43% (baseline) to 86% on short queries derived from tweets.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90199149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shuyang Lin, Fengjiao Wang, Qingbo Hu, Philip S. Yu
{"title":"Extracting social events for learning better information diffusion models","authors":"Shuyang Lin, Fengjiao Wang, Qingbo Hu, Philip S. Yu","doi":"10.1145/2487575.2487584","DOIUrl":"https://doi.org/10.1145/2487575.2487584","url":null,"abstract":"Learning of the information diffusion model is a fundamental problem in the study of information diffusion in social networks. Existing approaches learn the diffusion models from events in social networks. However, events in social networks may have different underlying reasons. Some of them may be caused by the social influence inside the network, while others may reflect external trends in the ``real world''. Most existing work on the learning of diffusion models does not distinguish the events caused by the social influence from those caused by external trends. In this paper, we extract social events from data streams in social networks, and then use the extracted social events to improve the learning of information diffusion models. We propose a LADP (Latent Action Diffusion Path) model to incorporate the information diffusion model with the model of external trends, and then design an EM-based algorithm to infer the diffusion probabilities, the external trends and the sources of events efficiently.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90237107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A transfer learning based framework of crowd-selection on twitter","authors":"Zhou Zhao, D. Yan, Wilfred Ng, Shi Gao","doi":"10.1145/2487575.2487708","DOIUrl":"https://doi.org/10.1145/2487575.2487708","url":null,"abstract":"Crowd selection is essential to crowd sourcing applications, since choosing the right workers with particular expertise to carry out crowdsourced tasks is extremely important. The central problem is simple but tricky: given a crowdsourced task, who are the most knowledgable users to ask? In this demo, we show our framework that tackles the problem of crowdsourced task assignment on Twitter according to the social activities of its users. Since user profiles on Twitter do not reveal user interests and skills, we transfer the knowledge from categorized Yahoo! Answers datasets for learning user expertise. Then, we select the right crowd for certain tasks based on user expertise. We study the effectiveness of our system using extensive user evaluation. We further engage the attendees to participate a game called--Whom to Ask on Twitter?. This helps understand our ideas in an interactive manner. Our crowd selection can be accessed by the following url http://webproject2.cse.ust.hk:8034/tcrowd/.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87592829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mahashweta Das, G. D. F. Morales, A. Gionis, Ingmar Weber
{"title":"Learning to question: leveraging user preferences for shopping advice","authors":"Mahashweta Das, G. D. F. Morales, A. Gionis, Ingmar Weber","doi":"10.1145/2487575.2487653","DOIUrl":"https://doi.org/10.1145/2487575.2487653","url":null,"abstract":"We present ShoppingAdvisor, a novel recommender system that helps users in shopping for technical products. ShoppingAdvisor leverages both user preferences and technical product attributes in order to generate its suggestions. The system elicits user preferences via a tree-shaped flowchart, where each node is a question to the user. At each node, ShoppingAdvisor suggests a ranking of products matching the preferences of the user, and that gets progressively refined along the path from the tree's root to one of its leafs. In this paper we show (i) how to learn the structure of the tree, i.e., which questions to ask at each node, and (ii) how to produce a suitable ranking at each node. First, we adapt the classical top-down strategy for building decision trees in order to find the best user attribute to ask at each node. Differently from decision trees, ShoppingAdvisor partitions the user space rather than the product space. Second, we show how to employ a learning-to-rank approach in order to learn, for each node of the tree, a ranking of products appropriate to the users who reach that node. We experiment with two real-world datasets for cars and cameras, and a synthetic one. We use mean reciprocal rank to evaluate ShoppingAdvisor, and show how the performance increases by more than 50% along the path from root to leaf. We also show how collaborative recommendation algorithms such as k-nearest neighbor benefits from feature selection done by the ShoppingAdvisor tree. Our experiments show that ShoppingAdvisor produces good quality interpretable recommendations, while requiring less input from users and being able to handle the cold-start problem.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88180149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tadej Štajner, B. Thomee, Ana-Maria Popescu, M. Pennacchiotti, A. Jaimes
{"title":"Automatic selection of social media responses to news","authors":"Tadej Štajner, B. Thomee, Ana-Maria Popescu, M. Pennacchiotti, A. Jaimes","doi":"10.1145/2487575.2487659","DOIUrl":"https://doi.org/10.1145/2487575.2487659","url":null,"abstract":"Social media responses to news have increasingly gained in importance as they can enhance a consumer's news reading experience, promote information sharing and aid journalists in assessing their readership's response to a story. Given that the number of responses to an online news article may be huge, a common challenge is that of selecting only the most interesting responses for display. This paper addresses this challenge by casting message selection as an optimization problem. We define an objective function which jointly models the messages' utility scores and their entropy. We propose a near-optimal solution to the underlying optimization problem, which leverages the submodularity property of the objective function. Our solution first learns the utility of individual messages in isolation and then produces a diverse selection of interesting messages by maximizing the defined objective function. The intuitions behind our work are that an interesting selection of messages contains diverse, informative, opinionated and popular messages referring to the news article, written mostly by users that have authority on the topic. Our intuitions are embodied by a rich set of content, social and user features capturing the aforementioned aspects. We evaluate our approach through both human and automatic experiments, and demonstrate it outperforms the state of the art. Additionally, we perform an in-depth analysis of the annotated ``interesting'' responses, shedding light on the subjectivity around the selection process and the perception of interestingness.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78607046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mika Rautiainen, Jouni Sarvanko, A. Heikkinen, M. Ylianttila, V. Kostakos
{"title":"An online system with end-user services: mining novelty concepts from tv broadcast subtitles","authors":"Mika Rautiainen, Jouni Sarvanko, A. Heikkinen, M. Ylianttila, V. Kostakos","doi":"10.1145/2487575.2487715","DOIUrl":"https://doi.org/10.1145/2487575.2487715","url":null,"abstract":"Better tools for content-based access of video are needed to improve access to time-continuous video data. Particularly information about linear TV broadcast programs has been available in a form limited to program guides that provide short manually described overviews of the program content. Recent development in digitalization of TV broadcasting and emergence of web-based services for catch-up and on-demand viewing bring out new possibilities to access data. In this paper we introduce our data mining system and accompanying services for summarizing Finnish DVB broadcast streams from seven national channels. We describe how data mining of novelty concepts can be extracted from DVB subtitles to augment web-based \"Catch-Up TV Guide\" and \"Novelty Cloud\" TV services. Furthermore, our system allows accessing media fragments as Picture Quotes via generated word lists and provides content-based recommendations to find new programs that have content similar to the user selected programs. Our index consists of over 180 000 programs that are used to recommend relevant programs. The service has been under development and available online since 2010. It has registered over 5000 user sessions.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77466265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-label classification by mining label and instance correlations from heterogeneous information networks","authors":"Xiangnan Kong, Bokai Cao, Philip S. Yu","doi":"10.1145/2487575.2487577","DOIUrl":"https://doi.org/10.1145/2487575.2487577","url":null,"abstract":"Multi-label classification is prevalent in many real-world applications, where each example can be associated with a set of multiple labels simultaneously. The key challenge of multi-label classification comes from the large space of all possible label sets, which is exponential to the number of candidate labels. Most previous work focuses on exploiting correlations among different labels to facilitate the learning process. It is usually assumed that the label correlations are given beforehand or can be derived directly from data samples by counting their label co-occurrences. However, in many real-world multi-label classification tasks, the label correlations are not given and can be hard to learn directly from data samples within a moderate-sized training set. Heterogeneous information networks can provide abundant knowledge about relationships among different types of entities including data samples and class labels. In this paper, we propose to use heterogeneous information networks to facilitate the multi-label classification process. By mining the linkage structure of heterogeneous information networks, multiple types of relationships among different class labels and data samples can be extracted. Then we can use these relationships to effectively infer the correlations among different class labels in general, as well as the dependencies among the label sets of data examples inter-connected in the network. Empirical studies on real-world tasks demonstrate that the performance of multi-label classification can be effectively boosted using heterogeneous information net- works.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77547555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Big data analytics for healthcare","authors":"Jimeng Sun, C. Reddy","doi":"10.1145/2487575.2506178","DOIUrl":"https://doi.org/10.1145/2487575.2506178","url":null,"abstract":"Large amounts of heterogeneous medical data have become available in various healthcare organizations (payers, providers, pharmaceuticals). Those data could be an enabling resource for deriving insights for improving care delivery and reducing waste. The enormity and complexity of these datasets present great challenges in analyses and subsequent applications to a practical clinical environment. In this tutorial, we introduce the characteristics and related mining challenges on dealing with big medical data. Many of those insights come from medical informatics community, which is highly related to data mining but focuses on biomedical specifics. We survey various related papers from data mining venues as well as medical informatics venues to share with the audiences key problems and trends in healthcare analytics research, with different applications ranging from clinical text mining, predictive modeling, survival analysis, patient similarity, genetic data analysis, and public health. The tutorial will include several case studies dealing with some of the important healthcare applications.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85580195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}