Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management最新文献
{"title":"Building a generic debugger for information extraction pipelines","authors":"A. Sarma, Alpa Jain, P. Bohannon","doi":"10.1145/2063576.2063933","DOIUrl":"https://doi.org/10.1145/2063576.2063933","url":null,"abstract":"Complex information extraction (IE) pipelines are becoming an integral component of most text processing frameworks. We introduce a first system to help IE users analyze extraction pipeline semantics and operator transformations interactively while debugging. This allows the effort to be proportional to the need, and to focus on the portions of the pipeline under the greatest suspicion. We present a generic debugger for running post-execution analysis of any IE pipeline consisting of arbitrary types of operators. For this, we propose an effective provenance model for IE pipelines which captures a variety of operator types, ranging from those for which full to no specifications are available. We have evaluated our proposed algorithms and provenance model on large-scale real-world extraction pipelines.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"49 1","pages":"2229-2232"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90660444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using games with a purpose and bootstrapping to create domain-specific sentiment lexicons","authors":"A. Weichselbraun, Stefan Gindl, A. Scharl","doi":"10.1145/2063576.2063729","DOIUrl":"https://doi.org/10.1145/2063576.2063729","url":null,"abstract":"Sentiment detection analyzes the positive or negative polarity of text. The field has received considerable attention in recent years, since it plays an important role in providing means to assess user opinions regarding an organization's products, services, or actions. Approaches towards sentiment detection include machine learning techniques as well as computationally less expensive methods. Both approaches rely on the use of language-specific sentiment lexicons, which are lists of sentiment terms with their corresponding sentiment value. The effort involved in creating, customizing, and extending sentiment lexicons is considerable, particularly if less common languages and domains are targeted without access to appropriate language resources. This paper proposes a semi-automatic approach for the creation of sentiment lexicons which assigns sentiment values to sentiment terms via crowd-sourcing. Furthermore, it introduces a bootstrapping process operating on unlabeled domain documents to extend the created lexicons, and to customize them according to the particular use case. This process considers sentiment terms as well as sentiment indicators occurring in the discourse surrounding a articular topic. Such indicators are associated with a positive or negative context in a particular domain, but might have a neutral connotation in other domains. A formal evaluation shows that bootstrapping considerably improves the method's recall. Automatically created lexicons yield a performance comparable to professionally created language resources such as the General Inquirer.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"59 1","pages":"1053-1060"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91015992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hongliang Fei, R. Jiang, Yuhao Yang, Bo Luo, Jun Huan
{"title":"Content based social behavior prediction: a multi-task learning approach","authors":"Hongliang Fei, R. Jiang, Yuhao Yang, Bo Luo, Jun Huan","doi":"10.1145/2063576.2063719","DOIUrl":"https://doi.org/10.1145/2063576.2063719","url":null,"abstract":"Information Flow Studies analyze the principles and mechanisms of social information distribution and is an essential research topic in social networks. Traditional approaches are primarily based on the social network graph topology. However, topology itself can not accurately reflect the user interests or activities. In this paper, we adopt a \"microeconomics\" approach to study social information diffusion and aim to answer the question that how social information flow and socialization behaviors are related to content similarity and user interests. In particular, we study content-based social activity prediction, i.e., to predict a user's response (e.g. comment or like) to their friends' postings (e.g. blogs) w.r.t. message content. In our solution, we cast the social behavior prediction problem as a multi-task learning problem, in which each task corresponds to a user. We have designed a novel multi-task learning algorithm that is specifically designed for learning information flow in social networks. In our model, we apply l1 and Tikhonov regularization to obtain a sparse and smooth model in a linear multi-task learning framework. Using comprehensive experimental study, we have demonstrated the effectiveness of the proposed learning method.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"6 1","pages":"995-1000"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91165521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information diffusion in social networks: observing and affecting what society cares about","authors":"D. Agrawal, Ceren Budak, A. E. Abbadi","doi":"10.1145/2063576.2064036","DOIUrl":"https://doi.org/10.1145/2063576.2064036","url":null,"abstract":"Information diffusion in social networks provide great opportunities for political and social change as well as societal education. Therefore understanding information diffusion in social networks is a critical research goal. This greater understanding can be achieved through data analysis, development of reliable models that can predict outcomes of social processes, and ultimately the creation of applications that can shape the outcome of these processes. In this tutorial, we aim to provide an overview of such recent research based on a wide variety of techniques such as optimization algorithms, data mining, data streams covering a large number of problems such as influence spread maximization, misinformation limitation and study of trends in online social networks.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"76 1","pages":"2609-2610"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90912052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On bias problem in relevance feedback","authors":"Qianli Xing, Yi Zhang, Lanbo Zhang","doi":"10.1145/2063576.2063866","DOIUrl":"https://doi.org/10.1145/2063576.2063866","url":null,"abstract":"Relevance feedback is an effective approach to improve retrieval quality over the initial query. Typical relevance feedback methods usually select top-ranked documents for relevance judgments, then query expansion or model updating are carried out based on the feedback documents. However, the number of feedback documents is usually limited due to expensive human labeling. Thus relevant documents in the feedback set are hardly representative of all relevant documents and the feedback set is actually biased. As a result, the performance of relevance feedback will get hurt. In this paper, we first show how and where the bias problem exists through experiments. Then we study how the bias can be reduced by utilizing the unlabeled documents. After analyzing the usefulness of a document to relevance feedback, we propose an approach that extends the feedback set with carefully selected unlabeled documents by heuristics. Our experiment results show that the extended feedback set has less bias than the original feedback set and better performance can be achieved when the extended feedback set is used for relevance feedback.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"15 1","pages":"1965-1968"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89579408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kai-Yang Chiang, Nagarajan Natarajan, Ambuj Tewari, I. Dhillon
{"title":"Exploiting longer cycles for link prediction in signed networks","authors":"Kai-Yang Chiang, Nagarajan Natarajan, Ambuj Tewari, I. Dhillon","doi":"10.1145/2063576.2063742","DOIUrl":"https://doi.org/10.1145/2063576.2063742","url":null,"abstract":"We consider the problem of link prediction in signed networks. Such networks arise on the web in a variety of ways when users can implicitly or explicitly tag their relationship with other users as positive or negative. The signed links thus created reflect social attitudes of the users towards each other in terms of friendship or trust. Our first contribution is to show how any quantitative measure of social imbalance in a network can be used to derive a link prediction algorithm. Our framework allows us to reinterpret some existing algorithms as well as derive new ones. Second, we extend the approach of Leskovec et al. (2010) by presenting a supervised machine learning based link prediction method that uses features derived from longer cycles in the network. The supervised method outperforms all previous approaches on 3 networks drawn from sources such as Epinions, Slashdot and Wikipedia. The supervised approach easily scales to these networks, the largest of which has 132k nodes and 841k edges. Most real-world networks have an overwhelmingly large proportion of positive edges and it is therefore easy to get a high overall accuracy at the cost of a high false positive rate. We see that our supervised method not only achieves good accuracy for sign prediction but is also especially effective in lowering the false positive rate.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"51 1","pages":"1157-1162"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89417575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning kernels with upper bounds of leave-one-out error","authors":"Yong Liu, Shizhong Liao, Yuexian Hou","doi":"10.1145/2063576.2063927","DOIUrl":"https://doi.org/10.1145/2063576.2063927","url":null,"abstract":"We propose a new leaning method for Multiple Kernel Learning (MKL) based on the upper bounds of the leave-one-out error that is an almost unbiased estimate of the expected generalization error. Specifically, we first present two new formulations for MKL by minimizing the upper bounds of the leave-one-out error. Then, we compute the derivatives of these bounds and design an efficient iterative algorithm for solving these formulations. Experimental results show that the proposed method gives better accuracy results than that of both SVM with the uniform combination of basis kernels and other state-of-art kernel learning approaches.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"149 1","pages":"2205-2208"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76751948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Ananiadou, Doheon Lee, S. Navathe, Min-Keun Song
{"title":"DTMBIO 2011: international workshop on data and textmining in biomedical informatics","authors":"S. Ananiadou, Doheon Lee, S. Navathe, Min-Keun Song","doi":"10.1145/2063576.2064041","DOIUrl":"https://doi.org/10.1145/2063576.2064041","url":null,"abstract":"ACM Fifth International Workshop on Data and Text Mining in Biomedical Informatics (DTMBIO 11) organizers are pleased to announce that the fifth DTMBIO will be held in conjunction with CIKM, one of the largest data and text mining conferences. While CIKM presents the state-of-the-art research in informatics with the primary focus on data and text mining, the main focus of DTMBIO is on biomedical informatics. DTMBIO delegates will bring forth interesting applications of up-to-date informatics in the context of biomedical research.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"13 1","pages":"2617-2618"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78502759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for personalized and collaborative clustering of search results","authors":"D. Anastasiu, Byron J. Gao, David J. Buttler","doi":"10.1145/2063576.2063662","DOIUrl":"https://doi.org/10.1145/2063576.2063662","url":null,"abstract":"How to organize and present search results plays a critical role in the utility of search engines. Due to the unprecedented scale of the Web and diversity of search results, the common strategy of ranked lists has become increasingly inadequate, and clustering has been considered as a promising alternative. Clustering divides a long list of disparate search results into a few topic-coherent clusters, allowing the user to quickly locate relevant results by topic navigation. While many clustering algorithms have been proposed that innovate on the automatic clustering procedure, we introduce ClusteringWiki, the first prototype and framework for personalized clustering that allows direct user editing of the clustering results. Through a Wiki interface, the user can edit and annotate the membership, structure and labels of clusters for a personalized presentation. In addition, the edits and annotations can be shared among users as a mass-collaborative way of improving search result organization and search engine utility.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"12 1","pages":"573-582"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78505434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing an ensemble classifier over subspace classifiers using iterative convergence routine","authors":"B. Vinzamuri, K. Karlapalem","doi":"10.1145/2063576.2063678","DOIUrl":"https://doi.org/10.1145/2063576.2063678","url":null,"abstract":"There can be multiple classifiers for a given data set. One way to generate multiple classifiers is to use subspaces of the attribute sets. In this paper, we generate subspace classifiers by an iterative convergence routine to build an ensemble classifier. Experimental evaluation covers the cases of both labelled and unlabelled (blind) data separately. We evaluate our approach on many benchmark UC Irvine datasets to assess the robustness of our approach with varying induced noise levels. We explicitly compare and present the utility of the clusterings generated for classification using several diverse clustering dissimilarity metrics. Results show that our ensemble classifier is a more robust classifier in comparison to different multi-class classification approaches.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"1 1","pages":"693-698"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77994958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}