Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management最新文献
Peng Li, Bin Wang, Wei Jin, Jian-Yun Nie, Zhiwei Shi, Ben He
{"title":"Exploring categorization property of social annotations for information retrieval","authors":"Peng Li, Bin Wang, Wei Jin, Jian-Yun Nie, Zhiwei Shi, Ben He","doi":"10.1145/2063576.2063659","DOIUrl":"https://doi.org/10.1145/2063576.2063659","url":null,"abstract":"User generated social annotations provide extra information for describing document contents. In this paper, we propose an effective method to model the categorization property of social annotations and explore the potential of combining it with classical language models for improving retrieval performance. Specifically, a novel TR-LDA model is presented to take annotations as an additional source for generating document contents apart from the document itself. We provide strategies for representing and weighting the categorization property and develop an efficient inference algorithm, where space saving is taken into account. Experiments are carried out on synthetic datasets, where documents and queries come from the standard evaluation conference TREC and annotations come from the website Delicious.com. Our results demonstrate the effectiveness of the proposed method on the ad-hoc retrieval task, which significantly outperforms state-of-art baselines.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"1 1","pages":"557-562"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77052226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiyin He, M. de Rijke, M. Sevenster, R. V. Ommering, Y. Qian
{"title":"Generating links to background knowledge: a case study using narrative radiology reports","authors":"Jiyin He, M. de Rijke, M. Sevenster, R. V. Ommering, Y. Qian","doi":"10.1145/2063576.2063845","DOIUrl":"https://doi.org/10.1145/2063576.2063845","url":null,"abstract":"Automatically annotating texts with background information has recently received much attention. We conduct a case study in automatically generating links from narrative radiology reports to Wikipedia. Such links help users understand the medical terminology and thereby increase the value of the reports. Direct applications of existing automatic link generation systems trained on Wikipedia to our radiology data do not yield satisfactory results. Our analysis reveals that medical phrases are often syntactically regular but semantically complicated, e.g., containing multiple concepts or concepts with multiple modifiers. The latter property is the main reason for the failure of existing systems. Based on this observation, we propose an automatic link generation approach that takes into account these properties. We use a sequential labeling approach with syntactic features for anchor text identification in order to exploit syntactic regularities in medical terminology. We combine this with a sub-anchor based approach to target finding, which is aimed at coping with the complex semantic structure of medical phrases. Empirical results show that the proposed system effectively improves the performance over existing systems.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"96 1","pages":"1867-1876"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80961304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An algorithm for axiom pinpointing in EL+ and its incremental variant","authors":"Xiaojun Cheng, G. Qi","doi":"10.1145/2063576.2063985","DOIUrl":"https://doi.org/10.1145/2063576.2063985","url":null,"abstract":"Axiom pinpointing plays an important role in the development and maintenance of ontologies. It helps the user to comprehend an unwanted entailment of an ontology by presenting all minimal subsets of the ontology which are responsible for the entailment (called MinAs). In this paper, we consider the problem of axiom pinpointing in description logic EL+, which underpins OWL 2 EL, a profile of the latest version of Web Ontology Language (OWL). We propose a novel method to compute all MinAs that utilizes the hierarchy information obtained from the classification of an EL+ ontology. The advantage of our method over an existing labeled classification based method is that we do not attach labels to entailed subsumptions, which can be memory exhaustion for large scale ontologies. We further consider axiom pinpointing in EL+ when ontologies change. An incremental algorithm is given to compute all MinAs by reusing MinAs previously computed.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"5 1","pages":"2433-2436"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84089093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy preservation by independent component analysis and variance control","authors":"Chih-Ming Hsu, Ming-Syan Chen","doi":"10.1145/2063576.2063709","DOIUrl":"https://doi.org/10.1145/2063576.2063709","url":null,"abstract":"The primary objective of privacy preservation is to protect an individual's confidential information in released data sets. In recent years, several simulation-based approaches for privacy preservation have been proposed. The idea is to generate a synthetic data set with the constraint that the probability distribution is as close as possible to that of the original set. In this paper, we propose two frameworks for simulation-based privacy preservation of multivariate numerical data. The first framework, called PRIMP (PRivacy preserving by Independent coMPonents), is based on independent component analysis (ICA). It is shown empirically that PRIMP outperforms other simulation-based approaches in terms of Spearman's rank correlation and Kendall's tau correlation. The second approach proposed is a hybrid method that combines PRIMP and Cholesky's decomposition technique. It is shown empirically that the hybrid method preserves the covariance matrix of the original data exactly. The method also resolves the problem of generating good seeds for the Cholesky-based approach. Although the empirical results show that the hybrid approach is not always better than the PRIMP in terms of Spearman's rank correlation and Kendall's tau correlation, in theory, the risk of information leakage under the hybrid approach is much less than that under PRIMP.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"9 1","pages":"925-930"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78465717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topic modeling for named entity queries","authors":"Xiaobing Xue, Xiaoxin Yin","doi":"10.1145/2063576.2063877","DOIUrl":"https://doi.org/10.1145/2063576.2063877","url":null,"abstract":"Named entities are observed in a large portion of web search queries (named entity queries), where each entity can be associated with many different query terms that refer to various aspects of this entity. Organizing these query terms into topics helps understand major search intents about entities and the discovered topics are useful for applications such as query suggestion. Furthermore, we notice that named entities can often be organized into categories and those from the same category share many generic topics. Therefore, working on a category of named entities instead of individual ones helps avoid the problems caused by the sparsity and noise in the data. In this paper, Named Entity Topic Model (NETM) is proposed to discover generic topics for a category of named entities, where the quality of the generic topics is improved through the model design and the parameter initialization. Experiments based on query log data show that NETM discovers high-quality topics and outperforms the state-of-the-art techniques by 12.8% based on F1 measure.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"19 Suppl 1 1","pages":"2009-2012"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78142701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Partial duplicate detection for large book collections","authors":"I. Z. Yalniz, E. Can, R. Manmatha","doi":"10.1145/2063576.2063647","DOIUrl":"https://doi.org/10.1145/2063576.2063647","url":null,"abstract":"A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is represented by the sequence of words (in the order they appear in the text) which appear only once in the book. These words are referred to as \"unique words\" and they constitute a small percentage of all the words in a typical book. Along with the order information the set of unique words provides a compact representation which is highly descriptive of the content and the flow of ideas in the book. By aligning the sequence of unique words from two books using the longest common subsequence (LCS) one can discover whether two books are duplicates. Experiments on several datasets show that DUPNIQ is more accurate than traditional methods for duplicate detection such as shingling and is fast. On a collection of 100K scanned English books DUPNIQ detects partial duplicates in 30 min using 350 cores and has precision 0.996 and recall 0.833 compared to shingling with precision 0.992 and recall 0.720. The technique works on other languages as well and is demonstrated for a French dataset.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"238 1","pages":"469-474"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82244274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting anomalies in graphs with numeric labels","authors":"Michael Davis, Weiru Liu, P. Miller, G. Redpath","doi":"10.1145/2063576.2063749","DOIUrl":"https://doi.org/10.1145/2063576.2063749","url":null,"abstract":"This paper presents Yagada, an algorithm to search labelled graphs for anomalies using both structural data and numeric attributes. Yagada is explained using several security-related examples and validated with experiments on a physical Access Control database. Quantitative analysis shows that in the upper range of anomaly thresholds, Yagada detects twice as many anomalies as the best-performing numeric discretization algorithm. Qualitative evaluation shows that the detected anomalies are meaningful, representing a combination of structural irregularities and numerical outliers.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"209 1","pages":"1197-1202"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88058276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Baichuan Li, Xiance Si, Michael R. Lyu, Irwin King, E. Chang
{"title":"Question identification on twitter","authors":"Baichuan Li, Xiance Si, Michael R. Lyu, Irwin King, E. Chang","doi":"10.1145/2063576.2063996","DOIUrl":"https://doi.org/10.1145/2063576.2063996","url":null,"abstract":"In this paper, we investigate the novel problem of automatic question identification in the microblog environment. It contains two steps: detecting tweets that contain questions (we call them \"interrogative tweets\") and extracting the tweets which really seek information or ask for help (so called \"qweets\") from interrogative tweets. To detect interrogative tweets, both traditional rule-based approach and state-of-the-art learning-based method are employed. To extract qweets, context features like short urls and Tweet-specific features like Retweets are elaborately selected for classification. We conduct an empirical study with sampled one hour's English tweets and report our experimental results for question identification on Twitter.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"45 1","pages":"2477-2480"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83939842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"More or better: on trade-offs in compacting textual problem solution repositories","authors":"P Deepak, Sutanu Chakraborti, D. Khemani","doi":"10.1145/2063576.2063956","DOIUrl":"https://doi.org/10.1145/2063576.2063956","url":null,"abstract":"In this paper, we look into the problem of filtering problem solution repositories (from sources such as community-driven question answering systems) to render them more suitable for usage in knowledge reuse systems. We explore harnessing the fuzzy nature of usability of a solution to a problem, for such compaction. Fuzzy usabilities lead to several challenges; notably, the trade-off between choosing generic or better solutions. We develop an approach that can heed to a user specification of the trade-off between these criteria and introduce several quality measures based on fuzzy usability estimates to ascertain the quality of a problem-solution repository for usage in a Case Based Reasoning system. We establish, through a detailed empirical analysis, that our approach outperforms state-of-the-art approaches on virtually all quality measures.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"1 1","pages":"2321-2324"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82934697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting multi-dimensional relations: a generative model of groups of entities in a corpus","authors":"C. Yeung, Tomoharu Iwata","doi":"10.1145/2063576.2063750","DOIUrl":"https://doi.org/10.1145/2063576.2063750","url":null,"abstract":"Extracting relations among different entities from various data sources has been an important topic in data mining. While many methods focus only on a single type of relations, real world entities maintain relations that contain much richer information. We propose a hierarchical Bayesian model for extracting multi-dimensional relations among entities from a text corpus. Using data from Wikipedia, we show that our model can accurately predict the relevance of an entity given the topic of the document as well as the set of entities that are already mentioned in that document.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"36 1","pages":"1203-1208"},"PeriodicalIF":0.0,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89215350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}