Proceedings of the 19th ACM international conference on Information and knowledge management: latest papers (page 4)

Estimating accuracy for text classification tasks on large unlabeled data

Proceedings of the 19th ACM international conference on Information and knowledge management Pub Date : 2010-10-26 DOI: 10.1145/1871437.1871551

Snigdha Chaturvedi, T. Faruquie, L. V. Subramaniam, M. Mohania

Abstract: Rule-based systems for processing text data encode the knowledge of a human expert into a rule base and make decisions based on interactions between the input data and that rule base. Similarly, supervised learning systems learn patterns present in a given dataset to make decisions on similar and related data. The performance of both classes of models depends largely on the training examples from which they were built. Even when a trained model fits its training data well, its accuracy on new test data may be considerably different. Computing the accuracy of a learnt model on a new unlabeled dataset is challenging: it requires costly labeling, which is still likely to cover only a subset of the new data because of the large dataset sizes involved. In this paper, we present a method to estimate the accuracy of a given model on a new dataset without manually labeling the data. We verify our method on large datasets for two shallow text processing tasks, document classification and postal address segmentation, using both supervised machine learning methods and human-generated rule-based models.

Citations: 1

Web search solved?: all result rankings the same?

Proceedings of the 19th ACM international conference on Information and knowledge management Pub Date : 2010-10-26 DOI: 10.1145/1871437.1871507

H. Zaragoza, B. B. Cambazoglu, R. Baeza-Yates

Citations: 29

WS-GraphMatching: a web service tool for graph matching

Proceedings of the 19th ACM international conference on Information and knowledge management Pub Date : 2010-10-26 DOI: 10.1145/1871437.1871779

Qiong Cheng, M. Ogihara, Jinpeng Wei, A. Zelikovsky

Citations: 0

Searching consumer image collections using web-based concept expansion

Proceedings of the 19th ACM international conference on Information and knowledge management Pub Date : 2010-10-26 DOI: 10.1145/1871437.1871525

Mark D. Wood, A. Loui, S. Hibino

Abstract: As consumers accumulate more and more personal imagery, searching for specific images has become increasingly difficult. Consumers typically provide few or no annotations, and automated classifiers and concept tagging tools are limited in their scope and vocabulary. This work addresses this sparsity of semantic information by leveraging domain-specific information provided by online photo-sharing communities. Such information improves search by allowing user-provided search terms to be expanded into a set of semantically related concepts, using relevant semantic relationships provided by millions of users. Our system first extracts metadata using a modest number of image and event-based semantic classifiers, as well as any meaningful file or folder names. When users pose text-based queries, our system retrieves images from their personal image collections by leveraging Flickr's tag dataset for concept expansion. This approach enables users to search their collections without having to manually annotate their pictures. We compare the retrieval performance of a Flickr-based concept expander with the performance obtained without concept expansion and with a WordNet-based concept expander. The results demonstrate that common-sense knowledge gleaned from online photo-sharing communities can enable meaningful image search on consumer image collections, including searches that would be impossible using only the available image metadata.

Citations: 1

PTM: probabilistic topic mapping model for mining parallel document collections

Proceedings of the 19th ACM international conference on Information and knowledge management Pub Date : 2010-10-26 DOI: 10.1145/1871437.1871696

Duo Zhang, Jimeng Sun, ChengXiang Zhai, A. Bose, Nikos Anerousis

Citations: 7

Ontology emergence from folksonomies

Proceedings of the 19th ACM international conference on Information and knowledge management Pub Date : 2010-10-26 DOI: 10.1145/1871437.1871578

Kaipeng Liu, Binxing Fang, Weizhe Zhang

Citations: 24

Web page classification on child suitability

Proceedings of the 19th ACM international conference on Information and knowledge management Pub Date : 2010-10-26 DOI: 10.1145/1871437.1871638

Carsten Eickhoff, P. Serdyukov, A. D. Vries

Citations: 32

Constructing classification features using minimal predictive patterns

Proceedings of the 19th ACM international conference on Information and knowledge management Pub Date : 2010-10-26 DOI: 10.1145/1871437.1871549

Iyad Batal, M. Hauskrecht

Abstract: Choosing good features to represent objects can be crucial to the success of supervised machine learning methods. Recently, there has been great interest in applying data mining techniques to construct new classification features. The rationale behind this approach is that patterns (feature-value combinations) can capture more underlying semantics than single features, so including some patterns can improve classification performance. Currently, most methods adopt a two-phase approach, generating all frequent patterns in the first phase and selecting the discriminative patterns in the second phase. However, this approach has limited success because it is usually very difficult to correctly identify important predictive patterns in a large set of highly correlated frequent patterns. In this paper, we introduce the minimal predictive patterns framework to directly mine a compact set of highly predictive patterns. The idea is to integrate pattern mining and feature selection in order to filter out non-informative and redundant patterns as they are generated. We propose some pruning techniques to speed up the mining process. Our extensive experimental evaluation on many datasets demonstrates the advantage of our method, which outperforms many well-known classifiers.

Citations: 22

Hierarchical auto-tagging: organizing Q&A knowledge for everyone

Proceedings of the 19th ACM international conference on Information and knowledge management Pub Date : 2010-10-26 DOI: 10.1145/1871437.1871697

Kyosuke Nishida, Ko Fujimura

Citations: 7

SEQUEL: query completion via pattern mining on multi-column structural data

Proceedings of the 19th ACM international conference on Information and knowledge management Pub Date : 2010-10-26 DOI: 10.1145/1871437.1871782

Chuancong Gao, Qingyan Yang, Jianyong Wang

Citations: 0