Proceedings of the 18th ACM Conference on Information and Knowledge Management: Latest Articles

A term dependency-based approach for query terms ranking

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646114

Chia-Jung Lee, Ruey-Cheng Chen, Shao-Hang Kao, Pu-Jen Cheng

Citations: 35

Spatio-temporal association rule mining framework for real-time sensor network applications

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646224

H. Chok, L. Gruenwald

Citations: 10

Multi-aspect opinion polling from textual reviews

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646233

Jingbo Zhu, Huizhen Wang, Benjamin Ka-Yin T'sou, Muhua Zhu

Citations: 122

Mining data streams with periodically changing distributions

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646065

Yingying Tao, M. Tamer Özsu

Citations: 8

Pure spreading activation is pointless

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646264

M. Berthold, U. Brandes, Tobias Kötter, Martin Mader, U. Nagel, Kilian Thiel

Citations: 42

Space-economical partial gram indices for exact substring matching

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1645992

N. Tang, Lefteris Sidirourgos, P. Boncz

Abstract: Exact substring matching queries on large data collections can be answered using q-gram indices, which store, for each occurring q-byte pattern, an (ordered) posting list with the positions of all its occurrences. Such gram indices are known to provide fast query response times and to allow the index to be created quickly even on huge disk-based datasets. Their main drawback is relatively large storage space, a constant multiple (typically >2) of the original data size, even when compression is used. In this work, we study methods to preserve the scalable creation time and efficient exact substring query properties of gram indices while reducing storage space. To this end, we first propose a partial gram index based on a reduction from the problem of omitting indexed q-grams to the set cover problem. While this method succeeds in reducing the size of the index, it generates false positives at query time, reducing efficiency. We then increase the accuracy of partial grams by splitting the posting lists of frequent grams into a frequency-tuned set of signatures that take the bytes surrounding the grams into account. The resulting qs-gram scheme is tested on huge collections (up to 426GB) and is shown to achieve an almost 1:1 data:index size ratio, and query performance even faster than normal gram methods, thanks to the reduced size and access cost.

Citations: 6
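To make the q-gram index in the abstract above concrete, here is a minimal sketch of a plain (full, uncompressed, in-memory) q-gram index with exact substring lookup. It is not the authors' partial or qs-gram scheme: the set-cover-based gram omission and the signatures are not shown. The function names `build_qgram_index` and `find_substring` are hypothetical. The final verification step is what a partial index would need to filter the false positives the abstract mentions; with a full index it is redundant but harmless.

```python
from collections import defaultdict


def build_qgram_index(text: str, q: int) -> dict[str, list[int]]:
    """Map each q-gram occurring in `text` to the ordered list of its start positions."""
    postings = defaultdict(list)
    for pos in range(len(text) - q + 1):
        # Positions are visited left to right, so every posting list stays sorted.
        postings[text[pos:pos + q]].append(pos)
    return dict(postings)


def find_substring(text: str, index: dict[str, list[int]], q: int, pattern: str) -> list[int]:
    """Return the sorted start positions of every occurrence of `pattern` in `text`."""
    if len(pattern) < q:
        # No full q-gram fits inside the pattern; fall back to a linear scan.
        return [i for i in range(len(text) - len(pattern) + 1) if text.startswith(pattern, i)]

    candidates = None
    for offset in range(len(pattern) - q + 1):
        gram = pattern[offset:offset + q]
        # A gram found at position p implies a pattern start at p - offset.
        starts = {p - offset for p in index.get(gram, [])}
        candidates = starts if candidates is None else candidates & starts
        if not candidates:
            return []

    # Verification: redundant for a full index, but required for a partial index
    # (some grams omitted), where the intersection can contain false positives.
    return sorted(s for s in candidates if s >= 0 and text.startswith(pattern, s))
```

For example, indexing `"abracadabra"` with `q = 2` and querying `"abra"` intersects the shifted posting lists of `ab`, `br`, and `ra` to yield positions 0 and 7.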

Feature selection for ranking using boosted trees

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646292

Feng Pan, Tim Converse, David Ahn, F. Salvetti, Gianluca Donato

Abstract: Modern search engines have to be fast to satisfy users, so there are hard back-end latency requirements. The set of features useful for search ranking functions, however, continues to grow, making feature computation a latency bottleneck. As a result, not all available features can be used for ranking; in fact, much of the time only a small percentage of them can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. To this end, we explore different feature selection methods using boosted regression trees, including greedy approaches (selecting the features with the highest relative importance as computed by boosted trees, or discounting importance by feature similarity) and a randomized approach. We evaluate and compare these approaches using data from a commercial search engine. The experimental results show that the proposed randomized feature selection with feature-importance-based backward elimination outperforms the greedy approaches and, with 30 features, achieves relevance comparable to a full-feature model trained with 419 features and the same modeling parameters.

Citations: 64
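The greedy "discount importance by feature similarity" idea in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the importance scores are assumed to come from an already trained boosted-tree model, the pairwise similarity values are assumed to be given, and the discount rule (multiply importance by one minus the maximum similarity to any already selected feature) is a hypothetical choice. The function name `greedy_select` is also hypothetical. The randomized approach with backward elimination is not shown.

```python
def _similarity(a: str, b: str, similarity: dict[tuple[str, str], float]) -> float:
    """Symmetric lookup; unlisted pairs are treated as fully dissimilar (0.0)."""
    return similarity.get((a, b), similarity.get((b, a), 0.0))


def greedy_select(importance: dict[str, float],
                  similarity: dict[tuple[str, str], float],
                  k: int) -> list[str]:
    """Greedily pick k features, discounting each candidate's importance by its
    maximum similarity to the features already selected (hypothetical rule)."""
    selected: list[str] = []
    remaining = dict(importance)
    while remaining and len(selected) < k:
        def discounted(f: str) -> float:
            max_sim = max((_similarity(f, s, similarity) for s in selected), default=0.0)
            return remaining[f] * (1.0 - max_sim)
        best = max(remaining, key=discounted)
        selected.append(best)
        del remaining[best]
    return selected
```

With an empty similarity table this reduces to plain top-k selection by importance. With a table marking two features as near-duplicates, the second one is pushed down the ranking, so the budget of k features is spent on less redundant signals.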

Session details: KM information extraction II

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/3261222

R. Wong

Citations: 0

Injecting purpose and trust into data anonymisation

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646166

Xiaoxun Sun, Hua Wang, Jiuyong Li

Citations: 45

Bitmap indexes for relational XML twig query processing

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646014

Kyong-Ha Lee, Bongki Moon

Abstract: Due to the increasing volume of XML data, it is considered prudent to store XML data in an industry-strength database system instead of relying on a domain-specific application or a file system. For shredded XML data stored in relational tables, however, it may not be straightforward to apply existing twig query processing algorithms, because most of them require XML data to be accessed as streams of elements grouped by tag and sorted in a particular order. To support XML query processing within the common framework of relational database systems, we first propose several bitmap indexes for supporting holistic twig joins on XML data stored in relational tables. Since bitmap indexes are well supported in most commercial and open-source database systems, the proposed bitmap indexes and twig query processing algorithms can be incorporated into the relational query processing framework more easily. The proposed query processing algorithms are efficient in both time and space, since the compressed bitmap indexes stay compressed during query processing. In addition, we propose a hybrid index that computes twig query solutions using only bit-vectors, without accessing the labeled XML elements stored in the relational tables.

Citations: 3
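A minimal sketch of how per-tag bitmaps can supply the tag-grouped, document-ordered element streams that the abstract above says twig algorithms require, and how a bit mask can answer a two-node ancestor-descendant step. Assumptions: elements carry (start, end) region labels and are kept in an in-memory list sorted by start, bitmaps are plain uncompressed Python integers rather than compressed relational indexes, and the join is a simple per-ancestor mask, not the holistic twig join or the hybrid index from the paper. All names (`Element`, `build_tag_bitmaps`, `stream`, `descendants_mask`, `ancestor_descendant`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    tag: str
    start: int  # region label: position of the opening tag
    end: int    # region label: position of the closing tag


def build_tag_bitmaps(elements: list[Element]) -> dict[str, int]:
    """One uncompressed bit vector per tag; bit i marks elements[i] (elements sorted by start)."""
    bitmaps: dict[str, int] = {}
    for i, e in enumerate(elements):
        bitmaps[e.tag] = bitmaps.get(e.tag, 0) | (1 << i)
    return bitmaps


def stream(bitmap: int) -> list[int]:
    """Decode a bit vector into element ids in ascending, i.e. document, order."""
    ids, i = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(i)
        bitmap >>= 1
        i += 1
    return ids


def descendants_mask(elements: list[Element], a: int) -> int:
    """Descendants of elements[a] form the contiguous run after a whose start < a.end."""
    j = a + 1
    while j < len(elements) and elements[j].start < elements[a].end:
        j += 1
    return ((1 << j) - 1) ^ ((1 << (a + 1)) - 1)  # bits a+1 .. j-1


def ancestor_descendant(elements: list[Element], bitmaps: dict[str, int],
                        anc_tag: str, desc_tag: str) -> list[tuple[int, int]]:
    """All (ancestor, descendant) id pairs matching the twig //anc_tag//desc_tag."""
    desc_bits = bitmaps.get(desc_tag, 0)
    return [(a, d)
            for a in stream(bitmaps.get(anc_tag, 0))
            for d in stream(descendants_mask(elements, a) & desc_bits)]
```

For a document `<book><title/><chapter><title/><section><title/></section></chapter></book>` labeled book (1, 12), title (2, 3), chapter (4, 11), title (5, 6), section (7, 10), title (8, 9), the `title` bitmap is `0b101010`, and `//chapter//title` is answered by ANDing that bitmap with chapter's descendant mask, giving the pairs (2, 3) and (2, 5).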