Proceedings of the 18th ACM conference on Information and knowledge management: latest articles, page 5

Exploit the tripartite network of social tagging for web clustering

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646167

Caimei Lu, Xin Chen, Eun Kyo Park

Citations: 47

SPIDER: a system for scalable, parallel / distributed evaluation of large-scale RDF data

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646315

Hyunsik Choi, Jihoon Son, YongHyun Cho, M. Sung, Y. Chung

Abstract: RDF is a data model for representing labeled directed graphs, and it is used as an important building block of the semantic web. Due to its flexibility and applicability, RDF has been used in applications such as the semantic web, bioinformatics, and social networks. In these applications, large-scale graph datasets are very common. However, existing techniques do not manage them effectively. In this paper, we present a scalable, efficient query processing system for RDF data, named SPIDER, based on the well-known parallel/distributed computing framework Hadoop. SPIDER consists of two major modules: (1) the graph data loader and (2) the graph query processor. The loader analyzes and dissects the RDF data and places parts of the data over multiple servers. The query processor parses the user query and distributes subqueries to cluster nodes. The results of the subqueries from multiple servers are then gathered (and refined if necessary) and delivered to the user. Both modules utilize the MapReduce framework of Hadoop. In addition, our system supports some features of the SPARQL query language. This prototype will be a foundation for developing real applications with large-scale RDF graph data.

Citations: 58

A machine learning approach for improved BM25 retrieval

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646237

K. Svore, C. Burges

Citations: 66

Fast shortest path distance estimation in large networks

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646063

Michalis Potamias, F. Bonchi, C. Castillo, A. Gionis

Abstract: In this paper we study approximate landmark-based methods for point-to-point distance estimation in very large networks. These methods involve selecting a subset of nodes as landmarks and computing offline the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, it can be estimated quickly by combining the precomputed distances. We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. We therefore explore theoretical insights to devise a variety of simple methods that scale well in very large networks. The efficiency of the suggested techniques is tested experimentally using five real-world graphs having millions of edges. While theoretical bounds support the claim that random landmarks work well in practice, our extensive experimentation shows that smart landmark selection can yield dramatically more accurate results: for a given target accuracy, our methods require as much as 250 times less space than selecting landmarks at random. In addition, we demonstrate that at a very small accuracy loss our techniques are several orders of magnitude faster than the state-of-the-art exact methods. Finally, we study an application of our methods to the task of social search in large graphs.
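The landmark idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes an undirected, unweighted graph stored as an adjacency dictionary, takes the landmark set as given rather than applying any of the paper's selection heuristics, and uses the standard triangle-inequality upper bound d(s, t) <= d(s, u) + d(u, t). The function names `bfs_distances`, `build_landmark_index`, and `estimate_distance` are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def build_landmark_index(graph, landmarks):
    """Offline step: one BFS per landmark (landmark choice is assumed given)."""
    return {u: bfs_distances(graph, u) for u in landmarks}


def estimate_distance(index, s, t):
    """Online step: smallest d(s, u) + d(u, t) over all landmarks u.

    This is an upper bound on the true distance, exact only when some
    landmark lies on a shortest path between s and t.
    """
    best = float("inf")
    for dist in index.values():
        if s in dist and t in dist:
            best = min(best, dist[s] + dist[t])
    return best


# Path graph 0 - 1 - 2 - 3 - 4 with node 2 as the only landmark.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
index = build_landmark_index(path, [2])
print(estimate_distance(index, 0, 4))  # landmark is on the shortest path
print(estimate_distance(index, 0, 1))  # overestimate: true distance is 1
```

The second query shows why landmark placement matters: a landmark that is off the shortest path inflates the estimate, which is consistent with the abstract's claim that careful selection improves accuracy.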

Citations: 317

Learning to rank with a novel kernel perceptron method

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646018

Xue-wen Chen, Haixun Wang, Xiaotong Lin

Citations: 12

Acronym extraction and disambiguation in large-scale organizational web pages

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646206

Shicong Feng, Yuhong Xiong, Conglei Yao, Liwei Zheng, W. Liu

Citations: 10

Session details: KM classification and clustering II

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/3261240

Joost Kok

Citations: 0

Fragment-based clustering ensembles

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646232

Ou Wu, Mingliang Zhu, Weiming Hu

Citations: 2

Clustering queries for better document ranking

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646174

Yi Liu, Liangjie Zhang, Ruihua Song, Jian-Yun Nie, Ji-Rong Wen

Citations: 4

Efficient feature weighting methods for ranking

Proceedings of the 18th ACM conference on Information and knowledge management Pub Date : 2009-11-02 DOI: 10.1145/1645953.1646100

Hwanjo Yu, Jinoh Oh, Wook-Shin Han

Abstract: Feature weighting or selection is a crucial process to identify an important subset of features from a data set. Removing irrelevant or redundant features can improve the generalization performance of ranking functions in information retrieval. Due to fundamental differences between classification and ranking, feature weighting methods developed for classification cannot be readily applied to feature weighting for ranking. A state-of-the-art feature selection method for ranking, called GAS, has been recently proposed, which exploits the importance of each feature and the similarity between every pair of features. However, GAS must compute the similarity scores of all pairs of features, so it is not scalable for high-dimensional data, and its performance degrades on nonlinear ranking functions. This paper proposes novel algorithms, RankWrapper and RankFilter, which are scalable for high-dimensional data and also perform reasonably well on nonlinear ranking functions. RankWrapper and RankFilter are designed based on the key idea of the Relief algorithm. Relief is a feature selection algorithm for classification, which exploits the notions of hits (data points within the same class) and misses (data points from different classes). However, there is no such notion of hits or misses in ranking. The proposed algorithms instead utilize the ranking distances of nearest data points in order to identify the key features for ranking. Our extensive experiments show that RankWrapper and RankFilter generate higher accuracy overall than GAS and traditional Relief algorithms adapted for ranking, and run substantially faster than GAS on high-dimensional data.
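To make the hit/miss notion mentioned in the abstract concrete, here is a minimal Python sketch of the classic Relief weighting scheme for classification. It is not RankWrapper or RankFilter, whose details are not given here. It assumes numeric features, and for determinism it visits every data point instead of drawing a random sample as the original Relief does. The function name `relief_weights` is hypothetical.

```python
def relief_weights(X, y):
    """Relief-style feature weights: a feature gains weight when it differs
    on the nearest miss and loses weight when it differs on the nearest hit."""
    n_features = len(X[0])

    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for j in range(n_features):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(a, b, j) for j in range(n_features))

    weights = [0.0] * n_features
    for i, x in enumerate(X):
        hits = [X[k] for k in range(len(X)) if k != i and y[k] == y[i]]
        misses = [X[k] for k in range(len(X)) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbor available
        near_hit = min(hits, key=lambda row: distance(x, row))
        near_miss = min(misses, key=lambda row: distance(x, row))
        for j in range(n_features):
            weights[j] += (diff(x, near_miss, j) - diff(x, near_hit, j)) / len(X)
    return weights


# Feature 0 separates the two classes; feature 1 is unrelated to the label.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
print(relief_weights(X, y))  # feature 0 should receive the larger weight
```

In a ranking setting there are no class labels to define hits and misses, which is the gap the abstract says the proposed algorithms address by using ranking distances of nearest data points.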

Citations: 21