{"title":"Named entity disambiguation by leveraging wikipedia semantic knowledge","authors":"Xianpei Han, Jun Zhao","doi":"10.1145/1645953.1645983","DOIUrl":"https://doi.org/10.1145/1645953.1645983","url":null,"abstract":"Name ambiguity problem has raised an urgent demand for efficient, high-quality named entity disambiguation methods. The key problem of named entity disambiguation is to measure the similarity between occurrences of names. The traditional methods measure the similarity using the bag of words (BOW) model. The BOW, however, ignores all the semantic relations such as social relatedness between named entities, associative relatedness between concepts, polysemy and synonymy between key terms. So the BOW cannot reflect the actual similarity. Some research has investigated social networks as background knowledge for disambiguation. Social networks, however, can only capture the social relatedness between named entities, and often suffer the limited coverage problem. To overcome the previous methods' deficiencies, this paper proposes to use Wikipedia as the background knowledge for disambiguation, which surpasses other knowledge bases by the coverage of concepts, rich semantic information and up-to-date content. By leveraging Wikipedia's semantic knowledge like social relatedness between named entities and associative relatedness between concepts, we can measure the similarity between occurrences of names more accurately. In particular, we construct a large-scale semantic network from Wikipedia, in order that the semantic knowledge can be used efficiently and effectively. Based on the constructed semantic network, a novel similarity measure is proposed to leverage Wikipedia semantic knowledge for disambiguation. The proposed method has been tested on the standard WePS data sets. Empirical results show that the disambiguation performance of our method gets 10.7% improvement over the traditional BOW based methods and 16.7% improvement over the traditional social network based methods.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132844693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"M-COPE: a multiple continuous query processing engine","authors":"H. Park, S. Shin, Sang Hyuck Na, W. Lee","doi":"10.1145/1645953.1646303","DOIUrl":"https://doi.org/10.1145/1645953.1646303","url":null,"abstract":"A data stream management system (DSMS) should support an efficient evaluation scheme for long-running continuous queries over infinite data streams. This demonstration presents a scalable query processing engine, M-COPE (Multiple Continuous Query Processing Engine) developed to evaluate multiple continuous queries efficiently. A multiple query optimization scheme implemented in the system generates a single network of operations as an execution plan for registered queries in order to maximize the reuse of the intermediate results of common sub-expressions in the queries adaptively. In this paper, we describe the overall architecture of M-COPE along with its special features. Network traffic flow streams are used to demonstrate the main features of M-COPE.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"387 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132375332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event detection from flickr data through wavelet-based spatial analysis","authors":"Ling Chen, Abhishek Roy","doi":"10.1145/1645953.1646021","DOIUrl":"https://doi.org/10.1145/1645953.1646021","url":null,"abstract":"Detecting events from web resources has attracted increasing research interests in recent years. Our focus in this paper is to detect events from photos on Flickr, an Internet image community website. The results can be used to facilitate user searching and browsing photos by events. The problem is challenging considering: (1) Flickr data is noisy, because there are photos unrelated to real-world events; (2) It is not easy to capture the content of photos. This paper presents our effort in detecting events from Flickr photos by exploiting the tags supplied by users to annotate photos. In particular, the temporal and locational distributions of tag usage are analyzed in the first place, where a wavelet transform is employed to suppress noise. Then, we identify tags related with events, and further distinguish between tags of aperiodic events and those of periodic events. Afterwards, event-related tags are clustered such that each cluster, representing an event, consists of tags with similar temporal and locational distribution patterns as well as with similar associated photos. Finally, for each tag cluster, photos corresponding to the represented event are extracted. We evaluate the performance of our approach using a set of real data collected from Flickr. The experimental results demonstrate that our approach is effective in detecting events from the Flickr photo collection.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128822803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"XQGen: an algebra-based XPath query generator for micro-benchmarking","authors":"Yuqing Wu, N. Lele, Rashmi Aroskar, Sharanya Chinnusamy, Sofia Brenes","doi":"10.1145/1645953.1646328","DOIUrl":"https://doi.org/10.1145/1645953.1646328","url":null,"abstract":"We propose XQGen, a stand-alone, algebra-based XPath generator to aid engineers in testing and improving the design of XML query engines. XQGen takes an XML schema sketch and user configurations, such as number of queries, query types, duplication factors, and branching factors as input, and generates a set of queries that conform to the schema and configurations. In addition, given a set of label-paths as workload input, XQGen is capable of generating query sets that honor the workload.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133756991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A unified relevance model for opinion retrieval","authors":"Xuanjing Huang, W. Bruce Croft","doi":"10.1145/1645953.1646075","DOIUrl":"https://doi.org/10.1145/1645953.1646075","url":null,"abstract":"Representing the information need is the greatest challenge for opinion retrieval. Typical queries for opinion retrieval are composed of either just content words, or content words with a small number of cue \"opinion\" words. Both are inadequate for retrieving opinionated documents. In this paper, we develop a general formal framework--the opinion relevance model--to represent an information need for opinion retrieval. We explore a series of methods to automatically identify the most appropriate opinion words for query expansion, including using query independent sentiment resources. We also propose a relevance feedback-based approach to extract opinion words. Both query-independent and query-dependent methods can also be integrated into a more effective mixture relevance model. Finally, opinion retrieval experiments are presented for the Blog06 and COAE08 text collections. The results show that, significant improvements can always be obtained by this opinion relevance model whether sentiment resources are available or not.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124192399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context sensitive synonym discovery for web search queries","authors":"Xing Wei, Fuchun Peng, Huihsin Tseng, Yumao Lu, Benoît Dumoulin","doi":"10.1145/1645953.1646178","DOIUrl":"https://doi.org/10.1145/1645953.1646178","url":null,"abstract":"We propose a simple yet effective approach to context sensitive synonym discovery for Web search queries based on co-click analysis; i.e., analyzing queries leading to clicking same documents. In addition to deriving word based synonyms, we also derive concept based synonyms with the help of query segmentation. Evaluation results show that this approach dramatically outperforms the thesaurus based synonym replacement method in keeping search intent, from accuracy of 40% to above 80%.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114360133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient processing of twig pattern matching in fuzzy XML","authors":"Jian Liu, Z. Ma, Li Yan","doi":"10.1145/1645953.1645971","DOIUrl":"https://doi.org/10.1145/1645953.1645971","url":null,"abstract":"In order to find all occurrences of a twig pattern in XML documents, a considerable number of twig pattern matching algorithms have been proposed. At the same time, previous work mainly focuses on twig pattern query under the complete semantics. However, there is often a need to produce partial answers because XML data may have missing sub-elements. Furthermore, existing works fall short in their ability to support twig pattern query under different semantics in fuzzy XML. In this paper, we study the problem of twig matches in fuzzy XML. We begin by introducing the extended region scheme to accurately and effectively represent node information in fuzzy XML. We then discuss the fuzzy query semantics and compute the membership information by using Einstein operator instead of Zadeh's min-max technique. On this basis, we propose two efficient algorithms for querying twig under complete and incomplete semantics in fuzzy XML. The experimental results show that our proposed algorithms perform fuzzy twig pattern matching efficiently.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114387622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for semantic link discovery over relational data","authors":"Oktie Hassanzadeh, Anastasios Kementsietsidis, Lipyeow Lim, Renée J. Miller, Min Wang","doi":"10.1145/1645953.1646084","DOIUrl":"https://doi.org/10.1145/1645953.1646084","url":null,"abstract":"Discovering links between different data items in a single data source or across different data sources is a challenging problem faced by many information systems today. In particular, the recent Linking Open Data (LOD) community project has highlighted the paramount importance of establishing semantic links among web data sources. Currently, LOD sources provide billions of RDF triples, but only millions of links between data sources. Many of these data sources are published using tools that operate over relational data stored in a standard RDBMS. In this paper, we present a framework for discovery of semantic links from relational data. Our framework is based on declarative specification of linkage requirements by a user. We illustrate the use of our framework using several link discovery algorithms on a real world scenario. Our framework allows data publishers to easily find and publish high-quality links to other data sources, and therefore could significantly enhance the value of the data in the next generation of web.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116428214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dissemination of heterogeneous XML data in publish/subscribe systems","authors":"Y. Ni, C. Chan","doi":"10.1145/1645953.1645972","DOIUrl":"https://doi.org/10.1145/1645953.1645972","url":null,"abstract":"The publish-subscribe paradigm is an effective approach for data publishers to asynchronously disseminate relevant data to a large number of data subscribers. A lot of recent research has focused on extending this paradigm to support content-based delivery of XML data using more expressive XML-based subscription specifications that allow constraints on both data contents as well as structure. However, due to the heterogeneous data schemas used by different data publishers even for data in the same domain, an important challenge is how to efficiently and effectively disseminate relevant data to subscribers whose subscriptions might be specified based on schemas that are different from those used by the data publishers. In this paper, we examine the options to resolve this schema heterogeneity problem in XML data dissemination, and propose a novel paradigm that is based on data rewriting. Our experimental results demonstrate the effectiveness of the data rewriting paradigm and identify the tradeoffs of the various approaches.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123554536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HDDBrs middleware for implementing highly available distributed databases","authors":"Rim Moussa","doi":"10.1145/1645953.1646308","DOIUrl":"https://doi.org/10.1145/1645953.1646308","url":null,"abstract":"Our demo presents HDDBRS, a middle tier offering to clients a highly available distributed database interface using Reed Solomon codes to compute parity data. Parity data is stored in dedicated parity DB backends, is synchronously updated and allows recovering from multiple DB backend unavailability. HDDBRS middle tier is implemented in JAVA using standard technology, and is designed to be interoperable with any database engine that provides a JDBC driver and implements X/open XA protocol.","PeriodicalId":286251,"journal":{"name":"Proceedings of the 18th ACM conference on Information and knowledge management","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123564784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}