Philipp Langer, P. Schulze, Stefan George, Matthias Kohnen, Tobias Metzke, Ziawasch Abedjan, G. Kasneci
Published in: 2014 IEEE 30th International Conference on Data Engineering Workshops, 2014-05-19. DOI: 10.1109/ICDEW.2014.6818334 (https://doi.org/10.1109/ICDEW.2014.6818334)
Assigning global relevance scores to DBpedia facts
Knowledge bases have become ubiquitous assets in today's Web. They provide access to billions of statements about real-world entities derived from governmental, institutional, product-oriented, bibliographic, biochemical, and many other domain-oriented and general-purpose datasets. The sheer number of statements that can be retrieved for a given entity calls for ranking techniques that return the most salient, i.e., globally relevant, statements as top results. In this paper we analyze and compare various strategies for assigning global relevance scores to DBpedia facts, with the goal of identifying the best among them. Some of these strategies build on complementary aspects such as frequency and inverse document frequency, while others combine structural information about the underlying knowledge graph with Web-based co-occurrence statistics for entity pairs. We conducted a user evaluation of the discussed approaches on the popular DBpedia knowledge base, with statistics derived from an indexed version of the ClueWeb09 corpus. The resulting dataset can serve as a strong baseline for comparing entity ranking strategies (especially in terms of global relevance) and as a building block for developing new ranking and mining techniques on linked data.
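To make the idea of combining frequency with inverse document frequency concrete, the following minimal sketch scores the facts of one entity by how often subject and object co-occur in a corpus, discounted by how common the object is overall. The abstract does not give the scoring formulas, so this is only an illustration and not the method evaluated in the paper: the function names `score_fact` and `rank_facts`, the add-one smoothing, and all counts are assumptions made up for the example.

```python
import math
from collections import namedtuple

# A fact is a (subject, predicate, object) triple, as in DBpedia.
Fact = namedtuple("Fact", ["subject", "predicate", "obj"])


def score_fact(fact, cooccurrence, doc_freq, num_docs):
    """Illustrative frequency-times-IDF score (an assumption, not the paper's formula).

    cooccurrence: dict mapping (subject, object) to the number of documents
                  in which both entities appear.
    doc_freq:     dict mapping an entity to the number of documents mentioning it.
    num_docs:     total number of documents in the corpus.
    """
    freq = cooccurrence.get((fact.subject, fact.obj), 0)
    df = doc_freq.get(fact.obj, 0)
    # Add-one smoothing keeps the logarithm defined when df is 0.
    idf = math.log((num_docs + 1) / (df + 1))
    return freq * idf


def rank_facts(facts, cooccurrence, doc_freq, num_docs):
    """Return the facts sorted by descending score."""
    return sorted(
        facts,
        key=lambda f: score_fact(f, cooccurrence, doc_freq, num_docs),
        reverse=True,
    )


# Toy counts invented for illustration; they are not ClueWeb09 statistics.
facts = [
    Fact("Albert_Einstein", "citizenship", "United_States"),
    Fact("Albert_Einstein", "birthPlace", "Ulm"),
    Fact("Albert_Einstein", "knownFor", "Theory_of_relativity"),
]
cooccurrence = {
    ("Albert_Einstein", "United_States"): 1000,
    ("Albert_Einstein", "Ulm"): 50,
    ("Albert_Einstein", "Theory_of_relativity"): 900,
}
doc_freq = {
    "United_States": 900_000,
    "Ulm": 500,
    "Theory_of_relativity": 2_000,
}
num_docs = 1_000_000

ranked = rank_facts(facts, cooccurrence, doc_freq, num_docs)
for fact in ranked:
    print(fact.predicate, fact.obj)
```

With these toy counts, the very frequent but unspecific object United_States drops to the bottom even though it co-occurs with the subject most often, which is the complementary effect of the two signals mentioned above.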