Automated Semantic Query Formulation for Document Retrieval

2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP) Pub Date : 2018-09-13 DOI:10.1109/INFRKM.2018.8464786

R. A. Kadir, A. Yauri, A. Azman

Citations: 2

Abstract

The introduction of the Semantic Web offers the chance of easier and more effective access to the constantly increasing heterogeneous data on the Web. Data can now be retrieved semantically rather than through traditional keyword-based searches, which usually return a large amount of irrelevant information. However, one of the main challenges of the Semantic Web is that data are stored in a structured RDF triple format and are retrieved using complex structured triple-based queries, such as SPARQL, instead of the natural language queries that users prefer; this problem remains a subject of research. The proposed AutoSDoR (Automated Semantic Document Retrieval) formulates natural language queries semantically into a structured triple representation, using a machine learning approach, in order to retrieve documents from the structured RDF triple format. The research also goes beyond short fragment queries, such as those handled by FREyA, to paragraph-length queries. Automatic disambiguation of query terms that are not covered in WordNet is also proposed, which contributes to an increase in the precision and recall of the retrieved documents.
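
To make the gap between a natural language query and a structured triple query concrete, the following is a minimal sketch of the final rendering step only: turning triple patterns into a SPARQL query string. It is not AutoSDoR's method; the abstract does not describe the implementation. The names TriplePattern and to_sparql, the example.org predicate, and the hand-written triples are all assumptions made for illustration, and the machine learning step that would produce the triples from text is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriplePattern:
    """One subject-predicate-object pattern; terms starting with '?' are variables."""
    subject: str
    predicate: str
    obj: str


def to_sparql(patterns, select_var="?doc"):
    """Render a list of triple patterns as a SPARQL SELECT query string."""
    body = "\n".join(f"  {p.subject} {p.predicate} {p.obj} ." for p in patterns)
    return f"SELECT DISTINCT {select_var} WHERE {{\n{body}\n}}"


# Hypothetical, hand-written triples standing in for the output of a
# query-formulation step for "documents about semantic search".
# The example.org predicate is invented for this sketch.
patterns = [
    TriplePattern("?doc", "<http://example.org/hasTopic>", "?topic"),
    TriplePattern(
        "?topic",
        "<http://www.w3.org/2000/01/rdf-schema#label>",
        '"semantic search"',
    ),
]

query = to_sparql(patterns)
print(query)
```

The resulting string could be sent to any SPARQL endpoint; a paragraph-length query would simply yield more patterns in the list.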
