Toward Semantic Search for the Biogeochemical Literature

Joshua D. Eisenberg, Deya Banisakher, Maria E. Presa-Reyes, Kalli Unthank, Mark A. Finlayson, René Price, Shu‐Ching Chen
2017 IEEE International Conference on Information Reuse and Integration (IRI), published 2017-08-01. DOI: 10.1109/IRI.2017.49 (https://doi.org/10.1109/IRI.2017.49)
Citations: 3

Abstract

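Literature search is a vital step of every research project. Semantic literature search is an approach to article retrieval and ranking that uses concepts rather than keywords, in an attempt to address the well-known deficiencies of keyword-based search, namely (1) retrieval of an overwhelming number of results, (2) rankings that do not precisely reflect true relevance, and (3) the omission of relevant results because they do not contain the idiosyncratic keywords of the query. The difficulty of semantic search, however, is that it requires significant knowledge engineering, often in the form of conceptual ontologies tailored to a particular scientific domain. It also requires non-trivial tuning, in the form of domain-specific term and concept weights. Here we present preliminary, work-in-progress results in the development of a semantic search system for the biogeochemical scientific literature. We report the following initial steps. First, one of the co-authors, a biogeochemistry expert, wrote a sample search query and ranked the five most relevant articles returned for that query by a popular keyword-based search engine. We then hand-annotated the five articles and the query with the Environmental Ontology (ENVO), an existing ontology for the domain. Critically, this pilot annotation revealed a number of missing concepts that we will add in future work. We then showed that a straightforward ontology distance metric between concepts in the search query and the five articles was sufficient to produce the expected ranking. We discuss the implications of these results and outline the next steps required to produce a full-fledged semantic search system for the biogeochemistry scientific literature.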
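The abstract reports that a "straightforward ontology distance metric" between query concepts and article concepts reproduced the expert's ranking, but it does not say which metric was used. The sketch below is one possible reading, under stated assumptions: distance is the number of is_a edges on the shortest path between two concepts (found by breadth-first search), an article's score is the mean, over query concepts, of the distance to the closest concept annotated in that article, and lower scores rank higher. The toy hierarchy, the concept labels, the article names, and every function name here are hypothetical illustrations; they are not taken from the real ENVO release or from the authors' system.

```python
from collections import deque

# Hypothetical toy is_a hierarchy (child -> list of parents).
# Labels are illustrative only and do NOT reflect the actual ENVO structure.
IS_A = {
    "marsh": ["wetland"],
    "mangrove swamp": ["wetland"],
    "wetland": ["aquatic environment"],
    "estuary": ["aquatic environment"],
    "groundwater": ["aquatic environment"],
    "aquatic environment": ["environment"],
    "soil": ["environment"],
}


def build_undirected(is_a):
    """Turn child->parents edges into an undirected adjacency map."""
    graph = {}
    for child, parents in is_a.items():
        for parent in parents:
            graph.setdefault(child, set()).add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def path_distance(graph, a, b):
    """Shortest-path length (in is_a edges) between two concepts via BFS.

    Returns float('inf') when the concepts are not connected.
    """
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")


def article_score(graph, query_concepts, article_concepts):
    """Mean over query concepts of the distance to the nearest article concept.

    Lower is better; an article annotated with every query concept scores 0.
    """
    if not article_concepts:
        return float("inf")
    total = sum(
        min(path_distance(graph, q, c) for c in article_concepts)
        for q in query_concepts
    )
    return total / len(query_concepts)


def rank_articles(graph, query_concepts, articles):
    """Return article names sorted from most to least relevant (ascending score)."""
    return sorted(
        articles, key=lambda name: article_score(graph, query_concepts, articles[name])
    )


if __name__ == "__main__":
    graph = build_undirected(IS_A)
    query = ["marsh", "groundwater"]
    # Hypothetical hand annotations for three articles.
    articles = {
        "A": ["marsh", "groundwater"],
        "B": ["wetland", "estuary"],
        "C": ["soil"],
    }
    print(rank_articles(graph, query, articles))  # ['A', 'B', 'C']
```

Taking the minimum per query concept before averaging rewards articles that cover every query concept at least loosely. The domain-specific term and concept weights mentioned in the abstract could, as an assumption, be added as per-concept multipliers inside `article_score`.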