A unified approach to publish semantic annotations of agricultural documents as knowledge graphs

Impact factor: 6.3 | JCR: Q1 (Agricultural Engineering)

Smart agricultural technology | Pub Date: 2024-08-01 | DOI: 10.1016/j.atech.2024.100484

Nadia Yacoubi Ayadi, Stephan Bernard, Robert Bossy, Marine Courtin, Bill Gates Happi Happi, Pierre Larmande, Franck Michel, Claire Nédellec, Catherine Roussey, Catherine Faron

Volume 8, Article 100484. Full text: https://www.sciencedirect.com/science/article/pii/S2772375524000893

Citations: 0

Abstract

A unified approach to publish semantic annotations of agricultural documents as knowledge graphs

The research results presented in this paper were obtained as part of the D2KAB project (Data to Knowledge in Agriculture and Biodiversity), which aims to develop semantic web-based tools that describe agronomical data and make it actionable and accessible following the FAIR principles. We focus on constructing domain-specific Knowledge Graphs (KGs) from textual data sources, using Natural Language Processing (NLP) techniques to extract and structure relevant entities. Our approach is based on the formalization of a semantic data model using common linked open vocabularies such as the Web Annotation Ontology (OA) and the Provenance Ontology (PROV). The model was developed by formulating motivating scenarios and competency questions from domain experts. This model has been used to construct three different KGs from three distinct corpora: PubMed scientific publications on wheat and rice genetics and phenotyping, and French agricultural alert bulletins. The named entities to be recognized include genes, phenotypes, traits, genetic markers, taxa and phenological stages, normalized using semantic resources such as the Wheat Trait and Phenotype Ontology (WTO), the French Crop Usage (FCU) thesaurus and the Plant Phenological Description Ontology (PPDO). Named entities were extracted using different NLP approaches and tools. The relevance of the semantic model was validated by implementing the experts' questions as SPARQL queries to be answered on the constructed RDF knowledge graphs. Our work demonstrates how domain-specific vocabularies and systematic querying of KGs can reveal hidden interactions and support agronomists in navigating vast amounts of data. The resources and transformation pipelines developed are publicly available in Git repositories.
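To make the idea of annotation-based knowledge graphs concrete, the following is a minimal sketch and not the authors' data model or pipeline. It assumes each extracted entity becomes an OA annotation whose target is the document and whose body is the normalized concept, with a PROV attribution to the extraction tool. Triples are stored as plain Python tuples instead of a real RDF store, the lookup function stands in for a SPARQL query, and every `example.org` IRI, as well as the helper names `annotate` and `documents_mentioning`, is hypothetical.

```python
# Sketch only: triples are plain tuples, not a real RDF store.
# The OA and PROV namespace IRIs are the standard W3C ones;
# everything under example.org is a hypothetical placeholder.
OA = "http://www.w3.org/ns/oa#"
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for illustration


def annotate(graph, ann_id, document, entity, tool):
    """Add one annotation linking a document (target) to a concept (body)."""
    ann = EX + ann_id
    graph.add((ann, RDF_TYPE, OA + "Annotation"))
    graph.add((ann, OA + "hasTarget", document))
    graph.add((ann, OA + "hasBody", entity))
    # Record which (hypothetical) NLP tool produced the annotation.
    graph.add((ann, PROV + "wasAttributedTo", tool))
    return ann


def documents_mentioning(graph, entity):
    """Competency-question style lookup: which documents mention `entity`?"""
    anns = {s for (s, p, o) in graph if p == OA + "hasBody" and o == entity}
    return sorted(o for (s, p, o) in graph if p == OA + "hasTarget" and s in anns)


graph = set()
annotate(graph, "ann1", EX + "doc/bulletin-1", EX + "concept/wheat", EX + "tool/ner")
annotate(graph, "ann2", EX + "doc/article-7", EX + "concept/wheat", EX + "tool/ner")
annotate(graph, "ann3", EX + "doc/article-7", EX + "concept/rice", EX + "tool/ner")

print(documents_mentioning(graph, EX + "concept/wheat"))
```

In an actual RDF setting the same question would be a SPARQL pattern joining `oa:hasBody` and `oa:hasTarget` on the annotation node; the tuple-based lookup here only mirrors that join.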

Source journal

Smart agricultural technology

CiteScore: 4.20

Self-citation rate: 0.00%

Articles published