Performance assessment of ontology matching systems for FAIR data.

IF 2; Zone 3, Engineering & Technology; JCR Q3, MATHEMATICAL & COMPUTATIONAL BIOLOGY

Journal of Biomedical Semantics. Pub Date: 2022-07-15. DOI: 10.1186/s13326-022-00273-5

Philip van Damme, Jesualdo Tomás Fernández-Breis, Nirupama Benis, Jose Antonio Miñarro-Gimenez, Nicolette F de Keizer, Ronald Cornet

Citations: 1

Abstract

Background: Ontology matching should contribute to the interoperability aspect of FAIR data (Findable, Accessible, Interoperable, and Reusable). Multiple data sources can use different ontologies for annotating their data, thus creating the need for dynamic ontology matching services. In this experimental study, we assessed the performance of ontology matching systems in the context of a real-life application from the rare disease domain. Additionally, we present a method for analyzing top-level classes to improve precision.

Results: We included three ontologies (NCIt, SNOMED CT, ORDO) and three matching systems (AgreementMakerLight 2.0, FCA-Map, LogMap 2.0). We evaluated the performance of the matching systems against reference alignments from BioPortal and the Unified Medical Language System Metathesaurus (UMLS). Then, we analyzed the top-level ancestors of matched classes to detect incorrect mappings without consulting a reference alignment. To detect such incorrect mappings, we manually matched semantically equivalent top-level classes of ontology pairs. AgreementMakerLight 2.0, FCA-Map, and LogMap 2.0 had F1-scores of 0.55, 0.46, and 0.55 for BioPortal and 0.66, 0.53, and 0.58 for the UMLS, respectively. Using vote-based consensus alignments increased performance across the board. Evaluation with manually created top-level hierarchy mappings revealed that, on average, 90% of the mappings' classes belonged to top-level classes that matched.
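As an illustration of the evaluation described in the Results, the following minimal Python sketch computes precision, recall, and F1-score for a set of mappings against a reference alignment, and builds a vote-based consensus alignment from several systems. It is not the authors' code: the function names, the toy class identifiers, and the vote threshold of 2 are assumptions made for this example, and the abstract does not state which threshold the study used.

```python
from collections import Counter


def prf1(predicted, reference):
    """Return (precision, recall, F1) for two collections of mappings."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def consensus(alignments, min_votes):
    """Keep every mapping proposed by at least min_votes alignments."""
    votes = Counter(m for alignment in alignments for m in set(alignment))
    return {m for m, count in votes.items() if count >= min_votes}


# Hypothetical toy data: each mapping is a (source class, target class) pair.
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
system_x = {("a1", "b1"), ("a2", "b2"), ("a5", "b9")}
system_y = {("a1", "b1"), ("a3", "b3"), ("a5", "b9"), ("a6", "b7")}
system_z = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a6", "b8")}

# Assumed threshold: a mapping needs the vote of at least 2 of the 3 systems.
voted = consensus([system_x, system_y, system_z], min_votes=2)

for name, alignment in [("x", system_x), ("y", system_y),
                        ("z", system_z), ("consensus", voted)]:
    p, r, f = prf1(alignment, reference)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

In this toy data the consensus alignment drops the mappings that only one system proposed, but it keeps ("a5", "b9"), which two systems agree on even though it is absent from the reference. Voting can therefore raise precision without guaranteeing it.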

Conclusions: Our findings show that the included ontology matching systems automatically produced mappings that were modestly accurate according to our evaluation. The hierarchical analysis of mappings seems promising when no reference alignments are available. All in all, the systems show potential to be implemented as part of an ontology matching service for querying FAIR data. Future research should focus on developing methods for the evaluation of mappings used in such mapping services, leading to their implementation in a FAIR data ecosystem.
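The hierarchical analysis mentioned above can be sketched roughly as follows, assuming each ontology is available as a simple child-to-parents dictionary. A mapping is flagged when none of the top-level ancestors of its two classes appear in a manually curated set of matched top-level classes. The class labels, the helper names, and the choice to treat parentless classes as "top-level" are assumptions made for this illustration, not details taken from the study.

```python
def top_level_ancestors(cls, parents):
    """Return the parentless classes reachable upward from cls (may be cls)."""
    tops, stack, seen = set(), [cls], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        node_parents = parents.get(node, [])
        if not node_parents:
            tops.add(node)  # no parents recorded: treat as top-level
        else:
            stack.extend(node_parents)
    return tops


def is_plausible(mapping, parents_a, parents_b, top_matches):
    """True if some pair of top-level ancestors was manually matched."""
    source, target = mapping
    tops_a = top_level_ancestors(source, parents_a)
    tops_b = top_level_ancestors(target, parents_b)
    return any((a, b) in top_matches for a in tops_a for b in tops_b)


# Hypothetical toy hierarchies (child -> list of parents).
parents_a = {
    "Cystic fibrosis": ["Genetic disease"],
    "Genetic disease": ["Disease"],
    "Aspirin": ["Drug"],
}
parents_b = {
    "CF": ["Disorder"],
    "Acetylsalicylic acid": ["Substance"],
    "Aspirin allergy": ["Disorder"],
}

# Manually matched, semantically equivalent top-level classes (assumed).
top_matches = {("Disease", "Disorder"), ("Drug", "Substance")}

mappings = [
    ("Cystic fibrosis", "CF"),
    ("Aspirin", "Acetylsalicylic acid"),
    ("Aspirin", "Aspirin allergy"),
]

flagged = [m for m in mappings
           if not is_plausible(m, parents_a, parents_b, top_matches)]
print(flagged)
```

Here only the mapping from "Aspirin" to "Aspirin allergy" is flagged, because its top-level ancestors ("Drug" and "Disorder") were not matched to each other. No reference alignment is consulted, which is the point of the approach.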

Source journal

Journal of Biomedical Semantics (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 4.20

Self-citation rate: 5.30%

Review time: 30 weeks

Journal description: Journal of Biomedical Semantics addresses issues of semantic enrichment and semantic processing in the biomedical domain. The scope of the journal covers two main areas. Infrastructure for biomedical semantics: focusing on semantic resources and repositories, meta-data management and resource description, knowledge representation and semantic frameworks, the Biomedical Semantic Web, and semantic interoperability. Semantic mining, annotation, and analysis: focusing on approaches and applications of semantic resources; and tools for investigation, reasoning, prediction, and discoveries in biomedicine.