Journal of Biomedical Semantics: Latest Articles (Page 7)

Classifying literature mentions of biological pathogens as experimentally studied using natural language processing.

IF 1.9 · Zone 3, Engineering & Technology

Journal of Biomedical Semantics Pub Date : 2023-01-31 DOI: 10.1186/s13326-023-00282-y

Antonio Jose Jimeno Yepes, Karin Verspoor

Abstract. Background: Information pertaining to mechanisms, management and treatment of disease-causing pathogens including viruses and bacteria is readily available from research publications indexed in MEDLINE. However, identifying the literature that specifically characterises these pathogens and their properties based on experimental research, important for understanding of the molecular basis of diseases caused by these agents, requires sifting through a large number of articles to exclude incidental mentions of the pathogens, or references to pathogens in other non-experimental contexts such as public health. Objective: In this work, we lay the foundations for the development of automatic methods for characterising mentions of pathogens in scientific literature, focusing on the task of identifying research that involves the experimental study of a pathogen in an experimental context. There are no manually annotated pathogen corpora available for this purpose, while such resources are necessary to support the development of machine learning-based models. We therefore aim to fill this gap, producing a large data set automatically from MEDLINE under some simplifying assumptions for the task definition, and using it to explore automatic methods that specifically support the detection of experimentally studied pathogen mentions in research publications. Methods: We developed a pathogen mention characterisation literature data set (READBiomed-Pathogens) automatically using NCBI resources, which we make available. Resources such as the NCBI Taxonomy, MeSH and GenBank can be used effectively to identify relevant literature about experimentally researched pathogens, more specifically using MeSH to link to MEDLINE citations including titles and abstracts with experimentally researched pathogens. We experiment with several machine learning-based natural language processing (NLP) algorithms leveraging this data set as training data, to model the task of detecting papers that specifically describe experimental study of a pathogen. Results: We show that our data set READBiomed-Pathogens can be used to explore natural language processing configurations for experimental pathogen mention characterisation. READBiomed-Pathogens includes citations related to organisms including bacteria, viruses, and a small number of toxins and other disease-causing agents. Conclusions: We studied the characterisation of experimentally studied pathogens in scientific literature, developing several natural language processing methods supported by an automatically developed data set. As a core contribution of the work, we presented a methodology to automatically construct a data set for pathogen identification using existing biomedical resources. The data set and the annotation code are made publicly available. Performance of the pathogen mention identification and characterisa…

Volume 14, issue 1, page 1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9889128/pdf/

Citations: 2

We are not ready yet: limitations of state-of-the-art disease named entity recognizers.

IF 1.9 · Zone 3, Engineering & Technology

Journal of Biomedical Semantics Pub Date : 2022-10-27 DOI: 10.1186/s13326-022-00280-6

Lisa Kühnel, Juliane Fluck

Abstract. Background: Intense research has been done in the area of biomedical natural language processing. Since the breakthrough of transfer learning-based methods, BERT models are used in a variety of biomedical and clinical applications. For the available data sets, these models show excellent results - partly exceeding the inter-annotator agreements. However, biomedical named entity recognition applied on COVID-19 preprints shows a performance drop compared to the results on test data. The question arises how well trained models are able to predict on completely new data, i.e. to generalize. Results: Based on the example of disease named entity recognition, we investigate the robustness of different machine learning-based methods - thereof transfer learning - and show that current state-of-the-art methods work well for a given training and the corresponding test set but experience a significant lack of generalization when applying to new data. Conclusions: We argue that there is a need for larger annotated data sets for training and testing. Therefore, we foresee the curation of further data sets and, moreover, the investigation of continual learning processes for machine learning-based models.

Page 26. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9612606/pdf/

Citations: 5

A comprehensive update on CIDO: the community-based coronavirus infectious disease ontology.

IF 2 · Zone 3, Engineering & Technology

Journal of Biomedical Semantics Pub Date : 2022-10-21 DOI: 10.1186/s13326-022-00279-z

Yongqun He, Hong Yu, Anthony Huffman, Asiyah Yu Lin, Darren A Natale, John Beverley, Ling Zheng, Yehoshua Perl, Zhigang Wang, Yingtong Liu, Edison Ong, Yang Wang, Philip Huang, Long Tran, Jinyang Du, Zalan Shah, Easheta Shah, Roshan Desai, Hsin-Hui Huang, Yujia Tian, Eric Merrell, William D Duncan, Sivaram Arabandi, Lynn M Schriml, Jie Zheng, Anna Maria Masci, Liwei Wang, Hongfang Liu, Fatima Zohra Smaili, Robert Hoehndorf, Zoë May Pendlington, Paola Roncaglia, Xianwei Ye, Jiangan Xie, Yi-Wei Tang, Xiaolin Yang, Suyuan Peng, Luxia Zhang, Luonan Chen, Junguk Hur, Gilbert S Omenn, Brian Athey, Barry Smith

Abstract. Background: The current COVID-19 pandemic and the previous SARS/MERS outbreaks of 2003 and 2012 have resulted in a series of major global public health crises. We argue that in the interest of developing effective and safe vaccines and drugs and to better understand coronaviruses and associated disease mechanisms it is necessary to integrate the large and exponentially growing body of heterogeneous coronavirus data. Ontologies play an important role in standard-based knowledge and data representation, integration, sharing, and analysis. Accordingly, we initiated the development of the community-based Coronavirus Infectious Disease Ontology (CIDO) in early 2020. Results: As an Open Biomedical Ontology (OBO) library ontology, CIDO is open source and interoperable with other existing OBO ontologies. CIDO is aligned with the Basic Formal Ontology and Viral Infectious Disease Ontology. CIDO has imported terms from over 30 OBO ontologies. For example, CIDO imports all SARS-CoV-2 protein terms from the Protein Ontology, COVID-19-related phenotype terms from the Human Phenotype Ontology, and over 100 COVID-19 terms for vaccines (both authorized and in clinical trial) from the Vaccine Ontology. CIDO systematically represents variants of SARS-CoV-2 viruses and over 300 amino acid substitutions therein, along with over 300 diagnostic kits and methods. CIDO also describes hundreds of host-coronavirus protein-protein interactions (PPIs) and the drugs that target proteins in these PPIs. CIDO has been used to model COVID-19 related phenomena in areas such as epidemiology. The scope of CIDO was evaluated by visual analysis supported by a summarization network method. CIDO has been used in various applications such as term standardization, inference, natural language processing (NLP) and clinical data integration. We have applied the amino acid variant knowledge present in CIDO to analyze differences between SARS-CoV-2 Delta and Omicron variants. CIDO's integrative host-coronavirus PPIs and drug-target knowledge has also been used to support drug repurposing for COVID-19 treatment. Conclusion: CIDO represents entities and relations in the domain of coronavirus diseases with a special focus on COVID-19. It supports shared knowledge representation, data and metadata standardization and integration, and has been used in a range of applications.

Volume 13, issue 1, page 25. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9585694/pdf/

Citations: 0

Alignment of vaccine codes using an ontology of vaccine descriptions.

IF 1.9 · Zone 3, Engineering & Technology

Journal of Biomedical Semantics Pub Date : 2022-10-18 DOI: 10.1186/s13326-022-00278-0

Benedikt Fh Becker, Jan A Kors, Erik M van Mulligen, Miriam Cjm Sturkenboom

Abstract. Background: Vaccine information in European electronic health record (EHR) databases is represented using various clinical and database-specific coding systems and drug vocabularies. The lack of harmonization constitutes a challenge in reusing EHR data in collaborative benefit-risk studies about vaccines. Methods: We designed an ontology of the properties that are commonly used in vaccine descriptions, called Ontology of Vaccine Descriptions (VaccO), with a dictionary for the analysis of multilingual vaccine descriptions. We implemented five algorithms for the alignment of vaccine coding systems, i.e., the identification of corresponding codes from different coding systems, based on an analysis of the code descriptors. The algorithms were evaluated by comparing their results with manually created alignments in two reference sets including clinical and database-specific coding systems with multilingual code descriptors. Results: The best-performing algorithm represented code descriptors as logical statements about entities in the VaccO ontology and used an ontology reasoner to infer common properties and identify corresponding vaccine codes. The evaluation demonstrated excellent performance of the approach (F-scores 0.91 and 0.96). Conclusion: The VaccO ontology allows the identification, representation, and comparison of heterogeneous descriptions of vaccines. The automatic alignment of vaccine coding systems can accelerate the readiness of EHR databases in collaborative vaccine studies.

Page 24. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580193/pdf/

Citations: 0

Pathling: analytics on FHIR.

IF 1.9 · Zone 3, Engineering & Technology

Journal of Biomedical Semantics Pub Date : 2022-09-08 DOI: 10.1186/s13326-022-00277-1

John Grimes, Piotr Szul, Alejandro Metke-Jimenez, Michael Lawley, Kylynn Loi

Abstract. Background: Health data analytics is an area that is facing rapid change due to the acceleration of digitization of the health sector, and the changing landscape of health data and clinical terminology standards. Our research has identified a need for improved tooling to support analytics users in the task of analyzing Fast Healthcare Interoperability Resources (FHIR®) data and associated clinical terminology. Results: A server implementation was developed, featuring a FHIR API with new operations designed to support exploratory data analysis (EDA), advanced patient cohort selection and data preparation tasks. Integration with a FHIR Terminology Service is also supported, allowing users to incorporate knowledge from rich terminologies such as SNOMED CT within their queries. A prototype user interface for EDA was developed, along with visualizations in support of a health data analysis project. Conclusions: Experience with applying this technology within research projects and towards the development of analytics-enabled applications provides a preliminary indication that the FHIR Analytics API pattern implemented by Pathling is a valuable abstraction for data scientists and software developers within the health care domain. Pathling contributes towards the value proposition for the use of FHIR within health data analytics, and assists with the use of complex clinical terminologies in that context.

Volume 13, issue 1, page 23. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9455941/pdf/

Citations: 3

Figuring Out Root and Epistemic Uses of Modals: The Role of the Input

IF 1.9 · Zone 3, Engineering & Technology

Journal of Biomedical Semantics Pub Date : 2022-08-26 DOI: 10.1093/jos/ffac010

Annemarie van Dooren, Anouk Dieuleveut, Ailís Cournane, V. Hacquard

Abstract. This paper investigates how children figure out that modals like must can be used to express both epistemic and “root” (i.e. non epistemic) flavors. The existing acquisition literature shows that children produce modals with epistemic meanings up to a year later than with root meanings. We conducted a corpus study to examine how modality is expressed in speech to and by young children, to investigate the ways in which the linguistic input children hear may help or hinder them in uncovering the flavor flexibility of modals. Our results show that the way parents use modals may obscure the fact that they can express epistemic flavors: modals are very rarely used epistemically. Yet, children eventually figure it out; our results suggest that some do so even before age 3. To investigate how children pick up on epistemic flavors, we explore distributional cues that distinguish roots and epistemics. The semantic literature argues they differ in “temporal orientation” (Condoravdi, 2002): while epistemics can have present or past orientation, root modals tend to be constrained to future orientation (Werner 2006; Klecha, 2016; Rullmann & Matthewson, 2018). We show that in child-directed speech, this constraint is well-reflected in the distribution of aspectual features of roots and epistemics, but that the signal might be weak given the strong usage bias towards roots. We discuss (a) what these results imply for how children might acquire adult-like modal representations, and (b) possible learning paths towards adult-like modal representations.

Volume 81, issue 1, pages 581-616.

Citations: 3

Identification of missing hierarchical relations in the vaccine ontology using acquired term pairs.

IF 1.9 · Zone 3, Engineering & Technology

Journal of Biomedical Semantics Pub Date : 2022-08-13 DOI: 10.1186/s13326-022-00276-2

Warren Manuel, Rashmie Abeysinghe, Yongqun He, Cui Tao, Licong Cui

Abstract. Background: The Vaccine Ontology (VO) is a biomedical ontology that standardizes vaccine annotation. Errors in VO will affect a multitude of applications that it is being used in. Quality assurance of VO is imperative to ensure that it provides accurate domain knowledge to these downstream tasks. Manual review to identify and fix quality issues (such as missing hierarchical is-a relations) is challenging given the complexity of the ontology. Automated approaches are highly desirable to facilitate the quality assurance of VO. Methods: We developed an automated lexical approach that identifies potentially missing is-a relations in VO. First, we construct two types of VO concept-pairs: (1) linked; and (2) unlinked. Each concept-pair further derives an Acquired Term Pair (ATP) based on their lexical features. If the same ATP is obtained by a linked concept-pair and an unlinked concept-pair, this is considered to indicate a potentially missing is-a relation between the unlinked pair of concepts. Results: Applying this approach on the 1.1.192 version of VO, we were able to identify 232 potentially missing is-a relations. A manual review by a VO domain expert on a random sample of 70 potentially missing is-a relations revealed that 65 of the cases were valid missing is-a relations in VO (a precision of 92.86%). Conclusions: The results indicate that our approach is highly effective in identifying missing is-a relations in VO.

Page 22. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9375092/pdf/
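The matching idea in the abstract above can be illustrated with a rough Python sketch. The abstract does not say how an Acquired Term Pair is derived, so the rule used here (drop the words two concept names share and keep the leftover word sets as an ordered pair) is an assumption made only for illustration, and the concept names are invented examples rather than actual VO terms. The last lines reproduce the reported precision from the 65 valid cases out of 70 reviewed.

```python
def acquired_term_pair(child, parent):
    """Hypothetical ATP rule: remove shared words, keep the leftovers of each name."""
    child_words = set(child.lower().split())
    parent_words = set(parent.lower().split())
    shared = child_words & parent_words
    return (frozenset(child_words - shared), frozenset(parent_words - shared))


def suggest_missing_is_a(linked_pairs, unlinked_pairs):
    """Flag unlinked pairs whose ATP also arises from some linked (is-a) pair."""
    known_atps = {acquired_term_pair(c, p) for c, p in linked_pairs}
    return [(c, p) for c, p in unlinked_pairs
            if acquired_term_pair(c, p) in known_atps]


def precision(valid, reviewed):
    """Share of reviewed suggestions that the expert confirmed."""
    return valid / reviewed


# Invented concept names, not taken from the Vaccine Ontology.
linked = [("live attenuated influenza vaccine", "influenza vaccine")]
unlinked = [
    ("live attenuated measles vaccine", "measles vaccine"),
    ("inactivated polio vaccine", "measles vaccine"),
]

suggestions = suggest_missing_is_a(linked, unlinked)
print(suggestions)

# Figures reported in the abstract: 65 valid out of 70 reviewed.
print(f"{100 * precision(65, 70):.2f}%")  # prints 92.86%
```

Only the first unlinked pair is flagged, because its leftover words ("live", "attenuated" versus nothing) match those of the linked pair; the second pair leaves different leftovers and is ignored.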

Citations: 1

DCSO: towards an ontology for machine-actionable data management plans.

IF 1.9 · Zone 3, Engineering & Technology

Journal of Biomedical Semantics Pub Date : 2022-07-26 DOI: 10.1186/s13326-022-00274-4

João Cardoso, Leyla J Castro, Fajar J Ekaputra, Marie C Jacquemot, Marek Suchánek, Tomasz Miksa, José Borbinha

Abstract. The concept of Data Management Plan (DMP) has emerged as a fundamental tool to help researchers through the systematical management of data. The Research Data Alliance DMP Common Standard (DCS) working group developed a set of universal concepts characterising a DMP so it can be represented as a machine-actionable artefact, i.e., machine-actionable Data Management Plan (maDMP). The technology-agnostic approach of the current maDMP specification: (i) does not explicitly link to related data models or ontologies, (ii) has no standardised way to describe controlled vocabularies, and (iii) is extensible but has no clear mechanism to distinguish between the core specification and its extensions. This paper reports on a community effort to create the DMP Common Standard Ontology (DCSO) as a serialisation of the DCS core concepts, with a particular focus on a detailed description of the components of the ontology. Our initial result shows that the proposed DCSO can become a suitable candidate for a reference serialisation of the DMP Common Standard.

Page 21. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9327208/pdf/

Citations: 7

Correction: PhenoDEF: a corpus for annotating sentences with information of phenotype definitions in biomedical literature.

IF 1.9 · Zone 3, Engineering & Technology

Journal of Biomedical Semantics Pub Date : 2022-07-20 DOI: 10.1186/s13326-022-00275-3

Samar Binkheder, Heng-Yi Wu, Sara K Quinney, Shijun Zhang, Md Muntasir Zitu, Chien-Wei Chiang, Lei Wang, Josette Jones, Lang Li

Citations: 0

Performance assessment of ontology matching systems for FAIR data.

IF 1.9 · Zone 3, Engineering & Technology

Journal of Biomedical Semantics Pub Date : 2022-07-15 DOI: 10.1186/s13326-022-00273-5

Philip van Damme, Jesualdo Tomás Fernández-Breis, Nirupama Benis, Jose Antonio Miñarro-Gimenez, Nicolette F de Keizer, Ronald Cornet

Abstract. Background: Ontology matching should contribute to the interoperability aspect of FAIR data (Findable, Accessible, Interoperable, and Reusable). Multiple data sources can use different ontologies for annotating their data, thus creating the need for dynamic ontology matching services. In this experimental study, we assessed the performance of ontology matching systems in the context of a real-life application from the rare disease domain. Additionally, we present a method for analyzing top-level classes to improve precision. Results: We included three ontologies (NCIt, SNOMED CT, ORDO) and three matching systems (AgreementMakerLight 2.0, FCA-Map, LogMap 2.0). We evaluated the performance of the matching systems against reference alignments from BioPortal and the Unified Medical Language System Metathesaurus (UMLS). Then, we analyzed the top-level ancestors of matched classes, to detect incorrect mappings without consulting a reference alignment. To detect such incorrect mappings, we manually matched semantically equivalent top-level classes of ontology pairs. AgreementMakerLight 2.0, FCA-Map, and LogMap 2.0 had F1-scores of 0.55, 0.46, 0.55 for BioPortal and 0.66, 0.53, 0.58 for the UMLS respectively. Using vote-based consensus alignments increased performance across the board. Evaluation with manually created top-level hierarchy mappings revealed that on average 90% of the mappings' classes belonged to top-level classes that matched. Conclusions: Our findings show that the included ontology matching systems automatically produced mappings that were modestly accurate according to our evaluation. The hierarchical analysis of mappings seems promising when no reference alignments are available. All in all, the systems show potential to be implemented as part of an ontology matching service for querying FAIR data. Future research should focus on developing methods for the evaluation of mappings used in such mapping services, leading to their implementation in a FAIR data ecosystem.

Page 19. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9284868/pdf/
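As a rough illustration of the vote-based consensus alignment and F1 evaluation mentioned in the abstract above, the Python sketch below keeps a mapping when at least two of three systems propose it and then scores the result against a reference alignment. The voting threshold, the function names and the toy class identifiers are all assumptions for illustration; the abstract does not describe the exact voting scheme the authors used.

```python
from collections import Counter


def consensus_alignment(alignments, min_votes=2):
    """Keep each mapping proposed by at least `min_votes` of the given systems."""
    votes = Counter(m for alignment in alignments for m in set(alignment))
    return {m for m, count in votes.items() if count >= min_votes}


def f1_score(predicted, reference):
    """Harmonic mean of precision and recall over sets of mappings."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy mappings (source class, target class); identifiers are made up.
system_a = {("A1", "B1"), ("A2", "B2"), ("A3", "B9")}
system_b = {("A1", "B1"), ("A2", "B2")}
system_c = {("A1", "B1"), ("A4", "B4")}
reference = {("A1", "B1"), ("A2", "B2"), ("A4", "B4")}

consensus = consensus_alignment([system_a, system_b, system_c])
print(sorted(consensus))
print(round(f1_score(consensus, reference), 2))
```

In this toy case the spurious mapping proposed by a single system is voted out, which raises precision; the correct mapping found by only one system is also lost, which is the usual trade-off of a stricter vote threshold.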

Citations: 1