Efficient large-scale biomedical ontology matching with anchor-based biomedical ontology partitioning and compact geometric semantic genetic programming
Xingsi Xue, Donglei Sun, Achyut Shankar, Wattana Viriyasitavat, Patrick Siarry
Journal of Industrial Information Integration, volume 41, Article 100637. Published 2024-05-24.
DOI: 10.1016/j.jii.2024.100637
URL: https://www.sciencedirect.com/science/article/pii/S2452414X24000815
Impact factor: 10.4. JCR: Q1 (Computer Science, Interdisciplinary Applications).
Abstract
Biomedical ontologies offer a structured framework for modeling biomedical knowledge in a machine-readable format. However, their inherent heterogeneity hinders communication between them. Biomedical Ontology Matching (BOM) addresses this issue by identifying equivalent concepts across biomedical ontologies. Recently, matching techniques based on Evolutionary Algorithms (EAs) have proved effective at finding high-quality matching results. However, because of the vast number of entities and the intricate relationships between them, traditional EAs struggle to solve the BOM problem efficiently. To tackle this challenge, this paper proposes an efficient BOM method that automatically matches large-scale biomedical ontologies. First, a novel anchor-based biomedical ontology partitioning method transforms the large-scale BOM problem into several small-scale matching tasks, reducing the search space of the matching phase. Second, a new Compact Geometric Semantic Genetic Programming (CGSGP) algorithm efficiently constructs high-level similarity features for BOM, which significantly reduces computational complexity. Lastly, a new fitness function, composed of an approximated evaluation metric and the Dominance Improvement Ratio (DIR), counteracts biased improvement of solutions and enables multiple pairs of sub-ontologies to be matched simultaneously without requiring a standard alignment. The approach is evaluated on the Anatomy, Large Biomed, and Disease and Phenotype datasets of the Ontology Alignment Evaluation Initiative (OAEI). The experimental results show that the method efficiently determines high-quality BOM results across different test cases and significantly outperforms state-of-the-art BOM techniques.
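The abstract does not spell out how the anchor-based partitioning works, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes (hypothetically) that anchors are concept pairs with identical normalized labels, that each ontology is a tree given as a child-to-parent dictionary with an entry for every concept, and that each concept joins the block of its nearest anchor ancestor. All function names and the toy ontologies are illustrative.

```python
from collections import defaultdict


def normalize(label):
    """Lower-case a label and collapse separators so near-identical names compare equal."""
    return " ".join(label.lower().replace("_", " ").replace("-", " ").split())


def find_anchors(labels_a, labels_b):
    """Pair concepts whose normalized labels are identical (an assumed anchor rule)."""
    index_b = {normalize(label): concept for concept, label in labels_b.items()}
    anchors = {}
    for concept, label in labels_a.items():
        match = index_b.get(normalize(label))
        if match is not None:
            anchors[concept] = match
    return anchors


def partition(parents, anchor_concepts):
    """Assign each concept to its nearest anchor ancestor (the concept itself included)."""
    blocks = defaultdict(set)
    for concept in parents:
        node = concept
        while node is not None and node not in anchor_concepts:
            node = parents.get(node)
        # The key None collects concepts that have no anchor at or above them.
        blocks[node].add(concept)
    return dict(blocks)


def matching_tasks(parents_a, parents_b, anchors):
    """Build one small matching task (a pair of concept blocks) per anchor pair."""
    blocks_a = partition(parents_a, set(anchors))
    blocks_b = partition(parents_b, set(anchors.values()))
    return [(blocks_a[a], blocks_b[b]) for a, b in anchors.items()]


# Toy ontologies (hypothetical data): child -> parent, plus a label per concept.
parents_a = {"body": None, "heart": "body", "left_ventricle": "heart",
             "aorta": "heart", "lung": "body", "bronchus": "lung"}
labels_a = {"body": "Body", "heart": "Heart", "left_ventricle": "Left ventricle",
            "aorta": "Aorta", "lung": "Lung", "bronchus": "Bronchus"}
parents_b = {"anatomy": None, "HEART": "anatomy", "ventricle_l": "HEART",
             "LUNG": "anatomy", "bronchial_tube": "LUNG"}
labels_b = {"anatomy": "Anatomy", "HEART": "heart", "ventricle_l": "Ventricle, left",
            "LUNG": "lung", "bronchial_tube": "Bronchial tube"}

anchors = find_anchors(labels_a, labels_b)
tasks = matching_tasks(parents_a, parents_b, anchors)

# Candidate pairs to compare with and without partitioning.
full = len(parents_a) * len(parents_b)
reduced = sum(len(block_a) * len(block_b) for block_a, block_b in tasks)
print(anchors, full, reduced)
```

On this toy input the two anchors ("heart" and "lung") yield two sub-ontology pairs, and the number of candidate concept pairs drops from 30 to 10. This illustrates why partitioning shrinks the search space of the matching phase; a real partitioner would also need to handle concepts that fall under no anchor, which this sketch simply leaves out of every task.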
About the journal:
The Journal of Industrial Information Integration focuses on the industry's transition towards industrial integration and informatization, covering not only hardware and software but also information integration. It serves as a platform for promoting advances in industrial information integration, addressing challenges, issues, and solutions in an interdisciplinary forum for researchers, practitioners, and policy makers.
The Journal of Industrial Information Integration welcomes papers on foundational, technical, and practical aspects of industrial information integration, emphasizing the complex and cross-disciplinary topics that arise in industrial integration. Techniques from mathematical science, computer science, computer engineering, electrical and electronic engineering, manufacturing engineering, and engineering management are crucial in this context.