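Federated SPARQL query performance evaluation for exploring disease model mouse: combining gene expression, orthology, and disease knowledge graphs.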

Impact factor 3.3 | Zone 3 (Medicine) | JCR Q2 (Medical Informatics)
Tatsuya Kushida, Tarcisio Mendes de Farias, Ana C Sima, Christophe Dessimoz, Hirokazu Chiba, Frederic B Bastian, Hiroshi Masuya
BMC Medical Informatics and Decision Making, vol. 25 Suppl 1, 189. Published 2025-05-16.
DOI: https://doi.org/10.1186/s12911-025-03013-8
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12082848/pdf/
Citations: 0

Abstract


Background: The RIKEN BRC develops and maintains the RIKEN BioResource MetaDatabase to help users find bioresources suited to their experiments and to prepare precise, high-quality data infrastructures. The Swiss Institute of Bioinformatics (SIB) develops two multi-species databases for the study of gene expression and orthology: Bgee and Orthologous MAtrix (OMA).

Methods: This study combines the RIKEN BioResource data with Resource Description Framework (RDF) datasets from Bgee (a gene expression database), OMA, DisGeNET (a database of human gene-disease associations), Mouse Genome Informatics (MGI), UniProt, and four disease ontologies, all within the RIKEN BioResource MetaDatabase. Our aim is to evaluate distributed SPARQL query performance when exploring, across these interoperable datasets, which model organisms are most appropriate for specific medical research applications. More precisely, in our biomedical use cases we investigate disease-related genes and the anatomical parts where these genes are expressed, and then identify bioresource candidates available for research on specific diseases.
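The exploration described in the Methods can be pictured as a chain of joins: disease to genes, genes to anatomical parts, genes to mouse orthologs, and orthologs to bioresources. The sketch below is a minimal in-memory illustration of that chain in Python, not the authors' SPARQL queries. The dictionaries stand in for the integrated RDF datasets, and every strain identifier and expression entry is a made-up placeholder; only the gene symbols APP and APOE come from the abstract.

```python
from typing import Dict, List, Set

# Toy stand-ins for the integrated datasets. All contents are illustrative
# placeholders, not data from DisGeNET, OMA, Bgee, or the RIKEN MetaDatabase.
DISEASE_TO_HUMAN_GENES: Dict[str, Set[str]] = {
    "Alzheimer's disease": {"APP", "APOE"},
}
HUMAN_TO_MOUSE_ORTHOLOG: Dict[str, str] = {"APP": "App", "APOE": "Apoe"}
GENE_EXPRESSED_IN: Dict[str, Set[str]] = {
    "APP": {"prefrontal cortex"},
    "APOE": {"prefrontal cortex", "liver"},
}
MOUSE_GENE_TO_BIORESOURCES: Dict[str, List[str]] = {
    "App": ["EXAMPLE-STRAIN-001"],  # hypothetical strain IDs
    "Apoe": ["EXAMPLE-STRAIN-002", "EXAMPLE-STRAIN-003"],
}


def find_bioresources(disease: str, anatomy: str) -> Dict[str, List[str]]:
    """Map each disease gene expressed in `anatomy` to candidate mouse strains."""
    result: Dict[str, List[str]] = {}
    for human_gene in sorted(DISEASE_TO_HUMAN_GENES.get(disease, set())):
        # Keep only genes expressed in the anatomical part of interest.
        if anatomy not in GENE_EXPRESSED_IN.get(human_gene, set()):
            continue
        # Cross from the human gene to its mouse ortholog.
        mouse_gene = HUMAN_TO_MOUSE_ORTHOLOG.get(human_gene)
        if mouse_gene is None:
            continue
        # Collect bioresources (mouse strains) linked to that ortholog.
        strains = MOUSE_GENE_TO_BIORESOURCES.get(mouse_gene, [])
        if strains:
            result[human_gene] = list(strains)
    return result


if __name__ == "__main__":
    print(find_bioresources("Alzheimer's disease", "prefrontal cortex"))
```

In the study each of these lookups corresponds to a graph pattern over a separate RDF dataset, which is why the location of the data (local or behind a remote endpoint) affects query cost.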

Results: We illustrate the approach through two use cases, targeting Alzheimer's disease and melanoma. We identified 14 Alzheimer's disease-related genes expressed in the prefrontal cortex (e.g., APP and APOE) and 55 RIKEN bioresources, genetically modified mice related to these genes, that are predicted to be relevant to Alzheimer's disease research. Furthermore, by executing a transitive search over Uberon terms with SPARQL property paths, we identified 14 melanoma-related genes (e.g., HRAS and PTEN) and 12 anatomical parts in which these genes are expressed, such as "skin of limb". Finally, we compared the performance of a federated SPARQL query sent to the remote Bgee SPARQL endpoint with that of a centralized SPARQL query using the Bgee dataset as part of the RIKEN BioResource MetaDatabase.
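A transitive search of the kind mentioned above can be written in SPARQL 1.1 with the zero-or-more path operator, for example `?term rdfs:subClassOf* ?root`. The Python sketch below shows what such a path computes, using a toy hierarchy. The term labels and their parent links are assumptions chosen for illustration and are not taken from Uberon; the abstract does not say which property the authors traversed.

```python
from collections import deque
from typing import Dict, Set

# Toy hierarchy: child term -> direct parent terms.
# Labels and links are illustrative assumptions, not actual Uberon content.
PARENTS: Dict[str, Set[str]] = {
    "skin of limb": {"zone of skin"},
    "zone of skin": {"skin of body"},
    "skin of body": {"integumental system"},
}


def ancestors_or_self(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return `term` plus everything reachable by following parent links.

    This mirrors a zero-or-more property path such as rdfs:subClassOf*.
    """
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def descendants_or_self(root: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term whose transitive parents include `root`, plus `root`."""
    all_terms = set(parents)
    for direct_parents in parents.values():
        all_terms |= direct_parents
    all_terms.add(root)
    return {t for t in all_terms if root in ancestors_or_self(t, parents)}


if __name__ == "__main__":
    print(sorted(descendants_or_self("skin of body", PARENTS)))
```

Searching from a broad term this way picks up expression annotated on more specific terms, which a single-hop pattern would miss.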

Conclusions: We confirmed that query performance degraded under the federated approach. We reduced this degradation for federated queries from the BioResource MetaDatabase to the SIB by refining the transferred data through a subquery and by enhancing the server specifications, thereby optimizing the triple store query evaluation.
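The idea of refining the transferred data through a subquery can be sketched as follows: an inner SELECT narrows the set of genes locally, so that fewer bindings take part in the join with the remote `SERVICE` block. The Python helper below only assembles such a query string and does not contact any endpoint. The `ex:` vocabulary is hypothetical, the endpoint URL is an assumption, and the function name and parameters are invented for this example; the authors' actual queries and predicates are not given in the abstract.

```python
from textwrap import dedent

# Assumed URL of the remote Bgee SPARQL endpoint; verify before use.
BGEE_ENDPOINT = "https://www.bgee.org/sparql/"


def build_federated_query(endpoint: str, disease_label: str, gene_limit: int) -> str:
    """Build a federated SPARQL query whose subquery restricts ?gene first.

    The ex: predicates are placeholders, not the Bgee or DisGeNET vocabularies.
    """
    if gene_limit <= 0:
        raise ValueError("gene_limit must be positive")
    # Escape backslashes and double quotes for a SPARQL string literal.
    escaped = disease_label.replace("\\", "\\\\").replace('"', '\\"')
    return dedent(f"""\
        PREFIX ex: <http://example.org/vocab#>
        SELECT ?gene ?anatomy WHERE {{
          {{
            SELECT DISTINCT ?gene WHERE {{
              ?gene ex:associatedWithDisease "{escaped}" .
            }}
            LIMIT {gene_limit}
          }}
          SERVICE <{endpoint}> {{
            ?gene ex:expressedIn ?anatomy .
          }}
        }}
        """)


if __name__ == "__main__":
    print(build_federated_query(BGEE_ENDPOINT, "Alzheimer's disease", 20))
```

Whether the subquery is evaluated before the remote call, and how its bindings are shipped to the endpoint, depends on the triple store, so any gain from this rewrite should be measured rather than assumed.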

Source journal: BMC Medical Informatics and Decision Making
CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month
About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.