Oncopacket: Integration of Cancer Research Data using GA4GH Phenopackets.

IF 5.4
Michael Sierk, Daniel Danis, Sujay Patil, Nobal Kishor, Rajdeep Mondal, Abhishek Jha, Qingrong Chen, Chunhua Yan, Monica Munoz-Torres, Daoud Meerzaman, Peter N Robinson, Justin T Reese
Journal: Bioinformatics (Oxford, England)
Publication date: 2025-09-29
Publication type: Journal Article
DOI: 10.1093/bioinformatics/btaf546 (https://doi.org/10.1093/bioinformatics/btaf546)
Citations: 0

Oncopacket: Integration of Cancer Research Data using GA4GH Phenopackets.

Summary: Lack of data integration remains a significant impediment to cancer research, and many analyses still require customized software to transform and prepare cancer data. We describe a software package to harmonize genetic and clinical cancer data into the GA4GH Phenopacket Schema, an ISO standard for representing clinical case data. We integrated demographic, mutation, morphology, diagnosis, intervention, and survival data using case data from the National Cancer Institute (NCI) for 12 cancer types. The Phenopacket standard provides a foundation for downstream use, including sophisticated statistical and AI/ML analyses. We demonstrate fitness for purpose by using the integrated data to recapitulate a known association between mutations in the gene encoding isocitrate dehydrogenase 1 (IDH1) and survival time in brain cancer patients.
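The IDH1 and survival comparison mentioned in the summary can be illustrated with a small sketch. Everything here is an assumption made for illustration: the cohort is synthetic, `make_phenopacket`, `mutated_symbols`, and `kaplan_meier` are hypothetical helpers rather than part of Oncopacket, and the dictionaries are toy records that borrow Phenopacket Schema v2 field names (`subject.vitalStatus.survivalTimeInDays`, `geneContext.symbol`) without being schema-valid. The Kaplan-Meier estimator is a generic choice and is not claimed to be the analysis the authors ran.

```python
from collections import defaultdict


def make_phenopacket(pid, survival_days, deceased, mutated_genes):
    """Build a toy phenopacket-like dict (hypothetical helper, synthetic data)."""
    return {
        "id": pid,
        "subject": {
            "id": pid,
            "vitalStatus": {
                "status": "DECEASED" if deceased else "ALIVE",
                "survivalTimeInDays": survival_days,
            },
        },
        "interpretations": [{
            "id": f"{pid}-interp",
            "diagnosis": {
                "genomicInterpretations": [
                    {"variantInterpretation": {"variationDescriptor": {
                        "geneContext": {"symbol": gene}}}}
                    for gene in mutated_genes
                ],
            },
        }],
    }


def mutated_symbols(pp):
    """Collect the gene symbols named in a phenopacket's interpretations."""
    symbols = set()
    for interp in pp.get("interpretations", []):
        for gi in interp.get("diagnosis", {}).get("genomicInterpretations", []):
            descriptor = gi.get("variantInterpretation", {}).get("variationDescriptor", {})
            symbol = descriptor.get("geneContext", {}).get("symbol")
            if symbol:
                symbols.add(symbol)
    return symbols


def kaplan_meier(samples):
    """Return [(time, survival_probability)] from (time, death_observed) pairs."""
    deaths = defaultdict(int)
    leaving = defaultdict(int)  # deaths plus censored individuals at each time
    for time, event in samples:
        leaving[time] += 1
        if event:
            deaths[time] += 1
    at_risk = len(samples)
    survival = 1.0
    curve = []
    for time in sorted(leaving):
        if deaths[time]:
            survival *= 1 - deaths[time] / at_risk
            curve.append((time, survival))
        at_risk -= leaving[time]
    return curve


# Synthetic four-person cohort; the values carry no clinical meaning.
cohort = [
    make_phenopacket("P1", 300, True, ["TP53"]),
    make_phenopacket("P2", 450, True, ["EGFR"]),
    make_phenopacket("P3", 900, False, ["IDH1"]),
    make_phenopacket("P4", 1200, True, ["IDH1", "TP53"]),
]

groups = {"IDH1-mutant": [], "IDH1-wildtype": []}
for pp in cohort:
    vital = pp["subject"]["vitalStatus"]
    key = "IDH1-mutant" if "IDH1" in mutated_symbols(pp) else "IDH1-wildtype"
    groups[key].append((vital["survivalTimeInDays"], vital["status"] == "DECEASED"))

curves = {name: kaplan_meier(samples) for name, samples in groups.items()}
```

With real data, the two curves would normally be compared with a log-rank test from a statistics library; the sketch stops at the per-group estimate to stay dependency-free.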

Availability and implementation: Source code is freely available at: https://github.com/monarch-initiative/oncopacket (archived at 10.5281/zenodo.15353125).

Supplementary information: Phenopackets for 23650 individuals from 12 cancer types, 7816 of which have mutational data (average 80 variants affecting 62 unique genes per patient), are available as a Zenodo dataset: https://doi.org/10.5281/zenodo.14610228. An example of plots summarizing a cohort of phenopackets is available in the online supplement.
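Cohort-level figures like those quoted above (individuals with mutational data, mean variants and mean unique genes per patient) could be computed along the following lines. This is a minimal sketch on synthetic records: `toy_record`, `variant_genes`, and `summarize` are hypothetical names, the nested keys follow Phenopacket Schema v2 naming as an assumption about how the released JSON is laid out, and the file organization of the Zenodo dataset is not assumed here.

```python
from statistics import mean


def toy_record(pid, genes):
    """Build a toy phenopacket-like dict with one variant entry per gene symbol."""
    return {
        "id": pid,
        "interpretations": [{
            "diagnosis": {"genomicInterpretations": [
                {"variantInterpretation": {"variationDescriptor": {
                    "geneContext": {"symbol": gene}}}}
                for gene in genes
            ]},
        }],
    }


def variant_genes(pp):
    """Return one gene symbol (or None) per variant interpretation."""
    genes = []
    for interp in pp.get("interpretations", []):
        for gi in interp.get("diagnosis", {}).get("genomicInterpretations", []):
            descriptor = gi.get("variantInterpretation", {}).get("variationDescriptor", {})
            genes.append(descriptor.get("geneContext", {}).get("symbol"))
    return genes


def summarize(cohort):
    """Count individuals with variants and average variants/genes among them."""
    per_patient = [variant_genes(pp) for pp in cohort]
    with_variants = [genes for genes in per_patient if genes]
    if not with_variants:
        return {"individuals": len(cohort), "with_mutation_data": 0,
                "mean_variants": 0.0, "mean_unique_genes": 0.0}
    return {
        "individuals": len(cohort),
        "with_mutation_data": len(with_variants),
        "mean_variants": mean(len(g) for g in with_variants),
        "mean_unique_genes": mean(len({s for s in g if s}) for g in with_variants),
    }


# Synthetic example: patient B has no variant data and is excluded from the means.
summary = summarize([
    toy_record("A", ["TP53", "TP53", "IDH1"]),
    toy_record("B", []),
    toy_record("C", ["EGFR"]),
])
```

To run this on downloaded files, each phenopacket JSON would first be read with `json.load` into a dict and the resulting list passed to `summarize`.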
