Michael Sierk, Daniel Danis, Sujay Patil, Nobal Kishor, Rajdeep Mondal, Abhishek Jha, Qingrong Chen, Chunhua Yan, Monica Munoz-Torres, Daoud Meerzaman, Peter N Robinson, Justin T Reese
Bioinformatics (Oxford, England). Published 2025-09-29. DOI: 10.1093/bioinformatics/btaf546
Oncopacket: Integration of Cancer Research Data using GA4GH Phenopackets.
Summary: Lack of data integration remains a significant impediment to cancer research, and many analyses still require customized software to transform and prepare cancer data. We describe a software package to harmonize genetic and clinical cancer data into the GA4GH Phenopacket Schema, an ISO standard for representing clinical case data. We integrated demographic, mutation, morphology, diagnosis, intervention, and survival data using case data from the National Cancer Institute (NCI) for 12 cancer types. The Phenopacket standard provides a foundation for downstream use, including sophisticated statistical and AI/ML analyses. We demonstrate fitness for purpose by using the integrated data to recapitulate a known association between mutations in the gene encoding isocitrate dehydrogenase 1 (IDH1) and survival time in brain cancer patients.
Availability and implementation: Source code is freely available at: https://github.com/monarch-initiative/oncopacket (archived at 10.5281/zenodo.15353125).
Supplementary information: Phenopackets for 23650 individuals from 12 cancer types, 7816 of which have mutational data (average 80 variants affecting 62 unique genes per patient), are available as a Zenodo dataset: https://doi.org/10.5281/zenodo.14610228. An example of plots summarizing a cohort of phenopackets is available in the online supplement.
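To illustrate the kind of downstream use the summary describes, here is a minimal Python sketch that groups phenopackets by whether a given gene is mutated and compares survival times. It assumes the phenopackets are loaded as JSON-style dictionaries using GA4GH Phenopacket Schema v2 field names (`subject.vitalStatus.survivalTimeInDays`, `interpretations[].diagnosis.genomicInterpretations[].variantInterpretation.variationDescriptor.geneContext.symbol`). Whether Oncopacket populates exactly these fields is an assumption, the helper names are hypothetical, and the toy cohort is invented for illustration only. The naive median used here ignores censoring and is not the analysis reported by the authors; a real survival comparison would use a method such as Kaplan-Meier estimation.

```python
from statistics import median


def mutated_genes(phenopacket: dict) -> set:
    """Collect gene symbols from the variant interpretations of one phenopacket."""
    genes = set()
    for interpretation in phenopacket.get("interpretations", []):
        diagnosis = interpretation.get("diagnosis", {})
        for genomic in diagnosis.get("genomicInterpretations", []):
            descriptor = genomic.get("variantInterpretation", {}).get(
                "variationDescriptor", {}
            )
            symbol = descriptor.get("geneContext", {}).get("symbol")
            if symbol:
                genes.add(symbol)
    return genes


def survival_days(phenopacket: dict):
    """Return survivalTimeInDays from the subject's vital status, or None if absent."""
    return (
        phenopacket.get("subject", {})
        .get("vitalStatus", {})
        .get("survivalTimeInDays")
    )


def median_survival_by_gene(phenopackets: list, gene: str) -> dict:
    """Naive median survival (days) for carriers vs non-carriers of a mutated gene.

    Individuals without a recorded survival time are skipped. Censoring is ignored.
    """
    groups = {"mutated": [], "wild_type": []}
    for phenopacket in phenopackets:
        days = survival_days(phenopacket)
        if days is None:
            continue
        key = "mutated" if gene in mutated_genes(phenopacket) else "wild_type"
        groups[key].append(days)
    return {key: (median(values) if values else None) for key, values in groups.items()}


def make_toy_phenopacket(case_id: str, days, genes: list) -> dict:
    """Build a hypothetical, heavily simplified phenopacket-like dict (illustration only)."""
    vital_status = {"status": "DECEASED"}
    if days is not None:
        vital_status["survivalTimeInDays"] = days
    return {
        "id": case_id,
        "subject": {"id": case_id, "vitalStatus": vital_status},
        "interpretations": [
            {
                "id": case_id + "-interpretation",
                "diagnosis": {
                    "genomicInterpretations": [
                        {
                            "subjectOrBiosampleId": case_id,
                            "variantInterpretation": {
                                "variationDescriptor": {
                                    "geneContext": {"symbol": symbol}
                                }
                            },
                        }
                        for symbol in genes
                    ]
                },
            }
        ],
    }


# Invented values, not taken from the published dataset.
toy_cohort = [
    make_toy_phenopacket("case-1", 900, ["IDH1", "TP53"]),
    make_toy_phenopacket("case-2", 1100, ["IDH1"]),
    make_toy_phenopacket("case-3", 300, ["EGFR"]),
    make_toy_phenopacket("case-4", 400, []),
    make_toy_phenopacket("case-5", 500, ["PTEN"]),
    make_toy_phenopacket("case-6", None, ["IDH1"]),  # no survival time: skipped
]

result = median_survival_by_gene(toy_cohort, "IDH1")
print(result)
```

With real data, the list passed to `median_survival_by_gene` would come from parsing the phenopacket JSON files in the Zenodo dataset (for example with `json.load`) rather than from the toy builder.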