释放多机构数据的力量：跨机构整合与协调基因组数据。

ArXiv Pub Date : 2024-10-29

Yuan Chen, Ronglai Shen, Xiwen Feng, Katherine Panageas

{"title":"释放多机构数据的力量：跨机构整合与协调基因组数据。","authors":"Yuan Chen, Ronglai Shen, Xiwen Feng, Katherine Panageas","doi":"","DOIUrl":null,"url":null,"abstract":"Cancer is a complex disease driven by genomic alterations, and tumor sequencing is becoming a mainstay of clinical care for cancer patients. The emergence of multi-institution sequencing data presents a powerful resource for learning real-world evidence to enhance precision oncology. GENIE BPC, led by American Association for Cancer Research, establishes a unique database linking genomic data with clinical information for patients treated at multiple cancer centers. However, leveraging sequencing data from multiple institutions presents significant challenges. Variability in gene panels can lead to loss of information when analyses focus on genes common across panels. Additionally, differences in sequencing techniques and patient heterogeneity across institutions add complexity. High data dimensionality, sparse gene mutation patterns, and weak signals at the individual gene level further complicate matters. Motivated by these real-world challenges, we introduce the Bridge model. It uses a quantile-matched latent variable approach to derive integrated features to preserve information beyond common genes and maximize the utilization of all available data, while leveraging information sharing to enhance both learning efficiency and the model's capacity to generalize. By extracting harmonized and noise-reduced lower-dimensional latent variables, the true mutation pattern unique to each individual is captured. We assess model's performance and parameter estimation through extensive simulation studies. The extracted latent features from the Bridge model consistently excel in predicting patient survival across six cancer types in GENIE BPC data.","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11581117/pdf/","citationCount":"0","resultStr":"{\"title\":\"Unlocking the Power of Multi-institutional Data: Integrating and Harmonizing Genomic Data Across Institutions.\",\"authors\":\"Yuan Chen, Ronglai Shen, Xiwen Feng, Katherine Panageas\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cancer is a complex disease driven by genomic alterations, and tumor sequencing is becoming a mainstay of clinical care for cancer patients. The emergence of multi-institution sequencing data presents a powerful resource for learning real-world evidence to enhance precision oncology. GENIE BPC, led by American Association for Cancer Research, establishes a unique database linking genomic data with clinical information for patients treated at multiple cancer centers. However, leveraging sequencing data from multiple institutions presents significant challenges. Variability in gene panels can lead to loss of information when analyses focus on genes common across panels. Additionally, differences in sequencing techniques and patient heterogeneity across institutions add complexity. High data dimensionality, sparse gene mutation patterns, and weak signals at the individual gene level further complicate matters. Motivated by these real-world challenges, we introduce the Bridge model. It uses a quantile-matched latent variable approach to derive integrated features to preserve information beyond common genes and maximize the utilization of all available data, while leveraging information sharing to enhance both learning efficiency and the model's capacity to generalize. By extracting harmonized and noise-reduced lower-dimensional latent variables, the true mutation pattern unique to each individual is captured. We assess model's performance and parameter estimation through extensive simulation studies. The extracted latent features from the Bridge model consistently excel in predicting patient survival across six cancer types in GENIE BPC data.\",\"PeriodicalId\":93888,\"journal\":{\"name\":\"ArXiv\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-10-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11581117/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ArXiv\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

癌症是一种由基因组改变驱动的复杂疾病，肿瘤测序正成为癌症患者临床治疗的主要手段。多机构测序数据的出现为学习真实世界的证据以提高精准肿瘤学提供了强大的资源。由美国癌症研究协会牵头的 GENIE BPC 建立了一个独特的数据库，将基因组数据与在多个癌症中心接受治疗的患者的临床信息联系起来。然而，利用这种多机构测序数据面临着巨大的挑战。在对常见基因集进行分析时，基因面板的差异会导致信息丢失。此外，各机构测序技术的差异和患者的异质性也增加了复杂性。高数据维度、稀疏的基因突变模式和单个基因水平的微弱信号使问题更加复杂。在这些现实挑战的激励下，我们引入了 Bridge 模型。该模型采用量化匹配潜变量方法提取综合特征，以保留共同基因以外的信息，最大限度地利用所有可用数据，同时利用信息共享提高学习效率和模型的泛化能力。通过提取经过协调和降噪处理的低维潜在变量，可以捕捉到每个个体独有的真实突变模式。我们通过大量的模拟研究来评估模型的性能和参数估计。从 Bridge 模型中提取的潜特征在预测 GENIE BPC 数据中六种癌症类型的患者生存率方面始终表现出色。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

本刊更多论文

Unlocking the Power of Multi-institutional Data: Integrating and Harmonizing Genomic Data Across Institutions.

Cancer is a complex disease driven by genomic alterations, and tumor sequencing is becoming a mainstay of clinical care for cancer patients. The emergence of multi-institution sequencing data presents a powerful resource for learning real-world evidence to enhance precision oncology. GENIE BPC, led by American Association for Cancer Research, establishes a unique database linking genomic data with clinical information for patients treated at multiple cancer centers. However, leveraging sequencing data from multiple institutions presents significant challenges. Variability in gene panels can lead to loss of information when analyses focus on genes common across panels. Additionally, differences in sequencing techniques and patient heterogeneity across institutions add complexity. High data dimensionality, sparse gene mutation patterns, and weak signals at the individual gene level further complicate matters. Motivated by these real-world challenges, we introduce the Bridge model. It uses a quantile-matched latent variable approach to derive integrated features to preserve information beyond common genes and maximize the utilization of all available data, while leveraging information sharing to enhance both learning efficiency and the model's capacity to generalize. By extracting harmonized and noise-reduced lower-dimensional latent variables, the true mutation pattern unique to each individual is captured. We assess model's performance and parameter estimation through extensive simulation studies. The extracted latent features from the Bridge model consistently excel in predicting patient survival across six cancer types in GENIE BPC data.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ArXiv

自引率

0.00%

发文量