Jason H. Moore, Xi Li, Jui-Hsuan Chang, Nicholas P. Tatonetti, Dan Theodorescu, Yong Chen, F. Asselbergs, Mythreye Venkatesan, Zhiping Wang
{"title":"SynTwin:一种基于图谱的方法,利用从合成患者中提取的数字双胞胎预测临床结果。","authors":"Jason H. Moore, Xi Li, Jui-Hsuan Chang, Nicholas P. Tatonetti, Dan Theodorescu, Yong Chen, F. Asselbergs, Mythreye Venkatesan, Zhiping Wang","doi":"10.1142/9789811286421_0008","DOIUrl":null,"url":null,"abstract":"The concept of a digital twin came from the engineering, industrial, and manufacturing domains to create virtual objects or machines that could inform the design and development of real objects. This idea is appealing for precision medicine where digital twins of patients could help inform healthcare decisions. We have developed a methodology for generating and using digital twins for clinical outcome prediction. We introduce a new approach that combines synthetic data and network science to create digital twins (i.e. SynTwin) for precision medicine. First, our approach starts by estimating the distance between all subjects based on their available features. Second, the distances are used to construct a network with subjects as nodes and edges defining distance less than the percolation threshold. Third, communities or cliques of subjects are defined. Fourth, a large population of synthetic patients are generated using a synthetic data generation algorithm that models the correlation structure of the data to generate new patients. Fifth, digital twins are selected from the synthetic patient population that are within a given distance defining a subject community in the network. Finally, we compare and contrast community-based prediction of clinical endpoints using real subjects, digital twins, or both within and outside of the community. Key to this approach are the digital twins defined using patient similarity that represent hypothetical unobserved patients with patterns similar to nearby real patients as defined by network distance and community structure. We apply our SynTwin approach to predicting mortality in a population-based cancer registry (n=87,674) from the Surveillance, Epidemiology, and End Results (SEER) program from the National Cancer Institute (USA). Our results demonstrate that nearest network neighbor prediction of mortality in this study is significantly improved with digital twins (AUROC=0.864, 95% CI=0.857-0.872) over just using real data alone (AUROC=0.791, 95% CI=0.781-0.800). These results suggest a network-based digital twin strategy using synthetic patients may add value to precision medicine efforts.","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"40 15","pages":"96-107"},"PeriodicalIF":0.0000,"publicationDate":"2023-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SynTwin: A graph-based approach for predicting clinical outcomes using digital twins derived from synthetic patients.\",\"authors\":\"Jason H. Moore, Xi Li, Jui-Hsuan Chang, Nicholas P. Tatonetti, Dan Theodorescu, Yong Chen, F. Asselbergs, Mythreye Venkatesan, Zhiping Wang\",\"doi\":\"10.1142/9789811286421_0008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The concept of a digital twin came from the engineering, industrial, and manufacturing domains to create virtual objects or machines that could inform the design and development of real objects. This idea is appealing for precision medicine where digital twins of patients could help inform healthcare decisions. We have developed a methodology for generating and using digital twins for clinical outcome prediction. We introduce a new approach that combines synthetic data and network science to create digital twins (i.e. SynTwin) for precision medicine. First, our approach starts by estimating the distance between all subjects based on their available features. Second, the distances are used to construct a network with subjects as nodes and edges defining distance less than the percolation threshold. Third, communities or cliques of subjects are defined. Fourth, a large population of synthetic patients are generated using a synthetic data generation algorithm that models the correlation structure of the data to generate new patients. Fifth, digital twins are selected from the synthetic patient population that are within a given distance defining a subject community in the network. Finally, we compare and contrast community-based prediction of clinical endpoints using real subjects, digital twins, or both within and outside of the community. Key to this approach are the digital twins defined using patient similarity that represent hypothetical unobserved patients with patterns similar to nearby real patients as defined by network distance and community structure. We apply our SynTwin approach to predicting mortality in a population-based cancer registry (n=87,674) from the Surveillance, Epidemiology, and End Results (SEER) program from the National Cancer Institute (USA). Our results demonstrate that nearest network neighbor prediction of mortality in this study is significantly improved with digital twins (AUROC=0.864, 95% CI=0.857-0.872) over just using real data alone (AUROC=0.791, 95% CI=0.781-0.800). These results suggest a network-based digital twin strategy using synthetic patients may add value to precision medicine efforts.\",\"PeriodicalId\":34954,\"journal\":{\"name\":\"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing\",\"volume\":\"40 15\",\"pages\":\"96-107\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-12-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/9789811286421_0008\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/9789811286421_0008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 0
摘要
数字孪生的概念来自工程、工业和制造领域,旨在创建虚拟物体或机器,为真实物体的设计和开发提供参考。这一想法对精准医疗很有吸引力,患者的数字孪生可以帮助医疗决策提供依据。我们开发了一种生成和使用数字双胞胎进行临床结果预测的方法。我们介绍了一种结合合成数据和网络科学的新方法,为精准医疗创建数字孪生(即 SynTwin)。首先,我们的方法是根据所有受试者的可用特征来估计他们之间的距离。其次,利用这些距离构建一个网络,以受试者为节点,边缘定义的距离小于渗透阈值。第三,定义受试者的群落或小群。第四,使用合成数据生成算法生成大量合成患者,该算法可模拟数据的相关结构,生成新的患者。第五,从合成患者群体中挑选出一定距离内的数字双胞胎,定义网络中的主体群落。最后,我们使用真实受试者、数字双胞胎或社区内外的受试者对基于社区的临床终点预测进行比较和对比。这种方法的关键在于使用患者相似性定义的数字孪生,它代表了假设的未观察到的患者,其模式与网络距离和社区结构定义的附近真实患者相似。我们将 SynTwin 方法应用于预测美国国家癌症研究所(National Cancer Institute,USA)监测、流行病学和最终结果(Surveillance,Epidemiology,and End Results,SEER)项目中基于人群的癌症登记(n=87,674)中的死亡率。我们的研究结果表明,在这项研究中,使用数字孪生预测死亡率的最近网络邻居(AUROC=0.864,95% CI=0.857-0.872)明显优于仅使用真实数据(AUROC=0.791,95% CI=0.781-0.800)。这些结果表明,使用合成患者的基于网络的数字孪生策略可能会为精准医疗工作增添价值。
SynTwin: A graph-based approach for predicting clinical outcomes using digital twins derived from synthetic patients.
The concept of a digital twin came from the engineering, industrial, and manufacturing domains to create virtual objects or machines that could inform the design and development of real objects. This idea is appealing for precision medicine where digital twins of patients could help inform healthcare decisions. We have developed a methodology for generating and using digital twins for clinical outcome prediction. We introduce a new approach that combines synthetic data and network science to create digital twins (i.e. SynTwin) for precision medicine. First, our approach starts by estimating the distance between all subjects based on their available features. Second, the distances are used to construct a network with subjects as nodes and edges defining distance less than the percolation threshold. Third, communities or cliques of subjects are defined. Fourth, a large population of synthetic patients are generated using a synthetic data generation algorithm that models the correlation structure of the data to generate new patients. Fifth, digital twins are selected from the synthetic patient population that are within a given distance defining a subject community in the network. Finally, we compare and contrast community-based prediction of clinical endpoints using real subjects, digital twins, or both within and outside of the community. Key to this approach are the digital twins defined using patient similarity that represent hypothetical unobserved patients with patterns similar to nearby real patients as defined by network distance and community structure. We apply our SynTwin approach to predicting mortality in a population-based cancer registry (n=87,674) from the Surveillance, Epidemiology, and End Results (SEER) program from the National Cancer Institute (USA). Our results demonstrate that nearest network neighbor prediction of mortality in this study is significantly improved with digital twins (AUROC=0.864, 95% CI=0.857-0.872) over just using real data alone (AUROC=0.791, 95% CI=0.781-0.800). These results suggest a network-based digital twin strategy using synthetic patients may add value to precision medicine efforts.