Balancing the efforts of chart review and gains in PRS prediction accuracy: An empirical study

IF 4 | Zone 2, Medicine | JCR Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Biomedical Informatics, vol. 157, Article 104705 | Pub Date: 2024-08-10 | DOI: 10.1016/j.jbi.2024.104705

Yuqing Lei , Adam Christian Naj , Hua Xu , Ruowang Li , Yong Chen

Citations: 0

Abstract

Objective

Phenotypic misclassification in genetic association analyses can affect the accuracy of PRS-based prediction models. The bias reduction method proposed by Tong et al. (2019) validates phenotypes through chart reviews on a subset of the data, and it has been shown to reduce bias in the estimated association parameters between genotype and phenotype while minimizing variance. However, whether it also improves the accuracy of subsequent PRS predictions remains unclear. Our study aims to fill this gap by assessing the performance of simulated PRS models and estimating the optimal number of chart reviews needed for validation.
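As a worked illustration of why misclassification matters here (a standard misclassification identity, not a formula taken from the paper), let $Y$ be the true phenotype, $Y^*$ the error-prone phenotype, and $\mathrm{Se}$, $\mathrm{Sp}$ the sensitivity and specificity of $Y^*$. Under non-differential misclassification, where Se and Sp do not depend on genotype $G$, the recorded case probability is a linear rescaling of the true one, so genotype effects on the risk scale shrink by the factor $\mathrm{Se} + \mathrm{Sp} - 1$. The estimated weights $\hat\beta_j$ then enter the PRS directly.

```latex
\begin{aligned}
P(Y^* = 1 \mid G) &= \mathrm{Se}\,P(Y = 1 \mid G) + (1 - \mathrm{Sp})\,\{1 - P(Y = 1 \mid G)\} \\
                  &= (1 - \mathrm{Sp}) + (\mathrm{Se} + \mathrm{Sp} - 1)\,P(Y = 1 \mid G), \\[4pt]
P(Y^* = 1 \mid G = g_1) - P(Y^* = 1 \mid G = g_0)
                  &= (\mathrm{Se} + \mathrm{Sp} - 1)\,\{P(Y = 1 \mid G = g_1) - P(Y = 1 \mid G = g_0)\}, \\[4pt]
\mathrm{PRS}_i    &= \sum_{j=1}^{m} \hat\beta_j\, G_{ij}.
\end{aligned}
```

With Se = 0.8 and Sp = 0.95, for instance, the factor is 0.75, so a true risk difference of 0.10 appears as 0.075. Under differential misclassification, Se and Sp vary with $G$ or with covariates related to it, and the bias is no longer a simple shrinkage.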

Methods

To comprehensively assess how well the bias reduction method proposed by Tong et al. improves the accuracy of PRS-based prediction models, we simulated each phenotype under three correlation structures (an independent model, a weakly correlated model, and a strongly correlated model) and introduced error-prone phenotypes using two distinct error mechanisms (differential and non-differential phenotyping errors). The simulated phenotypes were produced from genotype and phenotype data in 12 case-control datasets from the Alzheimer’s Disease Genetics Consortium (ADGC). The evaluation covered a range of misclassification rates in the original phenotypes and a range of validation set sizes. Additionally, we determined the median threshold, identifying the minimal validation size required for a meaningful improvement in the accuracy of PRS-based predictions across a broad range of scenarios.
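The two error mechanisms described above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the authors' code: genotypes are synthetic allele counts rather than ADGC data, the true phenotype follows an arbitrary logistic model, and the names `misclassify`, `empirical_se_sp`, and the covariate `z` are hypothetical. Non-differential error applies one sensitivity/specificity pair to everyone; differential error lets sensitivity vary with a covariate.

```python
import numpy as np


def misclassify(y_true, sensitivity, specificity, rng):
    """Return an error-prone copy of a 0/1 phenotype.

    A true case is recorded as a case with probability `sensitivity`;
    a true control is recorded as a case with probability 1 - `specificity`.
    Scalars give non-differential error; per-subject arrays give differential error.
    """
    y_true = np.asarray(y_true)
    se = np.broadcast_to(sensitivity, y_true.shape)
    sp = np.broadcast_to(specificity, y_true.shape)
    p_recorded_case = np.where(y_true == 1, se, 1.0 - sp)
    return (rng.random(y_true.shape) < p_recorded_case).astype(int)


def empirical_se_sp(y_true, y_obs):
    """Observed sensitivity and specificity of y_obs against y_true."""
    se = y_obs[y_true == 1].mean()
    sp = 1.0 - y_obs[y_true == 0].mean()
    return se, sp


rng = np.random.default_rng(2024)
n, m = 20_000, 10

# Hypothetical stand-in for real genotypes: additive 0/1/2 allele counts.
maf = rng.uniform(0.1, 0.5, size=m)
G = rng.binomial(2, maf, size=(n, m))
beta = rng.normal(0.0, 0.2, size=m)

# True phenotype from a logistic model on centered genotypes
# (the intercept of -1.0 is an arbitrary assumption).
logit = -1.0 + (G - 2 * maf) @ beta
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Non-differential error: the same Se/Sp for every subject.
y_nd = misclassify(y, 0.80, 0.95, rng)

# Differential error (hypothetical): sensitivity depends on a binary covariate z,
# e.g. a made-up indicator of more frequent healthcare contact.
z = rng.binomial(1, 0.5, size=n)
y_d = misclassify(y, np.where(z == 1, 0.90, 0.70), 0.95, rng)

print(empirical_se_sp(y, y_nd))
```

Passing arrays instead of scalars is what turns the same function from a non-differential into a differential error model, which keeps the two mechanisms directly comparable.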

Results

This simulation study showed that incorporating chart review does not always improve the performance of PRS-based prediction models. Specifically, when misclassification rates were low and validation sets were small, PRS models using debiased regression coefficients predicted less accurately than models using error-prone phenotypes. In other words, the effectiveness of the bias reduction method depends on the misclassification rates of the phenotypes and on the size of the validation set used for chart review. Notably, for datasets with higher misclassification rates, the advantages of the bias reduction method become more evident, and a smaller validation set suffices to achieve better performance.
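Comparing PRS models as in these results requires a score and a discrimination metric. The abstract does not name the metric, so the sketch below assumes the area under the ROC curve (AUC); `polygenic_score` and `auc` are hypothetical helpers, not functions from the paper. In the study design, the weights would presumably be either the naive coefficients estimated from error-prone phenotypes or the debiased coefficients, with the AUC computed against the true phenotype.

```python
import numpy as np


def polygenic_score(G, weights):
    """Additive PRS: PRS_i = sum_j weights_j * G_ij, with G coded as allele counts."""
    return np.asarray(G, dtype=float) @ np.asarray(weights, dtype=float)


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic (ties count as 1/2).

    Compares every case with every control, so memory is O(n_cases * n_controls);
    adequate for a sketch, but a rank-based formula scales better.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    greater = np.count_nonzero(pos[:, None] > neg[None, :])
    ties = np.count_nonzero(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Toy check with made-up numbers (not study data).
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Given naive weights `beta_naive` and debiased weights `beta_debiased` (hypothetical names), the comparison reduces to `auc(polygenic_score(G_test, beta_debiased), y_test) - auc(polygenic_score(G_test, beta_naive), y_test)`; a negative value is the situation the authors report for low misclassification rates and small validation sets.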

Conclusion

This study highlights the importance of choosing an appropriate validation set size to balance the effort of chart review against the gain in PRS prediction accuracy. Our study therefore provides guidance for validation planning across a diverse array of sensitivity and specificity combinations.
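Choosing a validation set size in this way can be framed as a simple search. The abstract does not define the median threshold precisely; one plausible reading, sketched here under that assumption, is to take the per-replicate gain (debiased minus naive accuracy) at each candidate validation size, compute its median across simulation replicates, and return the smallest size whose median gain exceeds a chosen margin. The function `minimal_validation_size` and its `min_gain` parameter are hypothetical.

```python
import numpy as np


def minimal_validation_size(sizes, gains, min_gain=0.0):
    """Smallest validation size whose median gain exceeds min_gain, else None.

    sizes: candidate validation sizes (number of charts reviewed).
    gains: 2-D array; gains[k, r] is metric(debiased) - metric(naive)
           for sizes[k] in simulation replicate r.
    """
    sizes = np.asarray(sizes)
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(sizes)                  # scan sizes from smallest to largest
    median_gain = np.median(gains[order], axis=1)
    for size, g in zip(sizes[order], median_gain):
        if g > min_gain:
            return int(size)
    return None


# Illustrative numbers only (not results from the paper):
# rows are candidate validation sizes, columns are simulation replicates.
sizes = [50, 100, 200, 400]
gains = [
    [-0.020, -0.010, 0.000],
    [-0.005, 0.001, 0.002],
    [0.004, 0.006, 0.010],
    [0.010, 0.020, 0.015],
]
print(minimal_validation_size(sizes, gains, min_gain=0.0))    # 100
print(minimal_validation_size(sizes, gains, min_gain=0.005))  # 200
```

Returning the first crossing assumes the median gain grows with validation size; if it fluctuates, requiring the gain to stay above the margin for all larger sizes would be a stricter rule.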

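Source journal

Journal of Biomedical Informatics (Medicine / Computer Science: Interdisciplinary Applications)

CiteScore: 8.90
Self-citation rate: 6.70%
Articles published: 243
Review time: 32 days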

Journal description: The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.