Machine Learning Approach for the Prediction of Age-Specific Probability of SCA3 and DRPLA by Survival Curve Analysis.

IF 3.0, Zone 3 (Medicine), Q2 CLINICAL NEUROLOGY
Yuya Hatano, Tomohiko Ishihara, Sachiko Hirokawa, Osamu Onodera
Neurology-Genetics, volume 9, issue 3, e200075. Published 2023-06-01. Journal Article.
DOI: https://doi.org/10.1212/NXG.0000000000200075
PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/f2/bf/NXG-2023-000018.PMC10159758.pdf
Citations: 1

Abstract


Machine Learning Approach for the Prediction of Age-Specific Probability of SCA3 and DRPLA by Survival Curve Analysis.


Background and objectives: In polyglutamine diseases, onset tends to occur at a younger age as the number of repeats in the expansion increases. Based on this relationship, attempts have been made to predict age at onset by parametric survival analysis. However, a more accurate prediction method has been needed. In this study, we examined 2 machine learning methods of survival analysis and 6 conventional parametric methods of survival analysis for spinocerebellar ataxia type 3 (SCA3) and dentatorubral-pallidoluysian atrophy (DRPLA).

Methods: We compared the performance of 2 machine learning methods of survival analysis (random survival forest [RSF] and DeepSurv) and 6 methods of parametric survival analysis (Weibull, exponential, Gaussian, logistic, log-logistic, and log-Gaussian). Training and evaluation were performed using leave-one-out cross-validation, and the evaluation criteria were root mean squared error (RMSE), mean absolute error (MAE), and the integrated Brier score. The integrated Brier score was used as the primary end point, and the survival analysis model yielding the best result was used to predict the asymptomatic probability.
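The evaluation scheme described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the straight-line model is only a stand-in for RSF, DeepSurv, or the parametric models, the function names are hypothetical, and the demo data are invented. The Brier score here assumes every onset age is observed, so it omits the censoring weights that a full integrated Brier score normally uses.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted onset ages."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between observed and predicted onset ages."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x (stand-in model only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_out_predictions(repeats, onset_ages):
    """Predict each carrier's onset age from a model fitted to all the others."""
    predictions = []
    for i in range(len(repeats)):
        train_x = repeats[:i] + repeats[i + 1:]
        train_y = onset_ages[:i] + onset_ages[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        predictions.append(intercept + slope * repeats[i])
    return predictions


def brier_score(t, onset_ages, survival_fns):
    """Mean squared gap at age t between observed status and predicted S(t).

    Assumes no censoring: every onset age is known.
    """
    total = 0.0
    for onset, survival in zip(onset_ages, survival_fns):
        asymptomatic = 1.0 if onset > t else 0.0
        total += (asymptomatic - survival(t)) ** 2
    return total / len(onset_ages)


def integrated_brier_score(onset_ages, survival_fns, t_min, t_max, steps=200):
    """Brier score averaged over [t_min, t_max] with the trapezoidal rule."""
    width = (t_max - t_min) / steps
    scores = [brier_score(t_min + i * width, onset_ages, survival_fns)
              for i in range(steps + 1)]
    area = sum((scores[i] + scores[i + 1]) * width / 2 for i in range(steps))
    return area / (t_max - t_min)


if __name__ == "__main__":
    # Invented repeat sizes and onset ages, for illustration only.
    repeats = [62, 66, 70, 74, 78]
    onset_ages = [58, 47, 41, 30, 26]
    predicted = leave_one_out_predictions(repeats, onset_ages)
    print("RMSE:", rmse(onset_ages, predicted))
    print("MAE:", mae(onset_ages, predicted))
```

Lower values are better for all three criteria. RMSE and MAE score a single predicted onset age, whereas the integrated Brier score judges the whole predicted survival curve.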

Results: Among the models examined, the RSF and DeepSurv machine learning methods had a higher prediction accuracy than the parametric methods of survival analysis. For both SCA3 and DRPLA, RSF was more accurate than DeepSurv in terms of RMSE (SCA3: 7.37, DRPLA: 10.78), MAE (SCA3: 5.52, DRPLA: 8.17), and the integrated Brier score (SCA3: 0.05, DRPLA: 0.077). Using RSF, we determined the age-specific probability distribution of age at onset based on CAG repeat size and current age.
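One way to read "based on CAG repeat size and current age" is as a conditional probability. If S(t) is the predicted probability of remaining asymptomatic at age t for a given repeat size, then for a carrier still asymptomatic at age c the probability of onset by a later age t is 1 - S(t)/S(c). The sketch below assumes this standard conditioning step. The abstract does not spell out the authors' exact procedure, and the function names and curve format are hypothetical.

```python
from bisect import bisect_right


def survival_at(curve, age):
    """Look up S(age) on a step curve given as (age, S) pairs sorted by age.

    Before the first listed age the asymptomatic probability is taken as 1.0.
    """
    ages = [a for a, _ in curve]
    i = bisect_right(ages, age)
    return 1.0 if i == 0 else curve[i - 1][1]


def conditional_onset_probability(curve, current_age, future_age):
    """Probability of onset by future_age, given no symptoms at current_age."""
    if future_age < current_age:
        raise ValueError("future_age must not be earlier than current_age")
    s_now = survival_at(curve, current_age)
    if s_now <= 0.0:
        raise ValueError("the curve predicts onset before current_age")
    return 1.0 - survival_at(curve, future_age) / s_now


if __name__ == "__main__":
    # Invented survival curve for one hypothetical CAG repeat size.
    curve = [(30, 0.9), (40, 0.6), (50, 0.3), (60, 0.0)]
    print(conditional_onset_probability(curve, 45, 55))
```

Conditioning on current age matters for counseling: a carrier who has already passed part of the risk period without symptoms has a different remaining risk profile from the one implied by the unconditional curve.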

Discussion: In this study, we have demonstrated the superiority of machine learning methods for predicting age at onset of SCA3 and DRPLA using survival analysis. Such accurate prediction of onset will be useful for genetic counseling of carriers and for devising methods to verify the effects of interventions for unaffected individuals.

Source journal
Neurology-Genetics (Medicine, Neurology (clinical))
CiteScore: 6.30
Self-citation rate: 3.20%
Articles published: 107
Review time: 15 weeks
About the journal: Neurology: Genetics is an online open access journal publishing peer-reviewed reports in the field of neurogenetics. Original articles in all areas of neurogenetics will be published including rare and common genetic variation, genotype-phenotype correlations, outlier phenotypes as a result of mutations in known disease-genes, and genetic variations with a putative link to diseases. This will include studies reporting on genetic disease risk and pharmacogenomics. In addition, Neurology: Genetics will publish results of gene-based clinical trials (viral, ASO, etc.). Genetically engineered model systems are not a primary focus of Neurology: Genetics, but studies using model systems for treatment trials are welcome, including well-powered studies reporting negative results.