PRP: pathogenic risk prediction for rare nonsynonymous single nucleotide variants.
Authors: Jee Yeon Heo, Ju Han Kim
Journal: Human Genetics, pp. 679-694
Published: 2025-06-01 (Epub 2025-05-29)
DOI: https://doi.org/10.1007/s00439-025-02751-z
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12170803/pdf/
Reliable prediction of pathogenic variants plays a crucial role in personalized medicine, which aims to provide accurate diagnosis and individualized treatment using genomic medicine. This study introduces PRP, a pathogenic risk predictor for rare nonsynonymous single nucleotide variants (nsSNVs), including missense, start_lost, stop_gained, and stop_lost variants. PRP was designed to provide robust performance and interpretable predictions using thirty-four features across four categories: frequency, conservation score, substitution metrics, and gene intolerance. Five machine-learning (ML) algorithms were compared to select the optimal model. Hyperparameter optimization was conducted using Optuna, and feature importance was analyzed using Shapley Additive exPlanations (SHAP). PRP was trained on ClinVar data, evaluated on three independent test datasets, and compared with twenty other prediction tools. PRP consistently outperformed state-of-the-art tools across all eight performance metrics: AUC, AUPRC, Accuracy, F1-score, MCC, Precision, Recall, and Specificity. In addition to achieving high sensitivity and high specificity without overestimating the number of pathogenic variants, PRP demonstrates robustness in predicting rare variants. The datasets and code used for training and testing PRP, along with pre-computed scores, are available at https://github.com/DNAvigation/PRP.
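For readers unfamiliar with the eight metrics named in the abstract, the following is a minimal, standard-library Python sketch of how they can be computed from binary labels and predicted scores. It is not taken from the PRP repository. The function names, the toy labels and scores, and the 0.5 decision threshold are all illustrative assumptions, and AUPRC is estimated here by average precision, which is one common estimator among several.

```python
import math

def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))

def average_precision(y_true, scores):
    """Average precision, a step-wise estimate of AUPRC (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return safe_div(total, sum(y_true))

def evaluate(y_true, scores, threshold=0.5):
    """Compute the eight metrics; label 1 = pathogenic, 0 = benign.

    The threshold is a hypothetical default, not a value from the paper.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "AUC": roc_auc(y_true, scores),
        "AUPRC": average_precision(y_true, scores),
        "Accuracy": safe_div(tp + tn, len(y_true)),
        "F1": safe_div(2 * precision * recall, precision + recall),
        "MCC": safe_div(tp * tn - fp * fn, mcc_den),
        "Precision": precision,
        "Recall": recall,
        "Specificity": safe_div(tn, tn + fp),
    }

# Toy data, invented purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
metrics = evaluate(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

AUC and AUPRC depend only on how the scores rank the variants, whereas the other six metrics depend on the chosen threshold, so comparisons between tools on those six are sensitive to how each tool's cutoff is set.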
About the journal:
Human Genetics is a monthly journal publishing original and timely articles on all aspects of human genetics. The Journal particularly welcomes articles in the areas of Behavioral genetics, Bioinformatics, Cancer genetics and genomics, Cytogenetics, Developmental genetics, Disease association studies, Dysmorphology, ELSI (ethical, legal and social issues), Evolutionary genetics, Gene expression, Gene structure and organization, Genetics of complex diseases and epistatic interactions, Genetic epidemiology, Genome biology, Genome structure and organization, Genotype-phenotype relationships, Human Genomics, Immunogenetics and genomics, Linkage analysis and genetic mapping, Methods in Statistical Genetics, Molecular diagnostics, Mutation detection and analysis, Neurogenetics, Physical mapping and Population Genetics. Articles reporting animal models relevant to human biology or disease are also welcome. Preference will be given to those articles which address clinically relevant questions or which provide new insights into human biology.
Unless reporting entirely novel and unusual aspects of a topic, clinical case reports, cytogenetic case reports, papers on descriptive population genetics, articles dealing with the frequency of polymorphisms or additional mutations within genes in which numerous lesions have already been described, and papers that report meta-analyses of previously published datasets will normally not be accepted.
The Journal typically will not consider for publication manuscripts that report merely the isolation, map position, structure, and tissue expression profile of a gene of unknown function unless the gene is of particular interest or is a candidate gene involved in a human trait or disorder.