DS-MVP: identifying disease-specific pathogenicity of missense variants by pre-training representation.

IF 6.8 · Zone 2, Biology · JCR Q1, Biochemical Research Methods

Briefings in bioinformatics, vol. 26, no. 2. Pub Date: 2025-03-04. DOI: 10.1093/bib/bbaf119

Qiufeng Chen, Lijun Quan, Lexin Cao, Bei Zhang, Zhijun Zhang, Liangchen Peng, Junkai Wang, Yelu Jiang, Liangpeng Nie, Geng Li, Tingfang Wu, Qiang Lyu



Abstract

Accurately predicting the pathogenicity of missense variants is crucial for improving disease diagnosis and advancing clinical research. However, existing computational methods primarily focus on general pathogenicity predictions, overlooking assessments of disease-specific conditions. In this study, we propose DS-MVP, a method capable of predicting disease-specific pathogenicity of missense variants in human genomes. DS-MVP first leverages a deep learning model pre-trained on a large general pathogenicity dataset to learn rich representations of missense variants. It then fine-tunes these representations with an XGBoost model on smaller datasets for specific diseases. We evaluated the learned representations by testing them on multiple binary pathogenicity datasets and gene-level statistics, demonstrating that DS-MVP outperforms existing state-of-the-art methods, such as MetaRNN and AlphaMissense. Additionally, DS-MVP excels in multi-label and multi-class classification, effectively classifying disease-specific pathogenic missense variants based on disease conditions. It further enhances predictions by fine-tuning the pre-trained model on disease-specific datasets. Finally, we analyzed the contributions of the pre-trained model and various feature types, with gene description corpus features from a large language model and genetic feature fusion contributing the most. These results underscore that DS-MVP represents a broader perspective on pathogenicity prediction and holds potential as an effective tool for disease diagnosis.
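The two-stage design described in the abstract (a frozen pre-trained representation followed by a boosted-tree classifier trained on a small disease-specific set) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the DS-MVP implementation: `FROZEN_W`, `embed`, the feature vectors, and the labels are all hypothetical, the encoder is a fixed linear map with `tanh` standing in for the pre-trained deep network, and hand-written gradient-boosted decision stumps stand in for XGBoost so that the example runs with the standard library only.

```python
import math

# Stage 1 stand-in: a frozen "pre-trained" encoder with hypothetical weights.
# In the paper this role is played by a deep network; the weights here are made up.
FROZEN_W = [[1.0, -1.0, 0.0],
            [0.0, 1.0, 1.0]]

def embed(features):
    """Map raw variant features to a fixed representation (weights are never updated)."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in FROZEN_W]

def fit_stump(X, residuals):
    """Find the single-feature threshold split that best fits the residuals (squared error)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # skip splits that leave one side empty
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, ml, mr)
    return best[1:]  # (feature index, threshold, left value, right value)

def stump_value(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr

def fit_boosted_stumps(X, y, rounds=10, lr=0.5):
    """Stage 2 stand-in for XGBoost: plain gradient boosting on the logistic loss."""
    stumps, scores = [], [0.0] * len(X)
    for _ in range(rounds):
        probs = [1.0 / (1.0 + math.exp(-s)) for s in scores]
        residuals = [yi - p for yi, p in zip(y, probs)]  # negative gradient of the loss
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + lr * stump_value(stump, row) for s, row in zip(scores, X)]
    return stumps

def predict(stumps, row, lr=0.5):
    """Return 1 (pathogenic for this disease) if the summed stump scores are positive."""
    score = sum(lr * stump_value(s, row) for s in stumps)
    return 1 if score > 0 else 0

# Hypothetical small disease-specific dataset: raw features and binary labels.
raw_variants = [[2.0, 0.0, 0.0], [1.5, 0.2, 0.1], [1.8, 0.1, 0.3],
                [0.0, 2.0, 0.0], [0.2, 1.5, 0.1], [0.1, 1.8, 0.2]]
labels = [1, 1, 1, 0, 0, 0]

X = [embed(v) for v in raw_variants]      # stage 1: frozen representation
model = fit_boosted_stumps(X, labels)     # stage 2: disease-specific classifier
train_predictions = [predict(model, row) for row in X]
```

In a multi-label setting, one such classifier could be trained per disease on the same frozen representation; whether DS-MVP does exactly this is not stated in the abstract.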


Source journal

Briefings in bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20

Self-citation rate: 13.70%

Articles published: 549

Review time: 6 months

About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.