Assessment of ability of a DNA language model to predict pathogenicity of rare coding variants.
David Curtis. Journal of Human Genetics. Journal Article, published 2025-08-15. DOI: https://doi.org/10.1038/s10038-025-01385-3
A recently described method to predict pathogenicity of DNA variants uses a DNA language model and can be applied to both coding and non-coding variants. For coding variants, the performance of this method, termed GPN-MSA (genomic pretrained network with multiple-sequence alignment), was reported to be superior to CADD. We compare the performance of this method against 45 other predictors applied to rare coding variants in 18 gene-phenotype pairs. We find that while GPN-MSA produces stronger evidence for association than CADD, it is not the best-performing method for any gene, and on average other prediction methods are superior. While GPN-MSA may be useful for predicting the pathogenicity of non-coding variants, it would seem sensible for clinicians and researchers to utilise other methods when dealing with coding variants. This research has been conducted using the UK Biobank Resource.
About the journal:
The Journal of Human Genetics is an international journal publishing articles on human genetics, including medical genetics and human genome analysis. It covers all aspects of human genetics, including molecular genetics, clinical genetics, behavioral genetics, immunogenetics, pharmacogenomics, population genetics, functional genomics, epigenetics, genetic counseling and gene therapy.
Articles on the following areas are especially welcome: genetic factors of monogenic and complex disorders, genome-wide association studies, genetic epidemiology, cancer genetics, personal genomics, genotype-phenotype relationships and genome diversity.