Human Genetics: Latest Articles

An AI-based approach driven by genotypes and phenotypes to uplift the diagnostic yield of genetic diseases.
IF 3.8 | Zone 2 | Biology
Human Genetics | Pub Date: 2025-03-01 | Epub Date: 2024-03-23 | DOI: 10.1007/s00439-023-02638-x
S Zucca, G Nicora, F De Paoli, M G Carta, R Bellazzi, P Magni, E Rizzo, I Limongelli
Pages: 159-171
Abstract: Identifying disease-causing variants in the genomes of patients with rare diseases is a challenging problem. To accomplish this task, we describe a machine learning framework that we call "Suggested Diagnosis", whose aim is to prioritize the genetic variants in an exome/genome by their probability of being disease-causing. To do so, our method leverages the standard guidelines for germline variant interpretation defined by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP), inheritance information, phenotypic similarity, and variant quality. Starting from (1) the VCF file containing the proband's variants, (2) the list of the proband's phenotypes encoded as Human Phenotype Ontology terms, and optionally (3) information about family members (if available), "Suggested Diagnosis" ranks all variants according to their machine learning prediction. This method significantly reduces the number of variants that geneticists need to evaluate by placing causative variants in the very first positions of the prioritized list. Most importantly, our approach was among the top performers in the CAGI6 Rare Genome Project Challenge, where it ranked the true causative variant among the first positions and, uniquely among all challenge participants, increased the diagnostic yield by 12.5% by solving 2 undiagnosed cases.
Citations: 0
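The abstract describes ranking every variant by the model's predicted probability of being disease-causing and judging success by where the true causative variant lands in the list. A minimal Python sketch of that ranking step follows, assuming the classifier's probabilities are already available as a dictionary; the variant IDs, scores, and the functions `rank_variants` and `causative_rank` are hypothetical and are not part of the "Suggested Diagnosis" software.

```python
from typing import Dict, List


def rank_variants(scores: Dict[str, float]) -> List[str]:
    """Return variant IDs ordered from most to least likely disease-causing."""
    # Sort by descending probability; ties are broken by variant ID for a stable order.
    return sorted(scores, key=lambda variant: (-scores[variant], variant))


def causative_rank(scores: Dict[str, float], causative: str) -> int:
    """1-based position of the known causative variant in the prioritized list."""
    return rank_variants(scores).index(causative) + 1


# Hypothetical model outputs: probability that each variant is disease-causing.
predictions = {
    "chr1:12345A>G": 0.12,
    "chr7:55000C>T": 0.91,
    "chrX:1500G>A": 0.47,
}

print(rank_variants(predictions))  # ['chr7:55000C>T', 'chrX:1500G>A', 'chr1:12345A>G']
print(causative_rank(predictions, "chrX:1500G>A"))  # 2
```

Averaging `causative_rank` over solved cases, or counting how often it falls within the top k, gives a simple summary of how far down the list a geneticist would need to look.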
Critical assessment of missense variant effect predictors on disease-relevant variant data.
IF 3.8 | Zone 2 | Biology
Human Genetics | Pub Date: 2025-03-01 | Epub Date: 2025-03-21 | DOI: 10.1007/s00439-025-02732-2
Ruchir Rastogi, Ryan Chung, Sindy Li, Chang Li, Kyoungyeul Lee, Junwoo Woo, Dong-Wook Kim, Changwon Keum, Giulia Babbi, Pier Luigi Martelli, Castrense Savojardo, Rita Casadio, Kirsley Chennen, Thomas Weber, Olivier Poch, François Ancien, Gabriel Cia, Fabrizio Pucci, Daniele Raimondi, Wim Vranken, Marianne Rooman, Céline Marquet, Tobias Olenyi, Burkhard Rost, Gaia Andreoletti, Akash Kamandula, Yisu Peng, Constantina Bakolitsa, Matthew Mort, David N Cooper, Timothy Bergquist, Vikas Pejaver, Xiaoming Liu, Predrag Radivojac, Steven E Brenner, Nilah M Ioannidis
Pages: 281-293
Abstract: Regular, systematic, and independent assessments of computational tools that predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and to guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) challenge, in which missense variant effect predictors (also called variant impact predictors) are evaluated on missense variants added to disease-relevant databases after the prediction submission deadline. Here we assess predictors submitted to the CAGI 6 Annotate-All-Missense challenge, predictors commonly used in clinical genetics, and recently developed deep learning methods. We examine performance across a range of settings relevant to clinical and research applications, focusing on different subsets of the evaluation data as well as high-specificity and high-sensitivity regimes. Our evaluations reveal notable advances in current methods relative to older, well-cited tools in the field. While meta-predictors tend to outperform their constituent individual predictors, several newer individual predictors perform comparably to commonly used meta-predictors. Predictor performance varies between high-specificity and high-sensitivity regimes, highlighting that different methods may be optimal for different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to perform worse when distinguishing pathogenic variants from very rare benign variants, and predictors trained on pathogenicity labels from curated variant databases often inherit gene-level label imbalances. Our findings help illuminate the clinical and research utility of modern missense variant effect predictors and identify potential areas for future development.
Citations: 0
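The high-specificity and high-sensitivity regimes in this abstract amount to fixing one error rate and comparing predictors on the other. The Python sketch below computes sensitivity at a required minimum specificity from raw scores and binary labels (1 = pathogenic, 0 = benign). The function name, the "score >= threshold means pathogenic" convention, and the toy data are assumptions for illustration; the specificity targets used in the paper are not stated in the abstract.

```python
from typing import Sequence


def sensitivity_at_specificity(scores: Sequence[float], labels: Sequence[int],
                               min_specificity: float) -> float:
    """Best sensitivity among thresholds whose specificity is >= min_specificity.

    A variant is called pathogenic when its score is >= the threshold.
    labels: 1 = pathogenic, 0 = benign; both classes must be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = 0.0
    # Every observed score is a candidate threshold; +inf means "call nothing pathogenic".
    for threshold in sorted(set(scores)) + [float("inf")]:
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        if tn / negatives >= min_specificity:
            best = max(best, tp / positives)
    return best


# Toy predictor scores for 4 pathogenic and 4 benign variants.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

print(sensitivity_at_specificity(scores, labels, 1.0))   # 0.5
print(sensitivity_at_specificity(scores, labels, 0.75))  # 0.75
print(sensitivity_at_specificity(scores, labels, 0.5))   # 1.0
```

Swapping the roles of sensitivity and specificity in the same loop gives the high-sensitivity counterpart, which is why a single predictor can rank differently in the two regimes.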
Impact of calmodulin missense variants associated with congenital arrhythmia on the thermal stability and the degree of unfolding.
IF 3.8 | Zone 2 | Biology
Human Genetics | Pub Date: 2025-03-01 | Epub Date: 2023-12-28 | DOI: 10.1007/s00439-023-02629-y
Giuditta Dal Cortivo, Valerio Marino, Davide Zamboni, Daniele Dell'Orco
Pages: 337-341
Abstract: Thermal denaturation profiles of proteins that bind several ligands may deviate from a single transition, making their thermodynamic description challenging. We report an empirical method that estimates melting temperatures (Tm) from multi-transition thermal denaturation profiles of 16 variants of calmodulin (CaM) associated with congenital arrhythmia. For apo CaM variants, differences in Tm estimated by empirical fitting correlate with those obtained from thermodynamic models. As evaluated by circular dichroism spectroscopy, most CaM variants were more stable than the wild type (WT) in the absence of Ca2+ but less stable in its presence, and in their apo form they displayed either WT-like or higher unfolding percentages.
Citations: 0
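For a protein with a single two-state transition, a common quick estimate of the melting temperature is the point where the normalized signal reaches 50% unfolded. The Python sketch below implements only that simple midpoint estimate by linear interpolation; it is not the empirical multi-transition fitting method reported in this paper, and the function names, baseline values, and data are hypothetical.

```python
from typing import List, Optional, Sequence


def fraction_unfolded(signal: Sequence[float], folded: float, unfolded: float) -> List[float]:
    """Normalize a CD signal to fraction unfolded using fixed folded/unfolded baselines."""
    return [(s - folded) / (unfolded - folded) for s in signal]


def midpoint_tm(temps: Sequence[float], frac: Sequence[float]) -> Optional[float]:
    """Temperature at which the fraction unfolded first crosses 0.5 (linear interpolation).

    Returns None when the curve never reaches 0.5, e.g. when the protein is
    still largely folded at the highest temperature scanned.
    """
    for i in range(len(temps) - 1):
        f0, f1 = frac[i], frac[i + 1]
        if f0 < 0.5 <= f1:
            t0, t1 = temps[i], temps[i + 1]
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    return None


temps = [20.0, 40.0, 60.0, 80.0]  # degrees C
frac = [0.0, 0.2, 0.8, 1.0]       # hypothetical fraction unfolded
print(round(midpoint_tm(temps, frac), 2))         # 50.0
print(midpoint_tm(temps, [0.0, 0.1, 0.2, 0.3]))   # None
```

The last value of `frac` multiplied by 100 corresponds to an unfolding percentage at the highest temperature; profiles with several transitions, as described for CaM, need the more elaborate fitting the authors report.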
Genetic variants and phenotypic data curated for the CAGI6 intellectual disability panel challenge.
IF 3.8 | Zone 2 | Biology
Human Genetics | Pub Date: 2025-03-01 | Epub Date: 2025-02-28 | DOI: 10.1007/s00439-025-02733-1
Maria Cristina Aspromonte, Alessio Del Conte, Roberta Polli, Demetrio Baldo, Francesco Benedicenti, Elisa Bettella, Stefania Bigoni, Stefania Boni, Claudia Ciaccio, Stefano D'Arrigo, Ilaria Donati, Elisa Granocchio, Isabella Mammi, Donatella Milani, Susanna Negrin, Margherita Nosadini, Fiorenza Soli, Franco Stanzial, Licia Turolla, Damiano Piovesan, Silvio C E Tosatto, Alessandra Murgia, Emanuela Leonardi
Pages: 309-326
Abstract: Neurodevelopmental disorders (NDDs) are common conditions that include clinically diverse and genetically heterogeneous diseases such as intellectual disability, autism spectrum disorders, and epilepsy. Their multifaceted genetic architecture and heterogeneous clinical presentations make the genetic underpinnings of NDDs a formidable challenge. This work examines the interplay between genetic variants and phenotypic manifestations in NDDs, presenting a dataset curated for the Critical Assessment of Genome Interpretation (CAGI6) ID Panel Challenge. The CAGI6 competition serves as a platform for evaluating how well computational methods predict phenotypic outcomes from genetic data. In this study, targeted gene panel sequencing was used to investigate the genetic causes of NDDs in a cohort of 415 paediatric patients. We identified 60 pathogenic and 49 likely pathogenic variants in 102 individuals, accounting for 25% of NDD cases in the cohort. The most frequently mutated genes were ANKRD11, MECP2, ARID1B, ASH1L, CHD8, KDM5C, MED12, and PTCHD1. The majority of pathogenic variants were de novo, with some inherited from mildly affected parents. Loss-of-function variants were the most common type of pathogenic variant. In silico analysis tools were used to assess the potential impact of variants on splicing and the structural/functional effects of missense variants. The study highlights the challenges of variant interpretation, especially in cases with atypical phenotypic manifestations. Overall, this study provides valuable insights into the genetic causes of NDDs and emphasises the importance of understanding the underlying genetic factors for accurate diagnosis and intervention development in neurodevelopmental conditions.
Citations: 0
Exploring the effects of missense mutations on protein thermodynamics through structure-based approaches: findings from the CAGI6 challenges.
IF 3.8 | Zone 2 | Biology
Human Genetics | Pub Date: 2025-03-01 | Epub Date: 2024-01-16 | DOI: 10.1007/s00439-023-02623-4
Carlos H M Rodrigues, Stephanie Portelli, David B Ascher
Pages: 327-335
Abstract: Missense mutations, which introduce single amino acid changes into the encoded protein, are known contributors to diverse genetic disorders. Understanding the impact of these mutations on protein stability and function is therefore crucial for unravelling disease mechanisms and developing targeted therapies. The Critical Assessment of Genome Interpretation (CAGI) provides a valuable platform for benchmarking state-of-the-art computational methods that predict the impact of disease-related mutations on protein thermodynamics. Here we report the performance of our comprehensive platform of structure-based computational approaches for evaluating mutations that affect protein structure and function on three CAGI6 challenges: Calmodulin, MAPK1, and MAPK3. When predicting stability changes (ΔΔG), our predictors achieved a correlation of up to 0.74 for MAPK1 and an AUC of 1 for MAPK3, and an AUC of up to 0.75 in the Calmodulin challenge. Overall, our study highlights the importance of structure-based approaches in understanding the effects of missense mutations on protein thermodynamics. The results obtained from the CAGI6 challenges contribute to ongoing efforts to enhance our understanding of disease mechanisms and facilitate the development of personalised medicine approaches.
Citations: 0
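The correlation and AUC figures quoted in this abstract are standard metrics. The self-contained Python sketch below shows one way to compute them: Pearson's r between predicted and measured stability changes, and ROC AUC through its rank (Mann-Whitney) interpretation for a binary split such as destabilizing versus not. The binary labels and all values are hypothetical, and the exact evaluation protocol used by the CAGI6 assessors may differ.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win; labels: 1 = positive (e.g. destabilizing), 0 = negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


print(round(pearson([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]), 3))  # 0.6
print(roc_auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0]))  # 0.75
```

Sign conventions for ΔΔG differ between tools, so it is worth checking that a higher score really means "more destabilizing" before computing the AUC; an AUC of 1 means every positive outscores every negative.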
Structure-informed protein language models are robust predictors for variant effects.
IF 3.8 | Zone 2 | Biology
Human Genetics | Pub Date: 2025-03-01 | Epub Date: 2024-08-08 | DOI: 10.1007/s00439-024-02695-w
Yuanfei Sun, Yang Shen
Pages: 209-225
Abstract: Protein language models (pLMs), an emerging class of variant effect predictors, learn the evolutionary distribution of functional sequences to capture the fitness landscape. Because variant effects are manifested through biological contexts beyond sequence (such as structure), we first assess how much structural context sequence-only pLMs learn and how this affects variant effect prediction, and we establish a need to inject protein structural context into pLMs purposely and controllably. We therefore introduce a framework of structure-informed pLMs (SI-pLMs) that extends masked sequence denoising to cross-modality denoising of both sequence and structure. Numerical results on deep mutational scanning benchmarks show that our SI-pLMs, even with smaller models and less data, are robustly top performers against competing methods, including other pLMs, indicating that introducing biological context can capture the fitness landscape more effectively than simply using larger models or more data. Case studies reveal that, compared with sequence-only pLMs, SI-pLMs can better capture the fitness landscape because (a) the learned embeddings of low- and high-fitness sequences can be more separable and (b) the learned amino-acid distributions of functionally and evolutionarily conserved residues can have much lower entropy, and are thus much more conserved, than those of other residues. Our SI-pLMs can be applied to revise any sequence-only pLM through its model architecture and training objectives. They do not require structure data as model input for variant effect prediction; structures serve only as a context provider and model regularizer during training.
Citations: 0
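The SI-pLM abstract uses the entropy of a model's per-residue amino-acid distribution as a measure of conservation. A minimal Python sketch of that quantity follows; the probability dictionaries are hypothetical stand-ins for a pLM's predicted distribution at one position, and no specific model or library API is assumed.

```python
import math
from typing import Dict


def shannon_entropy(dist: Dict[str, float], base: float = 2.0) -> float:
    """Shannon entropy of an amino-acid distribution (bits by default).

    Values are renormalized to sum to 1; lower entropy means the position is
    predicted to be more conserved.
    """
    total = sum(dist.values())
    h = 0.0
    for p in dist.values():
        if p > 0:
            q = p / total
            h -= q * math.log(q, base)
    return h


AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}    # no preference: maximal entropy
conserved = {"C": 0.97, "S": 0.02, "A": 0.01}   # hypothetical conserved cysteine

print(round(shannon_entropy(uniform), 3))  # 4.322 (log2 of 20)
print(shannon_entropy(conserved) < 0.5)    # True
```

With 20 amino acids the entropy ranges from 0 bits (one residue certain) to log2(20), roughly 4.32 bits (all residues equally likely), so "much lower entropy" places a position near the conserved end of that scale.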
Evaluation of enzyme activity predictions for variants of unknown significance in Arylsulfatase A.
IF 3.8 | Zone 2 | Biology
Human Genetics | Pub Date: 2025-03-01 | Epub Date: 2025-03-08 | DOI: 10.1007/s00439-025-02731-3
Shantanu Jain, Marena Trinidad, Thanh Binh Nguyen, Kaiya Jones, Santiago Diaz Neto, Fang Ge, Ailin Glagovsky, Cameron Jones, Giankaleb Moran, Boqi Wang, Kobra Rahimi, Sümeyra Zeynep Çalıcı, Luis R Cedillo, Silvia Berardelli, Buse Özden, Ken Chen, Panagiotis Katsonis, Amanda Williams, Olivier Lichtarge, Sadhna Rana, Swatantra Pradhan, Rajgopal Srinivasan, Rakshanda Sajeed, Dinesh Joshi, Eshel Faraggi, Robert Jernigan, Andrzej Kloczkowski, Jierui Xu, Zigang Song, Selen Özkan, Natàlia Padilla, Xavier de la Cruz, Rocio Acuna-Hidalgo, Andrea Grafmüller, Laura T Jiménez Barrón, Matteo Manfredi, Castrense Savojardo, Giulia Babbi, Pier Luigi Martelli, Rita Casadio, Yuanfei Sun, Shaowen Zhu, Yang Shen, Fabrizio Pucci, Marianne Rooman, Gabriel Cia, Daniele Raimondi, Pauline Hermans, Sofia Kwee, Ella Chen, Courtney Astore, Akash Kamandula, Vikas Pejaver, Rashika Ramola, Michelle Velyunskiy, Daniel Zeiberg, Reet Mishra, Teague Sterling, Jennifer L Goldstein, Jose Lugo-Martinez, Sufyan Kazi, Sindy Li, Kinsey Long, Steven E Brenner, Constantina Bakolitsa, Predrag Radivojac, Dean Suhr, Teryn Suhr, Wyatt T Clark
Pages: 295-308
Abstract: Continued advances in variant effect prediction are necessary to demonstrate that machine learning methods can accurately determine the clinical impact of variants of unknown significance (VUS). Towards this goal, the ARSA Critical Assessment of Genome Interpretation (CAGI) challenge was designed to characterize progress by using 219 experimentally assayed missense VUS in the Arylsulfatase A (ARSA) gene to assess the performance of community-submitted predictions of variant functional effects. The challenge involved 15 teams and also evaluated predictions from established and recently released models. Notably, a model developed by participants of a genetics and coding bootcamp, trained with standard machine learning tools in Python, showed superior performance among submissions. Furthermore, the study observed that state-of-the-art deep learning methods provided a small but statistically significant improvement in predictive performance compared with less elaborate techniques. These findings underscore the utility of variant effect prediction and the potential for models trained with modest resources to accurately classify VUS in genetic and clinical research.
Citations: 0
An augmented transformer model trained on protein family specific variant data leads to improved prediction of variants of uncertain significance.
IF 3.8 | Zone 2 | Biology
Human Genetics | Pub Date: 2025-03-01 | Epub Date: 2025-01-27 | DOI: 10.1007/s00439-025-02727-z
Dinesh Joshi, Swatantra Pradhan, Rakshanda Sajeed, Rajgopal Srinivasan, Sadhna Rana
Pages: 143-158
Abstract: Variants of uncertain significance (VUS) lack sufficient evidence to be confidently associated with a disease and therefore pose a challenge in interpreting genetic testing results. Here we report an improved method for predicting the effects of VUS in the Arylsulfatase A (ARSA) gene as part of the Critical Assessment of Genome Interpretation challenge (CAGI6). Our method uses a transfer learning approach that leverages a pre-trained protein language model to predict the impact of mutations on the activity of the ARSA enzyme, whose deficiency is known to cause a rare genetic disorder, metachromatic leukodystrophy. Our framework combines zero-shot log-odds scores and embeddings from ESM, an evolutionary scale model, as features for training a supervised model on variants of genes functionally related to ARSA. The zero-shot log-odds feature captures generic protein properties learned through pre-training on millions of UniProt sequences, while the ESM embeddings of proteins in the ARSA family capture family-specific features. We also tested our approach on another enzyme in the same superfamily as ARSA, N-acetyl-glucosaminidase (NAGLU). Our results show that our family models (augmented ESM models) perform comparably to or better than the ESM models. On the area under the precision-recall curve (AUPRC), the ARSA model compares favorably with most state-of-the-art predictors, and the NAGLU model outperforms all pathogenicity predictors evaluated in this study. The improved AUPRC is relevant in diagnostic settings, where variant prioritization generally entails identifying a small number of pathogenic variants among a larger number of benign variants. Our results also indicate that, for genes with sparse or no experimental variant impact data, family variant data can serve as proxy training data for accurate predictions. Attention analysis of active sites and binding sites in the ARSA and NAGLU proteins sheds light on probable mechanisms of pathogenicity at highly attended positions.
Citations: 0
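Two quantities in this abstract can be made concrete. A zero-shot log-odds score is commonly computed as log p(mutant) minus log p(wild type) from a masked language model's amino-acid probabilities at the mutated position, and AUPRC is often estimated as average precision. The Python sketch below implements both, assuming the per-position probabilities have already been obtained from a model such as ESM; it does not call ESM, the probabilities, labels, and function names are hypothetical, and the paper's exact scoring variant may differ.

```python
import math
from typing import Dict, Sequence


def log_odds_score(position_probs: Dict[str, float], wt: str, mt: str) -> float:
    """Zero-shot score log p(mt) - log p(wt) at a single (masked) position.

    Negative values mean the model prefers the wild-type residue, which is
    usually read as the substitution being more likely to be damaging.
    """
    return math.log(position_probs[mt]) - math.log(position_probs[wt])


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each positive.

    Higher scores should mean 'more likely pathogenic'; labels: 1 = pathogenic.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions)


# Hypothetical model probabilities at one position where the wild type is A.
probs = {"A": 0.50, "V": 0.05, "G": 0.25}
print(round(log_odds_score(probs, "A", "V"), 3))  # -2.303

# Negate log-odds so that higher means more likely pathogenic before ranking.
pathogenicity = [-log_odds_score(probs, "A", mt) for mt in ("V", "G")]
print(average_precision(pathogenicity, [1, 0]))  # 1.0
```

Because average precision only rewards positives ranked near the top, it reflects the diagnostic scenario the abstract describes, where a few pathogenic variants must be found among many benign ones.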
Assessing the predicted impact of single amino acid substitutions in calmodulin for CAGI6 challenges.
IF 3.8 | Zone 2 | Biology
Human Genetics | Pub Date: 2025-03-01 | Epub Date: 2024-12-23 | DOI: 10.1007/s00439-024-02720-y
Paola Turina, Giuditta Dal Cortivo, Carlos A Enriquez Sandoval, Emil Alexov, David B Ascher, Giulia Babbi, Constantina Bakolitsa, Rita Casadio, Piero Fariselli, Lukas Folkman, Akash Kamandula, Panagiotis Katsonis, Dong Li, Olivier Lichtarge, Pier Luigi Martelli, Shailesh Kumar Panday, Douglas E V Pires, Stephanie Portelli, Fabrizio Pucci, Carlos H M Rodrigues, Marianne Rooman, Castrense Savojardo, Martin Schwersensky, Yang Shen, Alexey V Strokach, Yuanfei Sun, Junwoo Woo, Predrag Radivojac, Steven E Brenner, Daniele Dell'Orco, Emidio Capriotti
Pages: 113-125
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11975486/pdf/
Abstract: Recent thermodynamic and functional studies have evaluated the impact of amino acid substitutions on calmodulin (CaM). The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Verona (Italy) measured the melting temperature (Tm) and the percentage of unfolding (%unfold) of a set of CaM variants (the CaM challenge dataset). Thermodynamic measurements of the equilibrium unfolding of CaM were obtained by monitoring far-UV circular dichroism as a function of temperature, and were used to determine Tm and the percentage of protein unfolded at the highest temperature. The CaM challenge dataset, comprising 15 single amino acid substitutions, was used to evaluate how well computational methods predict the Tm and unfolding percentages associated with the variants and categorize them as destabilizing or not. For the sixth edition of CAGI, nine independent research groups from four continents (Asia, Australia, Europe, and North America) submitted over 52 sets of predictions derived from various approaches. In this manuscript, we summarize the results of our assessment to highlight the potential limitations of current algorithms and provide insights into the future development of more accurate prediction tools. By evaluating the thermodynamic stability of CaM variants, this study aims to enhance our understanding of the relationship between amino acid substitutions and protein stability, ultimately contributing to more accurate predictions of the effects of genetic variants.
Citations: 0
Evaluating predictors of kinase activity of STK11 variants identified in primary human non-small cell lung cancers.
IF 3.8 | Zone 2 | Biology
Human Genetics | Pub Date: 2025-03-01 | Epub Date: 2025-02-12 | DOI: 10.1007/s00439-025-02726-0
Yile Chen, Kyoungyeul Lee, Junwoo Woo, Dong-Wook Kim, Changwon Keum, Giulia Babbi, Rita Casadio, Pier Luigi Martelli, Castrense Savojardo, Matteo Manfredi, Yang Shen, Yuanfei Sun, Panagiotis Katsonis, Olivier Lichtarge, Vikas Pejaver, David J Seward, Akash Kamandula, Constantina Bakolitsa, Steven E Brenner, Predrag Radivojac, Anne O'Donnell-Luria, Sean D Mooney, Shantanu Jain
Pages: 127-142
Abstract: Critical evaluation of computational tools for predicting variant effects is important given their increasing use in disease diagnosis and in driving molecular discoveries. In the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, a dataset of 28 STK11 rare variants (27 missense, 1 single amino acid deletion) identified in primary non-small cell lung cancer biopsies was experimentally assayed to characterize computational methods from four participating teams and five publicly available tools. Predictors performed well on key evaluation metrics, which measured correlation with the assay outputs and separation of loss-of-function (LoF) variants from wildtype-like (WT-like) variants. The best participant model, 3Cnet, performed competitively with well-known tools. Unique to this challenge, the functional data were generated with both biological and technical replicates, allowing the assessors to realistically establish the maximum achievable predictive performance given experimental variability. Three of the five publicly available tools and 3Cnet approached the performance of the assay replicates in separating LoF variants from WT-like variants. Surprisingly, REVEL, a widely used model, achieved a correlation with the real-valued assay output comparable to that seen between the experimental replicates. Performing variant interpretation by combining the new functional evidence with computational and population data evidence led to 16 new variants receiving a clinically actionable classification of likely pathogenic (LP) or likely benign (LB). Overall, the STK11 challenge highlights the utility of variant effect predictors in the biomedical sciences and provides encouraging results for driving research in computational genome interpretation.
Citations: 0
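The STK11 abstract uses agreement between assay replicates as a realistic reference for the best achievable predictive performance. One simple way to express that idea is to compare a predictor's correlation with the replicate-averaged assay against the correlation between the replicates themselves; the Python sketch below does this with hypothetical data. It illustrates the concept under stated assumptions (two replicates, Pearson r) and is not the assessors' actual procedure.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def compare_to_replicates(pred: Sequence[float], rep1: Sequence[float],
                          rep2: Sequence[float]) -> Tuple[float, float]:
    """Return (r of predictor vs. replicate mean, r of replicate 1 vs. replicate 2)."""
    assay_mean = [(a + b) / 2 for a, b in zip(rep1, rep2)]
    return pearson(pred, assay_mean), pearson(rep1, rep2)


# Hypothetical normalized kinase-activity readouts for four variants.
rep1 = [1.0, 2.0, 3.0, 4.0]
rep2 = [2.0, 1.0, 4.0, 3.0]
pred = [1.0, 2.0, 3.0, 5.0]

r_pred, r_reps = compare_to_replicates(pred, rep1, rep2)
print(round(r_pred, 3), round(r_reps, 3))  # 0.845 0.6
```

Averaging replicates reduces noise, so a predictor can exceed the replicate-to-replicate r when scored against the mean, as in this toy example; that is one reason to treat replicate agreement as a reference point rather than a strict mathematical bound.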