Prediction of Deleterious Single Amino Acid Polymorphisms with a Consensus Holdout Sampler

IF 1.4 | Zone 4, Biology | Q4 Biochemistry & Molecular Biology

Current Genomics Pub Date : 2024-03-18 DOI:10.2174/0113892029236347240308054538

Óscar Álvarez-Machancoses, Eshel Faraggi, Enrique deAndrés-Galiana, Juan Fernández-Martínez, Andrzej Kloczkowski



Abstract

Background: Single Amino Acid Polymorphisms (SAPs), also called nonsynonymous Single Nucleotide Variants (nsSNVs), are the most common genetic variations. They result from missense mutations, in which a single base-pair substitution changes a codon (a triplet of bases) so that it codes for a different amino acid. Because genetic mutations sometimes cause genetic diseases, it is important to understand and predict which variations are harmful and which are neutral (causing no change in the phenotype). This can be posed as a classification problem.

Objective: Roughly half of known disease-related mutations are due to nonsynonymous variants [8-9], expressed as amino-acid mutations. It is therefore important to unravel the links between nsSNVs and associated diseases in order to discriminate between pathogenic and neutral substitutions. These substitutions have been found to be potentially directly related to pathological effects, such as Parkinson's or Alzheimer's disease, or to be involved in complex diseases, such as cancer development.

Methods: Computational methods based on machine intelligence are gradually replacing repetitive and very expensive mutagenesis experiments. However, the uneven quality, gaps, and inconsistencies of nsSNV datasets limit the usefulness of artificial intelligence-based methods, so more robust and more accurate approaches are needed. In the present paper, we present a consensus classifier built on the holdout sampler, which appears robust and accurate and outperforms all other popular methods.

Results: We generated 100 holdouts to test the structures and the various classification variables of different classifiers during the training phase. The best-performing holdouts were selected to build a consensus classifier, which was tested using k-fold (1 ≤ k ≤ 5) cross-validation. We also examined which protein properties have the greatest impact on accurate prediction of the effects of nsSNVs.

Conclusion: Our Consensus Holdout Sampler outperforms other popular algorithms and gives excellent results: high accuracy with low standard deviation. The advantage of our method comes from using a tree of holdouts, in which diverse ML/AI-based programs are sampled in diverse ways.
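The general idea of a holdout-based consensus classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the underlying classifiers, the features, or the selection rule, so everything below is an assumption made for illustration. The base learner is a simple nearest-centroid model, the data are synthetic (`make_toy_data` is a hypothetical helper), each holdout draws a random train/validation split plus a random feature subset, and the consensus is a plain majority vote over the best-scoring holdout models. The k-fold loop in this sketch requires k of at least 2.

```python
import random


def fit_centroids(rows, labels, features):
    """Nearest-centroid model restricted to the given feature indices."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(r[f] for r in members) / len(members) for f in features]
    return features, centroids


def predict_one(model, row):
    """Return the class whose centroid is closest to the row."""
    features, centroids = model

    def sq_dist(centroid):
        return sum((row[f] - v) ** 2 for f, v in zip(features, centroid))

    return min(sorted(centroids), key=lambda cls: sq_dist(centroids[cls]))


def accuracy(model, rows, labels):
    hits = sum(predict_one(model, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def holdout_sampler(rows, labels, n_holdouts=100, train_frac=0.75, keep=5, seed=0):
    """Train one model per random holdout and keep the best-scoring ones."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    indices = list(range(len(rows)))
    scored = []
    for _ in range(n_holdouts):
        rng.shuffle(indices)
        cut = int(train_frac * len(indices))
        train, valid = indices[:cut], indices[cut:]
        # Assumption: a random feature subset stands in for model diversity.
        feats = sorted(rng.sample(range(n_features), rng.randint(1, n_features)))
        model = fit_centroids([rows[i] for i in train], [labels[i] for i in train], feats)
        score = accuracy(model, [rows[i] for i in valid], [labels[i] for i in valid])
        scored.append((score, model))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [model for _, model in scored[:keep]]


def consensus_predict(models, row):
    """Majority vote over the selected holdout models."""
    votes = [predict_one(m, row) for m in models]
    return max(sorted(set(votes)), key=votes.count)


def cross_validate(rows, labels, k=5, seed=2):
    """k-fold evaluation of the consensus classifier (k >= 2 in this sketch)."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        models = holdout_sampler(
            [rows[i] for i in train], [labels[i] for i in train],
            n_holdouts=30, keep=5, seed=seed,
        )
        hits = sum(consensus_predict(models, rows[i]) == labels[i] for i in fold)
        scores.append(hits / len(fold))
    return scores


def make_toy_data(n=200, seed=1):
    """Synthetic two-class data: two informative features and one noise feature."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = i % 2  # 0 = "neutral", 1 = "deleterious" (labels are illustrative only)
        rows.append([rng.gauss(2.0 * y, 1.0), rng.gauss(-2.0 * y, 1.0), rng.gauss(0.0, 1.0)])
        labels.append(y)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_toy_data()
    fold_scores = cross_validate(rows, labels, k=5)
    print("per-fold accuracy:", [round(s, 3) for s in fold_scores])
    print("mean accuracy:", round(sum(fold_scores) / len(fold_scores), 3))
```

Selecting models by validation score before voting is what lets the consensus discard holdouts that happened to sample only uninformative features; in this toy setting that is the noise-only feature subset. Real nsSNV data would need real features and stronger base learners in place of these stand-ins.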




Source journal

Current Genomics (Biology: Biochemistry & Molecular Biology)

CiteScore: 5.20

Self-citation rate: 0.00%

Review time: >0 weeks

About the journal: Current Genomics is a peer-reviewed journal that provides essential reading about the latest and most important developments in genome science and related fields of research. Topics of particular interest include systems biology, systems modeling, machine learning, network inference, bioinformatics, computational biology, epigenetics, single-cell genomics, extracellular vesicles, quantitative biology, and synthetic biology for the study of evolution, development, maintenance, and aging, as well as human health, human diseases, clinical genomics, and precision medicine. The journal covers plant genomics but will not consider articles dealing with breeding and livestock. Current Genomics publishes three types of articles:

i) Research papers from internationally recognized experts reporting new and original data generated at the genome scale. Position papers dealing with new or challenging methodological approaches, whether experimental or mathematical, are especially welcome in this section.

ii) Authoritative and comprehensive full-length or mini reviews from widely recognized experts, covering the latest developments in genome science and related fields of research such as systems biology, statistics and machine learning, quantitative biology, and precision medicine. Proposals for mini hot topics (2-3 review papers) and full hot topics (6-8 review papers) guest edited by internationally recognized experts are welcome in this section. Hot topic proposals should not contain original data and should contain articles originating from at least 2 different countries.

iii) Opinion papers from internationally recognized experts addressing contemporary questions and issues in the field of genome science and systems biology, and in basic and clinical research practices.