Classification and prediction of variants associated with hearing loss using sequence information in the vicinity of mutation sites

IF 2.6, Zone 4 Biology, JCR Q2 BIOLOGY

Computational Biology and Chemistry Pub Date : 2024-12-15 DOI:10.1016/j.compbiolchem.2024.108321

Xiao Liu, Li Teng, Jing Sun

Volume 115, Article 108321. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S1476927124003098


Abstract

Hearing impairment is a major global health problem, affecting more than 5% of the world's population across all ages, from neonates to the elderly. Among the common genetic variations in humans, single nucleotide variations and small insertions or deletions predominate. The study of hearing loss resulting from these variations is proving invaluable in the analysis and diagnosis of hearing disorders. The identification of pathogenic mutations is frequently a lengthy and laborious process. Existing computational prediction tools have been developed primarily for common diseases and genome-wide analyses, with less focus on deafness. This study proposes a novel approach that focuses on the regions surrounding mutation sites. Mutation sites associated with deafness and their flanking regions of different lengths were extracted from relevant databases and combined into seven distinct segments of different lengths. The information-theoretic features of these segments were computed. Five machine learning algorithms were then used for training, resulting in a model capable of classifying and predicting deafness-related mutations. For fragments encompassing the 250 bp regions upstream and downstream of the mutations, the average AUC of the five classifiers on the independent test set was 0.89 and the average accuracy (ACC) was 0.85, indicating that the model recognizes pathogenic deafness-associated mutation sites at a high rate. An ensemble approach was also applied to variants of uncertain significance (VUS) that may be associated with deafness. These variants were then scored and ranked to assess their likelihood of contributing to the condition.
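The pipeline described in the abstract (flanking-region extraction, information-theoretic features, ensemble scoring) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which information-theoretic features or which five classifiers were used, so Shannon entropy of k-mer frequencies stands in for the features and a simple mean of per-classifier probabilities stands in for the ensemble. All function names, the flank lengths other than 250 bp, and the k values are hypothetical.

```python
import math
from collections import Counter


def flanking_segment(sequence: str, site: int, flank: int) -> str:
    """Return the site plus up to `flank` bases on each side (0-based site)."""
    if not 0 <= site < len(sequence):
        raise ValueError("site is outside the sequence")
    start = max(0, site - flank)
    end = min(len(sequence), site + flank + 1)
    return sequence[start:end]


def kmer_entropy(segment: str, k: int = 1) -> float:
    """Shannon entropy in bits of the overlapping k-mer distribution.

    Assumption: this is one plausible information-theoretic feature;
    the abstract does not name the features actually used.
    """
    kmers = [segment[i:i + k] for i in range(len(segment) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_features(sequence, site, flanks=(10, 50, 250), ks=(1, 2, 3)):
    """Build a flat feature dict with one entropy per (flank, k) pair.

    The flank lengths and k values are illustrative placeholders.
    """
    return {
        f"H_k{k}_flank{flank}": kmer_entropy(
            flanking_segment(sequence, site, flank), k
        )
        for flank in flanks
        for k in ks
    }


def rank_by_mean_score(scores_by_variant):
    """Rank variants by mean per-classifier probability, highest first.

    Assumption: averaging is a stand-in for the unspecified ensemble rule.
    """
    means = {v: sum(s) / len(s) for v, s in scores_by_variant.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)
```

In such a setup, the feature dictionaries for labelled variants would be fed to whichever classifiers are chosen, and `rank_by_mean_score` would be applied to their predicted probabilities for each VUS.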




Source journal

Computational Biology and Chemistry (Biology - Computer Science: Interdisciplinary Applications)

CiteScore: 6.10

Self-citation rate: 3.20%

Articles published: 142

Review time: 24 days

Journal description: Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High-quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high-quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered. Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, molecular dynamics simulations along with detailed free energy calculations, for example, should be used as complementary techniques to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered. Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However, prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.