Classification and prediction of variants associated with hearing loss using sequence information in the vicinity of mutation sites.
Authors: Xiao Liu, Li Teng, Jing Sun
Journal: Computational biology and chemistry, volume 115, 108321
Publication date: 2024-12-14
Publication type: Journal Article
DOI: 10.1016/j.compbiolchem.2024.108321
URL: https://doi.org/10.1016/j.compbiolchem.2024.108321
Citation count: 0
Abstract
Hearing impairment is a major global health problem, affecting more than 5% of the world's population across all ages, from neonates to the elderly. Single nucleotide variations and small insertions or deletions are the most common genetic variations in humans, and studying the hearing loss they cause is proving invaluable for the analysis and diagnosis of hearing disorders. Identifying pathogenic mutations, however, is frequently a lengthy and laborious process. Existing computational prediction tools have been developed primarily for common diseases and genome-wide analyses, with less focus on deafness. This study proposes a novel approach that focuses on the regions surrounding mutation sites. Mutation sites associated with deafness were extracted from relevant databases together with flanking regions of different lengths, and combined into seven distinct segments of different lengths. Information-theoretic features of these segments were computed. Five machine learning algorithms were then trained on these features to build a model that classifies and predicts deafness-related mutations. For fragments encompassing the 250 bp regions upstream and downstream of the mutations, the average AUC of the five classifiers on the independent test set is 0.89 and the average accuracy (ACC) is 0.85, indicating that the model recognizes pathogenic deafness mutation sites at a high rate. An ensemble approach was also applied to variants of uncertain significance (VUS) that may be associated with deafness; these variants were scored and ranked to assess their likelihood of contributing to the condition.
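The segment-and-feature step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which information-theoretic features were computed, which seven segment lengths were used (only 250 bp is stated), or which five algorithms were trained, so everything below is an assumption for illustration: the function names `flanking_segment`, `kmer_entropy`, and `segment_features` are hypothetical, and Shannon entropy of k-mer frequencies stands in for whatever features the authors actually used.

```python
import math
from collections import Counter


def flanking_segment(sequence: str, site: int, flank: int) -> str:
    """Return the bases within `flank` positions of a 0-based mutation site.

    The window is clipped at the sequence ends, so a segment near a
    boundary is shorter than 2 * flank + 1 bases.
    """
    if not 0 <= site < len(sequence):
        raise ValueError("site is outside the sequence")
    start = max(0, site - flank)
    end = min(len(sequence), site + flank + 1)
    return sequence[start:end]


def kmer_entropy(segment: str, k: int = 1) -> float:
    """Shannon entropy, in bits, of the overlapping k-mer distribution."""
    n = len(segment) - k + 1  # number of overlapping k-mers
    if n <= 0:
        return 0.0
    counts = Counter(segment[i:i + k] for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def segment_features(sequence: str, site: int, flanks=(250,), ks=(1, 2, 3)) -> dict:
    """Build one feature per (flank length, k) pair.

    Assumption: 250 bp comes from the abstract; any other flank lengths
    and the choice of k values are placeholders.
    """
    features = {}
    for flank in flanks:
        segment = flanking_segment(sequence, site, flank)
        for k in ks:
            features[f"H{k}_flank{flank}"] = kmer_entropy(segment, k)
    return features
```

A feature dictionary like this, computed per variant, could then be fed to any standard classifier; the scoring and ranking of VUS mentioned in the abstract would sit on top of the classifiers' predicted probabilities, which this sketch does not attempt to reproduce.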