Dinucleotide composition representation-based deep learning to predict scoliosis-associated Fibrillin-1 genotypes.

IF 2.8, Zone 3 (Biology), JCR Q2, GENETICS & HEREDITY
Frontiers in Genetics Pub Date : 2024-10-22 eCollection Date: 2024-01-01 DOI:10.3389/fgene.2024.1492226
Sen Zhang, Li-Na Dai, Qi Yin, Xiao-Ping Kang, Dan-Dan Zeng, Tao Jiang, Guang-Yu Zhao, Xiao-He Li, Jing Li
Citations: 0

Abstract

Introduction: Scoliosis is a pathological deformation of spinal structure, and most cases are classified as "idiopathic" because their etiology is unknown. However, scoliosis has been suggested to have a polygenic background. Identifying genetic backgrounds related to Adolescent Idiopathic Scoliosis (AIS) before scoliosis onset is therefore crucial.

Methods: This study was designed to parse, decompose and predict AIS-related variants in the ClinVar database. Possible AIS-related variant records downloaded from ClinVar were parsed for various labels, decomposed into Dinucleotide Compositional Representation (DCR) and other traits, screened for high-risk genes by statistical analysis, and then used to train deep learning models that predict high-risk AIS genotypes.
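The abstract does not define DCR precisely. As a minimal sketch, assuming DCR amounts to the normalized frequencies of the 16 overlapping dinucleotides in a sequence (the paper's actual encoding may differ), the decomposition step could look like the following. The function name `dinucleotide_composition` is hypothetical and not taken from the paper.

```python
from itertools import product
from typing import List

# All 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ..., TT.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_composition(seq: str) -> List[float]:
    """Return the 16 overlapping-dinucleotide frequencies of a DNA sequence.

    Assumption: this frequency vector stands in for the paper's DCR.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing N or other ambiguous bases
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:  # sequence too short or entirely ambiguous
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]
```

Each sequence maps to a fixed-length vector regardless of its length, which is what makes this kind of encoding convenient as input to clustering or a neural network.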

Results: The framework comprises data parsing, scoliosis genotyping, genome encoding, machine learning (ML)/deep learning (DL) and scoliosis genotype prediction. A total of 58,000 scoliosis-related records were automatically parsed and statistically analyzed for high-risk genes and genotypes, such as FBN1, LAMA2 and SPG11. All variant genes were decomposed into DCR and other traits. Unsupervised ML indicated marked inter-group separation and intra-group clustering of the DCR of FBN1, LAMA2 or SPG11 across the five variant types (Pathogenic, Pathogeniclikely, Benign, Benignlikely and Uncertain). An FBN1 DCR-based Convolutional Neural Network (CNN), trained on Pathogenic and Benign/Benignlikely variants, performed accurately on validation data and predicted 179 high-risk scoliosis variants. The trained predictor was interpretable in terms of the similar distribution of variant types and variant locations within 2D structure units in the predicted 3D structure of FBN1.
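The five variant types and the binary training target described above can be illustrated with a small labeling helper. This is only a sketch: the mapping from ClinVar clinical-significance strings to the five group names is an assumption, as is leaving combined or conflicting assertions unassigned. The names `variant_group` and `training_label` are hypothetical.

```python
from typing import Optional

# Assumed mapping from ClinVar clinical-significance text to the five
# variant types named in the abstract.
GROUPS = {
    "pathogenic": "Pathogenic",
    "likely pathogenic": "Pathogeniclikely",
    "benign": "Benign",
    "likely benign": "Benignlikely",
    "uncertain significance": "Uncertain",
}


def variant_group(clinical_significance: str) -> Optional[str]:
    """Map a clinical-significance string to one of the five groups.

    Returns None for anything outside the assumed mapping, e.g. combined
    or conflicting assertions.
    """
    return GROUPS.get(clinical_significance.strip().lower())


def training_label(group: Optional[str]) -> Optional[int]:
    """Binary target: 1 for Pathogenic, 0 for Benign/Benignlikely.

    Other groups return None, i.e. they are excluded from training and
    left as candidates for prediction.
    """
    if group == "Pathogenic":
        return 1
    if group in ("Benign", "Benignlikely"):
        return 0
    return None
```

Under this scheme, records whose label is None would be the ones a trained classifier scores for risk, which is one plausible reading of how new high-risk variants were predicted.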

Discussion: In summary, scoliosis risk can be predicted by deep learning based on DCR features decomposed from genomic sequences. The DCR-based classifier predicted additional scoliosis-risk FBN1 variants in the ClinVar database. DCR-based models are promising for genotype-to-phenotype prediction in more disease types.

Source journal: Frontiers in Genetics (Biochemistry, Genetics and Molecular Biology: Molecular Medicine)
CiteScore: 5.50
Self-citation rate: 8.10%
Articles published: 3491
Review time: 14 weeks
Journal description: Frontiers in Genetics publishes rigorously peer-reviewed research on genes and genomes relating to all the domains of life, from humans to plants to livestock and other model organisms. Led by an outstanding Editorial Board of the world's leading experts, this multidisciplinary, open-access journal is at the forefront of communicating cutting-edge research to researchers, academics, clinicians, policy makers and the public. The study of inheritance and the impact of the genome on various biological processes is well documented. However, the majority of discoveries are still to come. A new era is seeing major developments in the function and variability of the genome, the use of genetic and genomic tools and the analysis of the genetic basis of various biological phenomena.