Screening of Important Markers in Peripheral Blood Mononuclear Cells to Predict Female Osteoporosis Risk Using LASSO Regression Algorithm and SVM Method.

Evolutionary Bioinformatics Online. Pub Date: 2022-01-28. eCollection Date: 2022-01-01. DOI: 10.1177/11769343221075014
Hongwei Tang, Qingtian Han, Yong Yin
Citations: 3

Abstract

Background: Osteoporosis is a bone disease that increases the patient's risk of fracture. We aimed to identify robust marker genes related to osteoporosis based on different bioinformatic methods and multiple datasets.

Methods: Three datasets from the Gene Expression Omnibus (GEO) were analyzed separately. Significantly differentially expressed genes (DEGs) between the high and low hip bone mineral density (BMD) groups in the first dataset were identified and subjected to Gene Ontology (GO), gene set enrichment analysis (GSEA), and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses to investigate the biological processes differentially enriched between the two groups. Least absolute shrinkage and selection operator (LASSO) regression, a support vector machine (SVM) model, and a protein-protein interaction (PPI) network were then used to derive robust marker genes for downstream transcription factor (TF)-target and miRNA-target prediction.
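To illustrate how a LASSO penalty narrows a large gene list down to a small marker set, here is a minimal sketch in plain Python. It is not the authors' pipeline. It assumes a squared-error LASSO fitted by coordinate descent on synthetic data, whereas the study worked on real GEO expression data with a two-group (high vs low hip BMD) outcome. The function names, the penalty value `lam`, and the simulated "genes" are all hypothetical.

```python
import random

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            # Correlation of feature j with the residual that excludes its own contribution
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new_beta
    return beta

# Synthetic stand-in for an expression matrix: 100 samples x 20 hypothetical genes.
random.seed(0)
n_samples, n_genes = 100, 20
X = [[random.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
# Assumption for the demo: only genes 0 and 3 carry signal.
y = [2.0 * row[0] - 1.5 * row[3] + random.gauss(0.0, 0.1) for row in X]

beta = lasso_coordinate_descent(X, y, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("genes with non-zero coefficients:", selected)
```

Because the soft-threshold step sets weak coefficients to exactly zero, the genes left with non-zero coefficients form the selected set. In practice the penalty strength is usually chosen by cross-validation rather than fixed by hand as it is here.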

Results: Several DEGs between the high hip BMD group and the low hip BMD group were obtained. Metabolism-related pathways, such as metabolic pathways, carbon metabolism, and glyoxylate and dicarboxylate metabolism, were enriched in these DEGs. Using LASSO regression analysis, 8 DEGs (SH3BP1, NARF, ANKRD34B, RNF40, ZNF473, AKT1, SHMT1, and VASH1) in GSE62402 were identified as the optimal gene combination. Moreover, SVM validation in the GSE56814 and GSE56815 datasets showed that this gene combination had high diagnostic performance, with an AUC of 0.899 for GSE56814 and 0.921 for GSE56815. Furthermore, subcellular localization analysis of the 8 genes revealed that 4 proteins were located in the cytoplasm, 3 in the nucleus, and 1 in the mitochondria. Additionally, TF-target and miRNA-target prediction was performed for 5 genes from the PPI network (AKT1, SHMT1, ZNF473, RNF40, and VASH1) to identify related TFs and miRNAs.

Conclusion: The optimal gene combination (SH3BP1, NARF, ANKRD34B, RNF40, ZNF473, AKT1, SHMT1, and VASH1) showed high diagnostic performance for osteoporosis risk.
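The diagnostic performance above is reported as AUC, the area under the ROC curve. The sketch below shows one standard way to compute it: the fraction of (positive, negative) sample pairs in which the classifier scores the positive sample higher, with ties counted as half. The labels and scores are made up for illustration and are not taken from GSE56814 or GSE56815; `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Toy example: 1 = low hip BMD (at risk), 0 = high hip BMD; scores are illustrative
# classifier outputs (for instance, SVM decision values), not data from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 8 of 9 pairs are ranked correctly -> 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.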
