Development of machine learning models for diagnostic biomarker identification and immune cell infiltration analysis in PCOS.

IF 3.8 | Zone 3 (Medicine) | JCR Q1, Reproductive Biology
Wenxiu Chen, Jianliang Miao, Jingfei Chen, Jianlin Chen
Journal of Ovarian Research, vol. 18, no. 1, p. 1. Published 2025-01-03. Journal Article.
DOI: 10.1186/s13048-024-01583-1 (https://doi.org/10.1186/s13048-024-01583-1)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11697806/pdf/

Abstract

Background: Polycystic ovary syndrome (PCOS) is a common endocrine disorder affecting women of reproductive age. It is characterized by hyperandrogenemia, oligo- or anovulation, and polycystic ovaries, and it significantly impairs quality of life. However, the practical application of machine learning (ML) to PCOS diagnosis has been hindered by limited data size and by limitations of the algorithmic models. To address this gap, we increased the sample size and used two ML algorithms to identify and validate diagnostic biomarkers, and we explored immune cell infiltration patterns in PCOS.

Methods: We performed RNA-seq analysis on granulosa cells, including 13 samples from normal controls and 25 samples from women with PCOS. These data were combined with publicly available datasets, and batch effects were corrected using the 'sva' package in R. Differential expression analysis was performed to identify genes that differed significantly between the two groups. These differentially expressed genes (DEGs) were then analyzed for enrichment in Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. LASSO and SVM-RFE were each applied to the DEGs for feature selection, and hub genes were defined as the intersection of the two results. Receiver operating characteristic (ROC) curves were used to assess the accuracy of the SVM and XGBoost models. CIBERSORT was used to estimate the relative abundances of immune cell populations. Gene set enrichment analysis (GSEA) was performed to illustrate the expression patterns of genes within highly enriched functional pathways. RT-qPCR was used to validate the hub genes.
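The hub-gene step described above (keep only the genes chosen by both LASSO and SVM-RFE) can be sketched as a simple intersection. This is a minimal illustration, not the authors' code: the function name `intersect_selections` and the gene labels `GENE_A` to `GENE_F` are hypothetical placeholders, and the two input lists stand in for whatever each selector would actually return.

```python
# Illustrative sketch only: the gene lists are hypothetical placeholders,
# not the selections reported in the study.

def intersect_selections(lasso_genes, svm_rfe_genes):
    """Return genes chosen by both selectors, in the order of the first list."""
    keep = set(svm_rfe_genes)
    return [gene for gene in lasso_genes if gene in keep]


# Hypothetical outputs of the two feature-selection methods.
lasso_selected = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
svm_rfe_selected = ["GENE_C", "GENE_A", "GENE_F", "GENE_E"]

hub_genes = intersect_selections(lasso_selected, svm_rfe_selected)
print(hub_genes)
```

Requiring agreement between two selectors with different inductive biases is a conservative choice: it tends to shrink the candidate list, at the cost of possibly discarding genes that only one method favors.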

Results: A total of 824 DEGs were found between the normal control and PCOS groups, including 376 upregulated and 448 downregulated genes. KEGG enrichment analysis associated these DEGs with endocytosis, Salmonella infection, and focal adhesion. By overlapping the LASSO and SVM-RFE results, we identified four hub genes (CNTN2, CASR, CACNB3, MFAP2) that were significantly associated with the PCOS group. In the validation set, the SVM and XGBoost models yielded AUC values of 0.795 and 0.875, respectively, indicating the potential of these genes as diagnostic biomarkers. Consistent with the data analysis, RT-qPCR on human granulosa cells confirmed the upregulation of CNTN2, CASR, CACNB3, and MFAP2 in PCOS. Furthermore, CIBERSORT analysis revealed a significant reduction in CD4 memory resting T cells in the PCOS group compared with the normal control group (P < 0.05).
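For readers unfamiliar with the AUC values quoted above, the following sketch shows one standard way to compute an AUC from classifier scores: as the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half. The labels and scores below are made-up toy values, not data from the study, and `roc_auc` is a hypothetical helper name.

```python
# Illustrative sketch only: labels and scores are toy values, not study data.

def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 1 = case, 0 = control (toy example with four of each).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.55, 0.5, 0.3, 0.2, 0.6]

auc = roc_auc(labels, scores)
print(auc)  # 13 of 16 pairs are ranked correctly -> 0.8125
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.795 and 0.875 should be read.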

Conclusions: This study identified CNTN2, CASR, CACNB3, and MFAP2 as potential diagnostic biomarkers for PCOS, adding strong supporting evidence to existing research on hub genes. Furthermore, the analysis of immune cell infiltration indicated significant involvement of CD4 memory resting T cells in the onset and progression of PCOS. These findings shed light on potential mechanisms underlying PCOS pathogenesis and provide valuable insights for future research and therapeutic interventions.

Source journal: Journal of Ovarian Research (Reproductive Biology)
CiteScore: 6.20
Self-citation rate: 2.50%
Articles published: 125
Review time: >12 weeks
Journal description: Journal of Ovarian Research is an open access, peer-reviewed, online journal that aims to provide a forum for high-quality basic and clinical research on ovarian function, abnormalities, and cancer. The journal focuses on research that provides new insights into ovarian functions as well as prevention and treatment of diseases afflicting the organ. Topical areas include, but are not restricted to: ovary development, hormone secretion and regulation; follicle growth and ovulation; infertility and polycystic ovarian syndrome; regulation of pituitary and other biological functions by ovarian hormones; ovarian cancer, its prevention, diagnosis and treatment; drug development and screening; and the role of stem cells in ovary development and function.