Rank Matrix Approach for Endometriosis: Integrating Data and Constructing Diagnostic Models
Ranze Xie, Deqing Hong, Jiaqi Yuan, Peng Xu, Wenbin Liu, Zheng Ye
Current Bioinformatics, published 2024-07-10 (Journal Article)
DOI: 10.2174/0115748936296151240605053713
Citations: 0
Abstract
Background: Endometriosis is a debilitating gynecological disorder characterized by chronic pain, infertility, and the growth of endometrial tissue outside the uterus. Accurate and early detection is crucial for effective management and treatment.

Methods: We developed a gene rank matrix-based model to integrate endometriosis cohorts profiled on multiple platforms. After removing batch effects, we identified 83 genes associated with endometriosis and built a diagnostic model from a subset of 11 of these genes. Models were trained on data from two platforms and validated on data from two other platforms using four machine learning algorithms: support vector machine (SVM), random forest, logistic regression, and gradient boosting.

Results: Integration via the gene rank matrix effectively mitigated batch effects. A gradient boosting classifier using the 11-gene subset achieved an area under the curve (AUC) of 0.77, an accuracy of 0.72, and an F1 score of 0.72 on the training dataset. Performance held on the validation dataset, with an AUC of 0.769, an accuracy of 0.719, and an F1 score of 0.732. These 11 genes were found to be associated with immunosuppression.

Conclusion: Integrating gene rank matrices effectively consolidates endometriosis data across diverse platforms. The diagnostic model built on these 11 genes outperforms the alternative models tested, offering promising prospects for aiding the clinical diagnosis of endometriosis. Further validation is needed to elucidate the functional significance of these 11 genes. Our study underscores the potential of combining data integration with machine learning to advance the diagnosis of complex diseases such as endometriosis.
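The abstract does not say exactly how the gene rank matrix is built. One way to make expression data from different platforms comparable is to replace each sample's values with within-sample ranks over a shared gene list; the minimal Python sketch below assumes that approach (tied values receive their average rank, and ranks are rescaled to [0, 1]) and is not the authors' code. It also gives standard-library versions of the three reported metrics: AUC (computed as the probability that a random positive sample outscores a random negative one), accuracy, and F1 at an assumed 0.5 threshold. The classifier itself is omitted; in practice one might, for example, fit scikit-learn's `GradientBoostingClassifier` on rank matrices from the two training platforms and score the two validation platforms. All function names here are hypothetical.

```python
from typing import List, Sequence, Tuple


def within_sample_ranks(values: Sequence[float]) -> List[float]:
    """Rank one sample's genes (lowest value = rank 1), average tied ranks,
    then rescale ranks linearly from [1, n] to [0, 1].

    Assumption: every sample uses the same gene list in the same order,
    so rank positions are comparable across samples and platforms.
    """
    n = len(values)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over a run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [(r - 1) / (n - 1) for r in ranks]


def rank_matrix(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: apply within_sample_ranks to every sample (row)."""
    return [within_sample_ranks(row) for row in samples]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive (label 1) scores higher
    than a random negative (label 0); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(labels: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> Tuple[float, float]:
    """Accuracy and F1 after thresholding scores (0.5 is an assumed cut-off)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    acc = sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return acc, f1
```

Because ranking discards each platform's absolute scale, a sample measured as raw intensities, such as `[120.0, 15.0, 800.0, 60.0]`, and one measured on a log-like scale, such as `[6.9, 3.9, 9.6, 5.9]`, map to the same rank vector whenever they order the shared genes identically. The trade-off is that information about the magnitude of expression differences is lost.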
Journal description:
Current Bioinformatics aims to publish the latest and most outstanding developments in bioinformatics. Each issue contains a series of timely in-depth reviews, mini-reviews, research papers, and guest-edited thematic issues written by leaders in the field, covering a wide range of topics at the intersection of biology, computer science, and information science.
The journal focuses on advances in computational molecular and structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues in health care, medicine, genetic disorders, agricultural product development, renewable energy, environmental protection, and related areas.