Using machine learning to predict patients with polycystic ovary disease in Chinese women
Chen-Yu Wang, Dee Pei, Chun-Kai Wang, Jyun-Cheng Ke, Siou-Ting Lee, Ta-Wei Chu, Yao-Jen Liang
Taiwanese Journal of Obstetrics & Gynecology, 64(1), Pages 68-75. Published 2025-01-01. DOI: 10.1016/j.tjog.2024.09.019
https://www.sciencedirect.com/science/article/pii/S1028455924002791
Abstract
Objective
With an estimated global frequency of 5% to 21%, polycystic ovary syndrome (PCOS) is one of the most common hormonal disorders. Many factors have been found to be related to PCOS, but most of the studies that identified them used traditional methods such as multiple logistic regression (LR). Machine learning (Mach-L) has emerged as a newer method that can be applied in medical research. The present study had two goals: (1) to compare the accuracy of five Mach-L techniques with that of conventional LR, and (2) to use Mach-L to predict PCOS and rank its risk factors.
Materials and methods
In total, 170 patients with PCOS and 950 control participants were included. Information on demographics, biochemistry, and lifestyle was collected. PCOS was diagnosed using the Rotterdam criteria. Five Mach-L algorithms were used: random forest (RF), stochastic gradient boosting (SGB), multivariate adaptive regression splines (MARS), extreme gradient boosting (XGBoost), and gradient boosting with categorical features support (CatBoost). Models with lower estimation errors were considered better.
Results
By t-test, subjects with PCOS were younger and had higher glutamic oxaloacetic transaminase (GOT), glutamic pyruvic transaminase (GPT), γ-glutamyl transferase (γ-GT), and triglyceride (TG) levels, as well as higher educational levels. All five Mach-L methods had lower estimation errors than LR. The mean area under the receiver operating characteristic curve (AUC) of the Mach-L methods was 0.6669, higher than that of LR (0.5908). Finally, age, TG, GPT, white blood cell count (WBC), uric acid (UA), and platelet count (Plt) were the six most important risk factors selected by Mach-L.
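For readers unfamiliar with the AUC figures reported above, the following minimal Python sketch shows one common way to compute an AUC (the probability that a randomly chosen case scores higher than a randomly chosen control) and then to average it across several models. It is an illustration only and not the authors' code: the function name `auc`, the model names, and the toy labels and scores are all hypothetical and are not taken from the study.

```python
from statistics import mean


def auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive has the higher score; tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = PCOS, 0 = control.
labels = [0, 0, 1, 1]
model_scores = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.2, 0.6, 0.5, 0.4],
}

# One AUC per model, then the mean across models.
per_model_auc = {name: auc(labels, s) for name, s in model_scores.items()}
mean_auc = mean(per_model_auc.values())
```

An AUC of 0.5 corresponds to ranking cases and controls no better than chance, and 1.0 to perfect separation, which gives a scale for reading the reported values of 0.6669 and 0.5908.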
Conclusion
Mach-L methods outperformed conventional LR. In this cohort of Chinese women, age was the most important risk factor, followed by TG, GPT, WBC, UA, and Plt.
About the journal:
Taiwanese Journal of Obstetrics and Gynecology is a peer-reviewed, open access journal publishing editorials, reviews, original articles, short communications, case reports, research letters, correspondence and letters to the editor in the field of obstetrics and gynecology.
The aims of the journal are to:
1. Publish cutting-edge, innovative and topical research that addresses screening, diagnosis, management and care in women's health
2. Deliver evidence-based information
3. Promote the sharing of clinical experience
4. Address women-related health promotion
The journal provides comprehensive coverage of topics in obstetrics & gynecology and women's health including maternal-fetal medicine, reproductive endocrinology/infertility, and gynecologic oncology. Taiwan Association of Obstetrics and Gynecology.