Waikato Environment for Knowledge Analysis (WEKA) as a Data Analysis Method Identifying Potential Hematological Parameters for Early Diagnosis of Cervical Cancer.
Hung-Ming Chiu, Shih-En Lin, Yen-Wei Chu, Chih-Jung Chen
In Vivo 39(2): 1042-1053, March 2025. https://doi.org/10.21873/invivo.13909
Abstract
Background/aim: The present study explored the use of the Waikato Environment for Knowledge Analysis (WEKA) to analyze hematological parameters that may distinguish the development and progression of cervical cancer. Specifically, we aimed to identify significant biomarkers capable of differentiating atypical squamous cells of undetermined significance (ASC-US) and low-grade squamous intraepithelial lesions (LSIL) from both cervical cancer-negative cases and advanced conditions such as cervical adenocarcinoma.
Materials and methods: Hematological and biochemical data were collected from patients and analyzed using data-mining algorithms available in WEKA. The random forest algorithm was employed to identify patterns among key hematological and biochemical biomarkers, and one-way analysis of variance (ANOVA) was used to determine significant alterations in these parameters across the cancer-negative, ASC-US, LSIL and adenocarcinoma groups.
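The abstract does not give the computational details of the one-way analysis of variance (ANOVA). As a hedged illustration of what the test computes, the sketch below derives the F statistic for one parameter measured in four independent groups: the between-group mean square divided by the within-group mean square. The function name `one_way_anova_f` and all group values are hypothetical and invented for illustration; they are not the study's data or code. A p-value additionally requires the F distribution with (k - 1, N - k) degrees of freedom, which the Python standard library does not provide; `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
"""One-way ANOVA F statistic computed by hand (illustrative sketch, invented data)."""
from statistics import fmean


def one_way_anova_f(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for k independent groups of observations."""
    k = len(groups)
    all_values = [x for values in groups.values() for x in values]
    n = len(all_values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = fmean(all_values)
    # Between-group sum of squares: each group's size times the squared
    # distance of its mean from the grand mean.
    ss_between = sum(len(v) * (fmean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-group sum of squares: each observation's squared distance
    # from its own group mean.
    ss_within = sum((x - fmean(v)) ** 2 for v in groups.values() for x in v)
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")

    df_between, df_within = k - 1, n - k
    # F = mean square between / mean square within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical values for a single parameter; NOT data from the study.
    groups = {
        "cancer_negative": [13.0, 13.5, 14.0],
        "asc_us": [12.5, 13.0, 13.5],
        "lsil": [12.0, 12.5, 13.0],
        "adenocarcinoma": [11.0, 11.5, 12.0],
    }
    f_stat, df_b, df_w = one_way_anova_f(groups)
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # F(3, 8) = 8.75
```

A significant F indicates only that at least one group mean differs from the others; identifying which pairs of groups differ requires a post hoc comparison such as Tukey's HSD.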
Results: Random forest was the best-performing classifier, with a recall of 1.000, an accuracy of 0.843, a Matthews correlation coefficient (MCC) of 0.510 and an area under the curve (AUC) of 0.708, effectively identifying significant patterns within the datasets. One-way analysis of variance indicated significant alterations in red and white blood cell counts, platelet count, hemoglobin, hematocrit and other white blood cell parameters among the cancer-negative, ASC-US, LSIL and adenocarcinoma groups, emphasizing the role of hematological parameters in identifying progression risk.
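The reported metrics summarize different aspects of performance, and how a recall of 1.000 can coexist with a moderate Matthews correlation coefficient (MCC) is easiest to see from a confusion matrix. The sketch below computes recall, accuracy and MCC for a binary confusion matrix, plus the area under the ROC curve (AUC) as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function names, counts and scores are hypothetical. The abstract does not report the confusion matrix, nor whether WEKA's figures are per-class values or averages over several classes, so this illustrates the metric definitions rather than reconstructing the study's results.

```python
"""Recall, accuracy, MCC and ROC AUC from first principles (illustrative sketch, invented data)."""
from math import sqrt


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Recall, accuracy and Matthews correlation coefficient for a binary confusion matrix."""
    total = tp + fp + fn + tn
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN));
    # by convention it is reported as 0 when any of the four sums is zero.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "accuracy": accuracy, "mcc": mcc}


def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts: every positive is caught (fn = 0), but 5 negatives are misclassified.
    m = confusion_metrics(tp=10, fp=5, fn=0, tn=5)
    print({k: round(v, 3) for k, v in m.items()})  # {'recall': 1.0, 'accuracy': 0.75, 'mcc': 0.577}
    print(round(roc_auc([0.9, 0.8, 0.4], [0.7, 0.4, 0.2]), 3))  # 0.833
```

With no false negatives, recall is 1.0 no matter how many negatives are misclassified; in this toy example the five false positives are what pull accuracy down to 0.75 and MCC to about 0.58.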
Conclusion: The consistency between the conclusions drawn from data mining and from statistical analysis highlights the utility of hematological parameters as potential non-invasive biomarkers for cervical cancer screening and progression monitoring. These findings suggest that integrating machine-learning algorithms, particularly random forest, with hematological analysis might enhance early diagnosis and improve clinical outcomes for patients with cervical abnormalities.