Waikato Environment for Knowledge Analysis (WEKA) as a Data Analysis Method Identifying Potential Hematological Parameters for Early Diagnosis of Cervical Cancer.

IF 1.8, Zone 4 (Medicine), Q3, MEDICINE, RESEARCH & EXPERIMENTAL
In vivo Pub Date : 2025-03-01 DOI:10.21873/invivo.13909
Hung-Ming Chiu, Shih-En Lin, Yen-Wei Chu, Chih-Jung Chen
In vivo, vol. 39, no. 2, pp. 1042-1053. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11884440/pdf/
Citations: 0

Abstract

Background/aim: The present study explored the use of the Waikato Environment for Knowledge Analysis (WEKA) to analyze hematological parameters that may distinguish stages in the development and progression of cervical cancer. Specifically, we aimed to identify significant biomarkers capable of differentiating atypical squamous cells of undetermined significance (ASC-US) and low-grade squamous intraepithelial lesions (LSIL) from cancer-negative findings and from advanced conditions such as cervical adenocarcinoma.

Materials and methods: Hematological and biochemical data were collected from patients and analyzed using data-mining algorithms available in WEKA. The random forest algorithm was employed to identify patterns among key hematological and biochemical biomarkers, alongside one-way analysis of variance to determine significant alterations in these parameters across cancer-negative, ASC-US, LSIL and adenocarcinoma groups.
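As a rough illustration of the one-way analysis of variance mentioned above, the following sketch computes the F statistic for four groups using only the Python standard library. The group names follow the abstract, but the numbers are arbitrary toy values, not patient data, and the helper `one_way_anova_f` is a hypothetical name introduced here; it is not the authors' analysis pipeline.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy values only (assumption): stand-ins for one hematological parameter.
toy_groups = {
    "cancer-negative": [1.0, 2.0, 3.0],
    "ASC-US": [2.0, 3.0, 4.0],
    "LSIL": [5.0, 6.0, 7.0],
    "adenocarcinoma": [6.0, 7.0, 8.0],
}

f_stat, df_b, df_w = one_way_anova_f(list(toy_groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the within-group scatter would suggest. Turning F into a p-value needs the F distribution, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.f_oneway`) would normally be used for that step.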

Results: Random forest was the classifier that performed best, with a recall of 1.000, an accuracy of 0.843, a Matthews correlation coefficient of 0.510 and an area under the curve of 0.708, effectively identifying significant patterns within the datasets. One-way analysis of variance indicated significant alterations in red and white blood cell counts, platelet count, hemoglobin, hematocrit and other white blood cell parameters between the cancer-negative, ASC-US, LSIL and adenocarcinoma groups, emphasizing the role of hematological parameters in identifying progression risk.
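To make the reported metrics concrete, the sketch below derives recall, accuracy and the Matthews correlation coefficient (MCC) from a binary confusion matrix. The counts are hypothetical, chosen only to show how a recall of 1.0 can coexist with a much lower MCC; they are not taken from the study, and `binary_metrics` is an illustrative helper, not a WEKA function.

```python
from math import sqrt


def binary_metrics(tp, fn, fp, tn):
    """Recall, accuracy and MCC from binary confusion-matrix counts."""
    total = tp + fn + fp + tn
    recall = tp / (tp + fn)          # share of true positives that were found
    accuracy = (tp + tn) / total     # share of all predictions that were right
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty; use 0.0 then.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts (assumption): no positives missed, most negatives misclassified.
metrics = binary_metrics(tp=80, fn=0, fp=15, tn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because MCC uses all four cells of the confusion matrix, it penalizes the false positives that recall ignores, which is one reason to read the four reported figures together. Area under the curve is not computed here, since it requires ranked prediction scores rather than counts.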

Conclusion: The consistency in conclusions drawn from data mining and statistical analyses highlights the utility of hematological parameters as potential non-invasive biomarkers for cervical cancer screening and progression monitoring. These findings suggest that integrating machine-learning algorithms, particularly random forest, with hematological analysis might enhance early diagnosis and improve clinical outcomes for patients with cervical abnormalities.

Source journal
In vivo (Medicine: Research & Experimental)
CiteScore: 4.20
Self-citation rate: 4.30%
Articles published: 330
Review time: 3-8 weeks
Journal description: IN VIVO is an international peer-reviewed journal designed to bring together original high-quality works and reviews on experimental and clinical biomedical research within the frames of physiology, pathology and disease management. The topics of IN VIVO include: 1. Experimental development and application of new diagnostic and therapeutic procedures; 2. Pharmacological and toxicological evaluation of new drugs, drug combinations and drug delivery systems; 3. Clinical trials; 4. Development and characterization of models of biomedical research; 5. Cancer diagnosis and treatment; 6. Immunotherapy and vaccines; 7. Radiotherapy, Imaging; 8. Tissue engineering, Regenerative medicine; 9. Carcinogenesis.