Handling Imbalance and Limited Data in Thyroid Ultrasound and Diabetic Retinopathy Datasets Using Discrete Levy Flights Grey Wolf Optimizer Based Random Forest for Robust Medical Data Classification

IF 1.8 · Zone 4 (Computer Science) · JCR Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Shobha Aswal, Neelu Jyothi Ahuja, Ritika Mehra
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Published: 2024-02-16
DOI: https://doi.org/10.1145/3648363
Citations: 0

Abstract

In disease diagnosis, medical image classification faces inherent challenges: data imbalance, image quality variability, annotation variability, limited data availability, and limited data representativeness. These challenges degrade an algorithm's ability to classify medical images, leading to biased model outcomes and inaccurate interpretations. This paper combines a novel Discrete Levy Flight Grey Wolf Optimizer (DLFGWO) with the Random Forest (RF) classifier to address these limitations on biomedical datasets and to achieve a better classification rate. DLFGWO-RF resolves image quality variability in ultrasound images and limits RF classification inaccuracies by handling incomplete and noisy data. An excessive focus on the majority class can skew the class distribution and thus cause data imbalance. DLFGWO balances this distribution by leveraging grey wolves, whose exploration and exploitation capabilities are improved using Discrete Levy Flight (DLF), and it further optimizes the classifier's performance to achieve a balanced classification rate. DLFGWO-RF is designed to classify even limited datasets, which reduces the need for numerous expert annotations. In diabetic retinopathy grading, DLFGWO-RF reduces the disagreements that stem from subjective interpretations during annotation. However, the diabetic retinopathy dataset does not capture the full diversity of the population, which limits the generalization ability of the proposed DLFGWO-RF. Fine-tuning the RF therefore allows it to adapt robustly to the subgroups in the dataset, enhancing its overall performance. Experiments are conducted on two widely used medical image datasets to test the efficacy of the model. The results show that the DLFGWO-RF classifier achieves classification accuracy between 90% and 95%, outperforming existing techniques on various imbalanced datasets.
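
The abstract does not give the update rules or the encoding used by DLFGWO, so the following is only a minimal sketch of the general idea: a textbook Grey Wolf Optimizer restricted to an integer grid, with a Levy-distributed perturbation (Mantegna's algorithm) added to each move. Everything here is an assumption made for illustration: the function names, the 0.01 step scale, the greedy acceptance rule, and the toy objective `toy_score`, which stands in for something like the cross-validated balanced accuracy of a random forest with a given number of trees and maximum depth.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def to_grid(x, lo, hi):
    """Clip a continuous coordinate to [lo, hi], then round to an integer."""
    return int(round(max(lo, min(hi, x))))


def discrete_levy_gwo(objective, bounds, n_wolves=8, n_iter=30, seed=0):
    """Maximise `objective` over integer vectors inside `bounds`.

    bounds is a list of (low, high) integer pairs, one per dimension.
    Returns (best_position, best_score).
    """
    rng = random.Random(seed)
    wolves = [[rng.randint(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    scores = [objective(w) for w in wolves]

    for t in range(n_iter):
        # The three best wolves (alpha, beta, delta) guide the pack.
        order = sorted(range(n_wolves), key=lambda i: scores[i], reverse=True)
        leaders = [wolves[i][:] for i in order[:3]]
        a = 2.0 * (1 - t / n_iter)  # decreases from 2 towards 0

        for i in range(n_wolves):
            candidate = []
            for d, (lo, hi) in enumerate(bounds):
                pulls = []
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - wolves[i][d])
                    pulls.append(leader[d] - A * D)
                x = sum(pulls) / 3.0
                # Assumed Levy perturbation, scaled to the width of the range.
                x += 0.01 * levy_step(rng) * (hi - lo)
                candidate.append(to_grid(x, lo, hi))
            cand_score = objective(candidate)
            # Assumed greedy rule: keep a move only if it does not score worse.
            if cand_score >= scores[i]:
                wolves[i], scores[i] = candidate, cand_score

    best = max(range(n_wolves), key=lambda i: scores[i])
    return wolves[best], scores[best]


def toy_score(params):
    """Hypothetical stand-in for a classifier score; peaks at (120, 7)."""
    n_trees, max_depth = params
    return -((n_trees - 120) ** 2) - (max_depth - 7) ** 2


best, score = discrete_levy_gwo(toy_score, [(10, 300), (1, 20)], seed=1)
print(best, score)
```

To tune a real classifier, `toy_score` would be replaced by a function that trains a random forest with the candidate hyperparameters and returns an imbalance-aware metric. The heavy-tailed Levy step occasionally produces a long jump, which is the usual motivation for using it to improve exploration.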

Source journal
CiteScore: 3.60
Self-citation rate: 15.00%
Articles published: 241
About the journal: The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to:
- Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc.
- Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc.
- Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition.
- Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc.
- Machine Translation involving Asian or low-resource languages.
- Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc.
- Information Extraction and Filtering: including automatic abstraction, user profiling, etc.
- Speech processing: including text-to-speech synthesis and automatic speech recognition.
- Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc.
- Cross-lingual information processing involving Asian or low-resource languages.
Papers that deal in theory, systems design, evaluation and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.