Title: An Automated Histopathological Colorectal Cancer Multi-Class Classification System Based on Optimal Image Processing and Prominent Features
Authors: Tasnim Jahan Tonni, Shakil Rana, Kaniz Fatema, Asif Karim, Md. Awlad Hossen Rony, Md. Zahid Hasan, Md. Saddam Hossain Mukta, Sami Azam
Journal: Computational Intelligence, vol. 40, no. 6 (published 2024-12-18)
DOI: 10.1111/coin.70007
URL: https://onlinelibrary.wiley.com/doi/10.1111/coin.70007
Impact factor: 1.8 (JCR Q3, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Colorectal cancer (CRC) is characterized by the uncontrolled growth of cancerous cells in the mucosa of the colon or rectum. Colon polyps are precancerous growths that can develop into colon cancer, causing symptoms such as rectal bleeding, abdominal pain, diarrhea, weight loss, and constipation. CRC is among the leading causes of cancer death worldwide, and it disproportionately affects the elderly. Early diagnosis is crucial for effective treatment, but manual examination is often time-consuming and laborious for experts. This study improves the accuracy of CRC multi-class classification over previous research on the NCT-CRC-HE-100K (100,000 images) and CRC-VAL-HE-7K (7,180 images) datasets. We first applied various image processing techniques to the NCT-CRC-HE-100K dataset to improve image quality and reduce noise, then used multiple feature extraction and selection methods to identify prominent features from a large feature pool, and experimented with different approaches to select the best classifiers for these features. The third ensemble model (XGB-LightGBM-RF) achieved its best accuracy of 99.63% using 40 prominent features chosen by univariate feature selection. The same model achieved 99.73% accuracy on the CRC-VAL-HE-7K dataset and 99.27% accuracy on the two datasets combined. We also trained and tested the model on different datasets: using 80% of NCT-CRC-HE-100K for training and 20% of CRC-VAL-HE-7K for testing, the third ensemble model obtained 98.43% accuracy in multi-class classification. These results suggest that the proposed framework, built on the third ensemble model, can help experts identify CRC disease types at the very start of an investigation.
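The abstract names two steps that are easy to illustrate in isolation: univariate feature selection and a three-model ensemble. The following is a minimal sketch, not the authors' implementation. It assumes an ANOVA F-score as the univariate criterion and hard majority voting as the way the ensemble combines its members; the abstract specifies neither. The function names, the toy feature rows, the class labels, and the per-model predictions are all hypothetical, and the three prediction lists merely stand in for the outputs of XGB, LightGBM, and RF classifiers.

```python
from collections import Counter
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F-statistic for a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand_mean = mean(values)
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def select_k_best(rows, labels, k):
    """Score every feature column on its own; return indices of the top k."""
    n_features = len(rows[0])
    scores = [anova_f_score([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def majority_vote(predictions_per_model):
    """Hard voting: for each sample, return the label most models agree on.

    Ties (possible with more than two classes) go to the label seen first.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions_per_model)]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (hypothetical): feature 0 separates the classes strongly,
# feature 1 is pure noise, feature 2 separates them weakly.
rows = [
    [1.0, 5.0, 2.0],
    [1.2, 3.0, 2.5],
    [0.9, 4.0, 1.5],
    [3.0, 4.5, 2.6],
    [3.1, 3.5, 3.0],
    [2.9, 4.0, 2.4],
]
labels = ["tumor", "tumor", "tumor", "stroma", "stroma", "stroma"]

# The paper keeps 40 features; k=2 is used here only because the toy set has 3.
selected = select_k_best(rows, labels, k=2)

# Hypothetical predictions from three base models, each wrong on one sample.
preds = [
    ["tumor", "tumor", "stroma", "stroma", "stroma", "stroma"],
    ["tumor", "tumor", "tumor", "stroma", "tumor", "stroma"],
    ["tumor", "stroma", "tumor", "stroma", "stroma", "stroma"],
]
ensemble = majority_vote(preds)
print(selected, accuracy(labels, ensemble))
```

Each base model in the toy example errs on a different sample, so the vote corrects all three errors; this is the usual motivation for combining heterogeneous classifiers. In practice one would more likely use a library routine such as scikit-learn's `SelectKBest` with `f_classif` instead of the hand-written scoring shown here.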
About the journal:
This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.