Meta-ensemble learning with adaptive sampling for imbalanced medical Raman spectroscopy data
Yishan Guo, Chenjie Chang, Cheng Chen, Jiahao Li, Jun Yu, Xue Wu, Yuxuan Guo, Shunzhe Mao, Wei Bi, Chen Chen, Xiaoyi Lv
Applied Soft Computing, Volume 176, Article 113142. Published 2025-04-19.
DOI: 10.1016/j.asoc.2025.113142
https://www.sciencedirect.com/science/article/pii/S1568494625004533
Citations: 0
Abstract
Raman spectroscopy combined with artificial intelligence algorithms has been widely used for disease diagnosis and shows great potential in the medical field. However, the low prevalence of certain diseases makes disease samples difficult to obtain, so the medical Raman spectroscopy data used for diagnosis are often imbalanced. A model trained on such data tends to predict the majority class and achieves lower diagnostic accuracy for the disease class, which may delay a patient’s treatment or increase the risk of misdiagnosis. In this study, we propose the MAEL model, which uses meta-learning to automatically learn a sampling strategy from the data and iteratively, adaptively resamples the query set to tackle the imbalance in medical Raman spectroscopy data. We apply the model to imbalanced medical Raman spectral data for the first time to improve the class distribution of the spectral data, and use ensemble learning to integrate all sampling results obtained during training to improve model performance. We evaluated the model with three metrics, AUC-PRC, G-mean, and F1-score, and compared it with six traditional data balancing methods. The experimental results show that MAEL achieves significant improvements on various medical Raman spectroscopy datasets, with maximum improvements of 0.364, 0.563, and 0.587 in AUC-PRC, G-mean, and F1-score, respectively. This study provides an effective way to address data imbalance in medical spectroscopy and has potential for broader application.
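The three evaluation metrics named above are standard for imbalanced classification, and a small sketch shows why they are preferred over plain accuracy. The abstract does not say exactly how the authors computed them, so the following is a minimal illustration under stated assumptions: label 1 is taken to be the minority (disease) class, AUC-PRC is approximated by average precision (a common step-wise estimator of the area under the precision-recall curve), and all function names are hypothetical. This is not the MAEL implementation.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), assuming label 1 is the minority (disease) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of correct predictions; misleading when classes are imbalanced."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity (minority recall) and specificity (majority recall)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the minority class: 2TP / (2TP + FP + FN)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Samples are ranked by descending score; precision is accumulated at each
    rank where a positive appears. Tied scores are not grouped (a simplification).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / n_pos


if __name__ == "__main__":
    # Hypothetical toy data: 8 majority (healthy) samples, 2 minority (disease) samples.
    y_true = [0] * 8 + [1, 1]
    always_majority = [0] * 10
    print(accuracy(y_true, always_majority), g_mean(y_true, always_majority), f1_score(y_true, always_majority))
```

On this toy data, a classifier that always predicts the majority class reaches 0.8 accuracy yet scores 0 on both G-mean and F1, which is exactly the failure mode the abstract describes for imbalanced diagnostic data.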
About the journal:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest-quality research in the application and convergence of the areas of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.