An interpretability model for syndrome differentiation of HBV-ACLF in traditional Chinese medicine using small-sample imbalanced data
Zhou Zhan, Peng Qinghua, Xiao Xiaoxia, Zou Beiji, Liu Bin, Guo Shuixia
Digital Chinese Medicine, 7(2), pp. 137-147. Published 2024-06-01. DOI: 10.1016/j.dcmed.2024.09.005
https://www.sciencedirect.com/science/article/pii/S2589377724000430
Abstract
Objective
Clinical medical record data for hepatitis B-related acute-on-chronic liver failure (HBV-ACLF) typically have small sample sizes and imbalanced classes. However, most machine learning models are designed for balanced data and lack interpretability. This study aimed to develop a clinically interpretable and highly accurate traditional Chinese medicine (TCM) diagnostic model for HBV-ACLF, grounded in the TCM theory of syndrome differentiation and treatment.
Methods
We collected medical records from 261 patients diagnosed with HBV-ACLF, covering three syndromes: Yang jaundice (214 cases), Yang-Yin jaundice (41 cases), and Yin jaundice (6 cases). To avoid overfitting of the machine learning models, we excluded the Yin jaundice cases. After data standardization and cleaning, we obtained 255 medical records of Yang jaundice and Yang-Yin jaundice. To address the class imbalance, we applied oversampling and then constructed syndrome diagnosis models with five machine learning methods: logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). Precision, F1 score, the area under the receiver operating characteristic (ROC) curve (AUC), and accuracy served as the model evaluation metrics. The model with the best classification performance was selected for diagnostic rule extraction, and the clinical significance of the extracted rules was analyzed. Furthermore, we proposed a novel multiple-round stable rule extraction (MRSRE) method to obtain a stable rule set of features that demonstrates the model's clinical interpretability.
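The oversampling and evaluation steps can be illustrated with a minimal, standard-library sketch. It assumes plain random oversampling (duplicating minority-class records); the abstract does not say which oversampling variant was used, and the function names `random_oversample` and `precision_f1_accuracy` are hypothetical. Only the class counts (214 and 41) come from the text; the feature rows are placeholder integers.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (assumed variant)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap to the majority size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def precision_f1_accuracy(y_true, y_pred, positive):
    """Precision and F1 for one class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return precision, f1, accuracy


# Class counts from the abstract; rows are placeholders, not patient data.
labels = ["yang"] * 214 + ["yang_yin"] * 41
rows = list(range(len(labels)))
balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)  # both classes now have 214 rows
```

In practice, oversampling is usually applied to the training split only, so that duplicated records do not leak into the data used for evaluation; whether the study did this is not stated in the abstract.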
Results
All five machine learning models built on the oversampled, balanced data achieved a precision above 0.90. Among them, the RF model classified syndrome types with an accuracy of 0.92, mean F1 scores of 0.93 for Yang jaundice and 0.94 for Yang-Yin jaundice, and an AUC of 0.98. The rules extracted from the RF syndrome differentiation model with the MRSRE method showed that the features common to Yang jaundice and Yang-Yin jaundice were wiry pulse, yellowing of the urine, skin, and eyes, normal tongue body, healthy sublingual vessels, nausea, oil loathing, and poor appetite. The main features of Yang jaundice were a red tongue body and thickened sublingual vessels, whereas those of Yang-Yin jaundice were a dark tongue body, pale white tongue body, white tongue coating, lack of strength, slippery pulse, light red tongue body, slimy tongue coating, and abdominal distension. These findings are consistent with the classifications made by TCM experts based on TCM syndrome differentiation and treatment theory.
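The abstract does not describe how MRSRE works internally. As a rough illustration of what "stable across multiple rounds" could mean, the sketch below keeps only the features that recur in the rules extracted in most rounds. This is an assumed reading, not the authors' algorithm: the function `stable_features`, the `min_fraction` threshold, and the per-round feature sets are all hypothetical toy inputs, not study results.

```python
from collections import Counter


def stable_features(rounds, min_fraction=0.8):
    """Return the features that occur in at least `min_fraction`
    of the rounds, sorted alphabetically (assumed stability criterion)."""
    if not rounds:
        return []
    # Count each feature at most once per round.
    counts = Counter(f for features in rounds for f in set(features))
    threshold = min_fraction * len(rounds)
    return sorted(f for f, c in counts.items() if c >= threshold)


# Made-up feature sets standing in for the rules extracted in five
# separate training rounds; feature names are borrowed from the abstract.
rounds = [
    {"red tongue body", "wiry pulse", "nausea"},
    {"red tongue body", "wiry pulse"},
    {"red tongue body", "slippery pulse"},
    {"red tongue body", "wiry pulse", "poor appetite"},
    {"red tongue body", "wiry pulse"},
]
stable = stable_features(rounds, min_fraction=0.8)
```

With these toy inputs, only the features seen in at least four of the five rounds survive, which is the intuition behind preferring rules that do not depend on one particular resampling of a small data set.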
Conclusion
Our model can be used to differentiate HBV-ACLF syndromes, and the approach has the potential to produce clinically interpretable, highly accurate models for other clinical data sets characterized by small sample sizes and class imbalance.