An interpretability model for syndrome differentiation of HBV-ACLF in traditional Chinese medicine using small-sample imbalanced data

Q3 Medicine

Digital Chinese Medicine. Pub Date: 2024-06-01. DOI: 10.1016/j.dcmed.2024.09.005

Zhou Zhan, Peng Qinghua, Xiao Xiaoxia, Zou Beiji, Liu Bin, Guo Shuixia

Volume 7, Issue 2, Pages 137-147

Citations: 0

Abstract

Objective

Clinical medical record data on hepatitis B-related acute-on-chronic liver failure (HBV-ACLF) generally have small sample sizes and class imbalance. However, most machine learning models are designed for balanced data and lack interpretability. This study aimed to develop a traditional Chinese medicine (TCM) diagnostic model for HBV-ACLF, grounded in TCM syndrome differentiation and treatment theory, that is both clinically interpretable and highly accurate.

Methods

We collected medical records from 261 patients diagnosed with HBV-ACLF, covering three syndromes: Yang jaundice (214 cases), Yang-Yin jaundice (41 cases), and Yin jaundice (6 cases). To avoid overfitting the machine learning models, we excluded the six Yin jaundice cases. After data standardization and cleaning, 255 medical records of Yang jaundice and Yang-Yin jaundice remained. To address the class imbalance, we applied oversampling and then built syndrome diagnosis models with five machine learning methods: logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). Precision, F1 score, accuracy, and the area under the receiver operating characteristic (ROC) curve (AUC) were used as evaluation metrics. Diagnostic rules were extracted from the model with the best classification performance, and their clinical significance was analyzed in detail. Furthermore, we proposed a novel multiple-round stable rule extraction (MRSRE) method to obtain a stable rule set of features that demonstrates the model's clinical interpretability.
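The abstract names "the oversampling method" and a multiple-round rule extraction step without giving details, so the sketch below illustrates both ideas under stated assumptions. It assumes plain random oversampling (duplicating minority-class records with replacement) and a simple frequency threshold for keeping rules that recur across rounds. The function names `random_oversample` and `stable_rules`, the `min_fraction` threshold, and the rule strings are all hypothetical; the aggregation is not the authors' MRSRE algorithm, only a generic illustration of keeping rules that are stable across repeated runs.

```python
import random
from collections import Counter


def random_oversample(records, labels, seed=0):
    """Randomly duplicate minority-class records (sampling with replacement)
    until every class has as many records as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for record, label in zip(records, labels):
        by_class.setdefault(label, []).append(record)
    target = max(len(group) for group in by_class.values())

    out_records, out_labels = [], []
    for label, group in by_class.items():
        # Extra copies are drawn from this class only; the largest class gets none.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for record in group + extra:
            out_records.append(record)
            out_labels.append(label)
    return out_records, out_labels


def stable_rules(rule_sets, min_fraction=0.8):
    """Hypothetical multi-round aggregation: keep the rules that appear in at
    least `min_fraction` of the rounds (each round is one list of rule strings)."""
    counts = Counter(rule for rules in rule_sets for rule in set(rules))
    n_rounds = len(rule_sets)
    return sorted(rule for rule, c in counts.items() if c / n_rounds >= min_fraction)


# Class sizes from the abstract; integer IDs stand in for real feature vectors.
labels = ["Yang"] * 214 + ["Yang-Yin"] * 41
records = list(range(len(labels)))
balanced_x, balanced_y = random_oversample(records, labels)
print(dict(Counter(balanced_y)))  # {'Yang': 214, 'Yang-Yin': 214}

# Toy rule sets from three hypothetical rounds (illustrative strings only).
rounds = [
    ["pulse=wiry", "tongue=red"],
    ["pulse=wiry", "nausea=yes"],
    ["pulse=wiry", "tongue=red"],
]
print(stable_rules(rounds, min_fraction=0.6))  # ['pulse=wiry', 'tongue=red']
```

Random oversampling only repeats existing minority records, so a common precaution is to oversample the training data only, keeping duplicated records out of the test set. Synthetic methods such as SMOTE are a frequent alternative to plain duplication.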

Results

All five machine learning models built on the oversampled, balanced data achieved precision above 0.90. Among them, RF classified the syndrome types with an accuracy of 0.92, mean F1 scores of 0.93 for Yang jaundice and 0.94 for Yang-Yin jaundice, and an AUC of 0.98. Rules extracted from the RF syndrome differentiation model with the MRSRE method showed that Yang jaundice and Yang-Yin jaundice shared the following features: wiry pulse; yellowing of the urine, skin, and eyes; normal tongue body; healthy sublingual vessels; nausea; aversion to greasy food; and poor appetite. The main features of Yang jaundice were a red tongue body and thickened sublingual vessels, whereas those of Yang-Yin jaundice were a dark tongue body, pale white tongue body, white tongue coating, lack of strength, slippery pulse, light red tongue body, slimy tongue coating, and abdominal distension. These findings are consistent with the classifications made by TCM experts based on TCM syndrome differentiation and treatment theory.
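The metrics reported above can be computed per syndrome class with the standard-library sketch below. Everything in it is illustrative: the eight labels, predictions, and probability scores are made-up toy values rather than study data, the function names are hypothetical, and treating Yang-Yin jaundice as the positive class for AUC is an assumption. AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half), which equals the area under the ROC curve.

```python
def binary_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted syndrome matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores, positive):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data: 4 Yang and 4 Yang-Yin cases, plus a model's predicted
# probability of Yang-Yin jaundice for each case.
y_true = ["Yang"] * 4 + ["Yang-Yin"] * 4
y_pred = ["Yang", "Yang", "Yang", "Yang-Yin", "Yang-Yin", "Yang-Yin", "Yang-Yin", "Yang"]
scores = [0.1, 0.2, 0.3, 0.6, 0.9, 0.8, 0.7, 0.4]

print(accuracy(y_true, y_pred))                    # 0.75
print(binary_metrics(y_true, y_pred, "Yang-Yin"))  # (0.75, 0.75, 0.75)
print(roc_auc(y_true, scores, "Yang-Yin"))         # 0.9375
```

The pairwise AUC loop is quadratic in the number of cases and is meant only to make the definition explicit; library routines such as scikit-learn's `roc_auc_score` and `f1_score` would normally be used instead.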

Conclusion

Our model can differentiate HBV-ACLF syndromes, and the approach has the potential to generate other clinically interpretable, highly accurate models from clinical data with small sample sizes and class imbalance.


