Predictive model of Ki67 expression level in osteosarcoma based on weakly supervised segmentation and multi-type feature fusion

IF 4.8 · Zone 2 (Medicine) · Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computer methods and programs in biomedicine Pub Date : 2025-10-08 DOI:10.1016/j.cmpb.2025.109098

Qi Wang, Qun Ma, Xiuyan Li, Siqi Ben, Jun Xue, Tianrui Shang, Xiaoxuan Jing, Aidong Liu

Computer methods and programs in biomedicine, volume 273, Article 109098. Full text: https://www.sciencedirect.com/science/article/pii/S0169260725005140


Abstract

Background and objective

Osteosarcoma is a highly malignant bone tumor that occurs primarily in children and adolescents. Ki67 protein expression level (detected through immunohistochemistry) is an important indicator for assessing tumor proliferative activity. This study aims to develop an efficient and low-cost artificial intelligence model to predict Ki67 expression levels from pathological images.

Methods

A total of 73 hematoxylin and eosin (H&E)-stained whole slide images (WSIs) of osteosarcoma specimens were analyzed. Tumor regions were segmented using weakly supervised learning, and 215 nuclear features covering shape, texture, spatial, and topological characteristics were then extracted using the Hover-Net network. Feature selection was performed with five methods: least absolute shrinkage and selection operator (LASSO), mutual information (MI), recursive feature elimination (RFE), Wilcoxon rank sum test (WRST), and extreme gradient boosting (XGBoost), with the top 5 features retained from each method. The selected feature sets were then combined with 8 machine learning classifiers: adaptive boosting (AdaBoost), balanced random forest (BalancedRF), k-nearest neighbors (KNN), light gradient boosting machine (LightGBM), multilayer perceptron (MLP), quadratic discriminant analysis (QDA), random forest (RF), and support vector machine (SVM), to determine the optimal hybrid model.
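The selector-by-classifier search described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic (`make_classification` stands in for the 73 × 215 nuclear feature matrix), only three of the five selectors and three of the eight classifiers are shown, all hyperparameters (`alpha`, `step`, fold count) are assumptions, and the abstract does not state how cross-validation was organized.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

TOP_K = 5  # top 5 features per selection method, as in the abstract

# Synthetic stand-in data (assumption): 73 samples x 215 features, binary label.
X, y = make_classification(n_samples=73, n_features=215, n_informative=10, random_state=0)

# Three illustrative selectors; each keeps exactly TOP_K features.
selectors = {
    # threshold=-inf makes max_features the only selection criterion
    "LASSO": SelectFromModel(Lasso(alpha=0.05), max_features=TOP_K, threshold=-np.inf),
    "MI": SelectKBest(partial(mutual_info_classif, random_state=0), k=TOP_K),
    "RFE": RFE(LogisticRegression(max_iter=1000), n_features_to_select=TOP_K, step=10),
}

# Three illustrative classifiers with assumed settings.
classifiers = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for s_name, selector in selectors.items():
    for c_name, clf in classifiers.items():
        # Scaling and selection sit inside the pipeline, so they are refit on
        # the training part of every fold.
        pipe = make_pipeline(StandardScaler(), selector, clf)
        scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
        results[(s_name, c_name)] = (scores.mean(), scores.std())

best = max(results, key=lambda key: results[key][0])
print("best combination:", best, "mean/std ROC-AUC:", results[best])
```

Placing the selector inside the pipeline is a design choice of this sketch: with only 73 samples, selecting features on the full dataset before cross-validation would leak label information into the held-out folds.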

Results

By combining 5 key features with 8 machine learning classifiers, we selected the optimal hybrid model (XGBoost+SVM). This model achieved the best performance in accuracy (0.767 ± 0.018), recall (0.872 ± 0.036), F1-score (0.800 ± 0.012), and area under the receiver operating characteristic curve (ROC-AUC) (0.884 ± 0.045). The model showed both high accuracy and high sensitivity in Ki67 detection.
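For readers unfamiliar with the reported metrics, the sketch below shows one way values of the form "mean ± spread" could be computed over repeated evaluation folds. It is an assumption-laden illustration: the abstract does not say whether "±" denotes a standard deviation, how many folds or repeats were used, or which decision threshold was applied. The function names `fold_metrics` and `summarize` and the toy folds are hypothetical.

```python
from statistics import mean, stdev


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, recall, F1, and ROC-AUC for one fold (labels are 0/1)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        "roc_auc": roc_auc(y_true, scores),
    }


def summarize(folds):
    """Mean and sample standard deviation of each metric across folds."""
    per_fold = [fold_metrics(y_true, scores) for y_true, scores in folds]
    return {
        name: (mean(m[name] for m in per_fold), stdev(m[name] for m in per_fold))
        for name in per_fold[0]
    }


# Hypothetical toy folds: (true labels, predicted scores). Not data from the paper.
toy_folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),
    ([1, 0], [0.8, 0.2]),
]
for name, (avg, sd) in summarize(toy_folds).items():
    print(f"{name}: {avg:.3f} ± {sd:.3f}")
```

Recall here is the same quantity as the sensitivity mentioned in the Results: the fraction of truly positive cases that the model flags.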

Conclusion

Our model provides an automated and reliable solution for osteosarcoma Ki67 assessment, reducing dependence on traditional immunohistochemistry. Its excellent performance indicates strong potential for clinical translation.




Source journal

Computer methods and programs in biomedicine (Engineering & Technology: Biomedical Engineering)

CiteScore: 12.30
Self-citation rate: 6.60%
Articles published: 601
Review time: 135 days

Journal description: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.