Predictive model of Ki67 expression level in osteosarcoma based on weakly supervised segmentation and multi-type feature fusion
Qi Wang, Qun Ma, Xiuyan Li, Siqi Ben, Jun Xue, Tianrui Shang, Xiaoxuan Jing, Aidong Liu
Computer Methods and Programs in Biomedicine, volume 273, Article 109098. Published 2025-10-08. DOI: 10.1016/j.cmpb.2025.109098
Impact factor: 4.8 (JCR Q1, Computer Science, Interdisciplinary Applications)
https://www.sciencedirect.com/science/article/pii/S0169260725005140
Citation count: 0
Abstract
Background and objective
Osteosarcoma is a highly malignant bone tumor that occurs primarily in children and adolescents. Ki67 protein expression level (detected through immunohistochemistry) is an important indicator for assessing tumor proliferative activity. This study aims to develop an efficient and low-cost artificial intelligence model to predict Ki67 expression levels from pathological images.
Methods
A total of 73 hematoxylin and eosin (H&E)-stained whole slide images (WSIs) of osteosarcoma specimens were analyzed. Tumor regions were segmented using weakly supervised learning, and 215 nuclear features (shape, texture, spatial, and topological) were then extracted with the Hover-Net network. Feature selection was performed using five methods: least absolute shrinkage and selection operator (LASSO), mutual information (MI), recursive feature elimination (RFE), Wilcoxon rank sum test (WRST), and extreme gradient boosting (XGBoost), with the top five features retained from each method. The selected features were then paired with eight machine learning classifiers: adaptive boosting (AdaBoost), balanced random forest (BalancedRF), k-nearest neighbors (KNN), light gradient boosting machine (LightGBM), multilayer perceptron (MLP), quadratic discriminant analysis (QDA), random forest (RF), and support vector machine (SVM), to determine the optimal hybrid model.
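The selection-plus-classifier search described above can be sketched as a small grid of scikit-learn pipelines. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, only three of the five selection methods and three of the eight classifiers are shown, LASSO selection is approximated with `SelectFromModel` over a `Lasso` regressor, and the 5-fold cross-validation, ROC-AUC ranking, and all hyperparameters are choices made for this example only.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

TOP_K = 5  # the abstract keeps the top 5 features per selection method

# Synthetic stand-in for a 73 x 215 nuclear feature matrix (NOT the study data).
X, y = make_classification(n_samples=73, n_features=215, n_informative=10,
                           random_state=0)

# Assumed stand-ins for three of the five selection methods.
selectors = {
    "LASSO": SelectFromModel(Lasso(alpha=0.01, max_iter=10000),
                             max_features=TOP_K, threshold=-np.inf),
    "MI": SelectKBest(partial(mutual_info_classif, random_state=0), k=TOP_K),
    "RFE": RFE(LogisticRegression(max_iter=1000),
               n_features_to_select=TOP_K, step=10),
}
# Three of the eight classifiers, with illustrative hyperparameters.
classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for sel_name, selector in selectors.items():
    for clf_name, clf in classifiers.items():
        # Scaling and selection sit inside the pipeline, so each fold selects
        # its features from training data only (no leakage into the test fold).
        pipe = make_pipeline(StandardScaler(), selector, clf)
        scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
        results[f"{sel_name}+{clf_name}"] = (scores.mean(), scores.std())

# Rank the hybrid models by mean ROC-AUC across folds.
best = max(results, key=lambda name: results[name][0])
print(best, "ROC-AUC = %.3f ± %.3f" % results[best])
```

Keeping the selector inside the pipeline matters with only 73 samples: selecting features on the full dataset before cross-validation would tend to inflate the reported scores.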
Results
By combining five key features with eight machine learning classifiers, we identified the optimal hybrid model (XGBoost+SVM). This model achieved the best performance in accuracy (0.767 ± 0.018), recall (0.872 ± 0.036), F1-score (0.800 ± 0.012), and area under the receiver operating characteristic curve (ROC-AUC) (0.884 ± 0.045). The model showed both high accuracy and high sensitivity in predicting Ki67 expression level.
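Metrics of the form "mean ± standard deviation" can be produced by scoring one model over several cross-validation folds. The sketch below assumes that reading; the abstract does not state how the spread was computed. The data are synthetic, the task is assumed to be binary (high versus low Ki67), and the SVM settings and 5-fold split are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Five synthetic features and binary labels stand in for the selected
# features and an assumed high/low Ki67 label (NOT the study data).
X, y = make_classification(n_samples=73, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# "recall" is the sensitivity for the positive class.
metrics = ["accuracy", "recall", "f1", "roc_auc"]
cv_out = cross_validate(model, X, y, cv=cv, scoring=metrics)

# Mean and (population) standard deviation of each metric across folds.
summary = {m: (cv_out[f"test_{m}"].mean(), cv_out[f"test_{m}"].std())
           for m in metrics}
for name, (mean, std) in summary.items():
    print(f"{name}: {mean:.3f} ± {std:.3f}")
```

With a cohort this small, the fold-to-fold spread depends heavily on the split, so repeated cross-validation with different seeds may give a more stable estimate.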
Conclusion
Our model provides an automated and reliable solution for osteosarcoma Ki67 assessment, reducing dependence on traditional immunohistochemistry. Its excellent performance indicates strong potential for clinical translation.
About the journal:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; to support the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.