Identifying key bioprocess variables using explainable machine learning to enhance culture efficiency and viability of umbilical cord-derived mesenchymal stem cells.
{"title":"Identifying key bioprocess variables using explainable machine learning to enhance culture efficiency and viability of umbilical cord-derived mesenchymal stem cells.","authors":"Tse-Pu Huang, Hsin-Hui Huang, Bing-Tsiong Li, Pei-Hung Shen, Gracy Thomas, Juin-Yi Han, Chi-Ming Chu, Kun-Yi Lin","doi":"10.7150/ijms.127764","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Human umbilical cord-derived mesenchymal stromal/stem cells (UC-MSCs) are promising for regenerative medicine, but consistent manufacturing quality is critical.</p><p><strong>Objective: </strong>To develop and interpret machine-learning models (Extreme gradient boosting (XGBoost), with Shapley Additive Explanations, SHAP) that identify facilitatory and inhibitory factors affecting UC-MSC culture duration and post-processing viability.</p><p><strong>Methods: </strong>We analyzed data from 203 UC-MSC manufacturing cases. Candidate predictors included neonatal characteristics (e.g., sex, delivery mode), processing timelines, medium composition, cell features, and operator-related factors. Performance was evaluated using accuracy, the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), log loss, and Brier score, with calibration assessed in cross-validation.</p><p><strong>Results: </strong>For predicting shorter culture duration (defined as a time interval between UC collection and the completion of cryopreservation of <600 h), the model achieved accuracy = 0.80, AUROC = 0.72, and log loss = 0.55; cross-validation yielded AUROC = 0.68, AUPRC = 0.81, and Brier score = 0.20 with good calibration. For predicting higher cell viability, the model achieved accuracy = 0.71, AUROC = 0.72, and log loss = 0.62; cross-validation yielded AUROC = 0.54, AUPRC = 0.58, and Brier score = 0.26. 
SHAP analysis indicated that shorter culture duration was most associated with medium composition, processing time, and delivery mode, whereas higher viability was linked to neonatal sex, operator identity, and processing time. Sensitivity analyses showed stable top-ranked features across decision-threshold shifts and after removing operator identity.</p><p><strong>Conclusions: </strong>An interpretable XGBoost+SHAP pipeline is effective for identifying process-critical drivers of UC-MSC culture duration. While current predictive precision for cell viability remains limited, the framework functions as a robust diagnostic tool for elucidating qualitative trends. By exploiting these insights, the model facilitates targeted optimization of media selection, timeline control, and standard operating procedures (SOPs), ultimately enhancing manufacturing quality.</p>","PeriodicalId":14031,"journal":{"name":"International Journal of Medical Sciences","volume":"23 5","pages":"1808-1821"},"PeriodicalIF":3.2000,"publicationDate":"2026-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13133879/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Medical Sciences","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.7150/ijms.127764","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2026/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
Abstract
Background: Human umbilical cord-derived mesenchymal stromal/stem cells (UC-MSCs) are promising for regenerative medicine, but consistent manufacturing quality is critical.
Objective: To develop and interpret machine-learning models, using extreme gradient boosting (XGBoost) with SHapley Additive exPlanations (SHAP), that identify facilitatory and inhibitory factors affecting UC-MSC culture duration and post-processing viability.
Methods: We analyzed data from 203 UC-MSC manufacturing cases. Candidate predictors included neonatal characteristics (e.g., sex, delivery mode), processing timelines, medium composition, cell features, and operator-related factors. Performance was evaluated using accuracy, the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), log loss, and Brier score, with calibration assessed in cross-validation.
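The evaluation metrics named in the Methods can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the labels and predicted probabilities are invented toy values, the function names are our own, and AUPRC is approximated here by average precision, which is one common estimator and an assumption on our part.

```python
import math

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def auroc(y_true, y_prob):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, y_prob):
    """Step-wise area under the precision-recall curve; assumes no tied scores."""
    order = sorted(range(len(y_true)), key=lambda i: -y_prob[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / sum(y_true)

# Toy data (hypothetical): 1 = outcome of interest, e.g. "shorter culture duration".
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

metrics = {
    "accuracy": accuracy(y_true, y_prob),
    "auroc": auroc(y_true, y_prob),
    "auprc": average_precision(y_true, y_prob),
    "log_loss": log_loss(y_true, y_prob),
    "brier": brier_score(y_true, y_prob),
}
print(metrics)
```

Accuracy depends on a decision threshold, whereas AUROC and AUPRC depend only on how cases are ranked. Log loss and Brier score also penalize poorly calibrated probabilities, which is presumably why they are reported alongside the calibration assessment.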
Results: For predicting shorter culture duration (defined as an interval of <600 h between UC collection and completion of cryopreservation), the model achieved accuracy = 0.80, AUROC = 0.72, and log loss = 0.55; cross-validation yielded AUROC = 0.68, AUPRC = 0.81, and Brier score = 0.20 with good calibration. For predicting higher cell viability, the model achieved accuracy = 0.71, AUROC = 0.72, and log loss = 0.62; cross-validation yielded AUROC = 0.54, AUPRC = 0.58, and Brier score = 0.26. SHAP analysis indicated that shorter culture duration was most strongly associated with medium composition, processing time, and delivery mode, whereas higher viability was linked to neonatal sex, operator identity, and processing time. In sensitivity analyses, the top-ranked features remained stable across shifts in the decision threshold and after operator identity was removed.
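To make the SHAP attributions more concrete, the sketch below computes exact Shapley values by brute-force enumeration for a tiny toy model. It is only an illustration of the underlying idea: it does not use the SHAP library or the tree-specific algorithm normally paired with XGBoost, and the scoring function, feature names, and values are all hypothetical rather than taken from the study.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating all feature coalitions.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_score(z):
    """Hypothetical score with one interaction term; not a fitted model."""
    medium, hours, cesarean = z
    return 0.5 + 0.2 * medium - 0.001 * hours + 0.1 * medium * cesarean

feature_names = ["medium", "hours", "cesarean"]  # illustrative names only
x = [1, 100, 1]          # instance to explain
baseline = [0, 0, 0]     # reference input

phi = shapley_values(toy_score, x, baseline)
for name, contribution in zip(feature_names, phi):
    print(f"{name}: {contribution:+.3f}")

# Efficiency property: contributions sum to f(x) - f(baseline).
print(sum(phi), toy_score(x) - toy_score(baseline))
```

In this toy case the interaction between `medium` and `cesarean` is split equally between the two features, while the purely additive `hours` term is attributed entirely to `hours`. Positive values push the score up (facilitatory) and negative values push it down (inhibitory), matching the way the abstract describes factors.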
Conclusions: An interpretable XGBoost+SHAP pipeline is effective for identifying process-critical drivers of UC-MSC culture duration. While current predictive precision for cell viability remains limited, the framework functions as a robust diagnostic tool for elucidating qualitative trends. By exploiting these insights, the model facilitates targeted optimization of media selection, timeline control, and standard operating procedures (SOPs), ultimately enhancing manufacturing quality.
About the journal:
Original research papers, reviews, and short research communications in any medicine-related area can be submitted to the Journal on the understanding that the work has not been published previously in whole or in part and is not under consideration for publication elsewhere. Manuscripts in both basic science and clinical medicine are considered. There is no restriction on the length of research papers and reviews, although authors are encouraged to be concise. Short research communications are limited to under 2500 words.