OncoTrace-TOO: Interpretable Machine Learning Framework for Cancer Tissue-of-Origin Identification Using Transcriptomic Signatures
Yang Hao, Haochun Huang, Daiyun Huang, Jianwen Ruan, Xin Liu, Jianquan Zhang
Cancer Reports, volume 8, issue 8. Journal article, published 2025-08-10.
DOI: 10.1002/cnr2.70311
https://onlinelibrary.wiley.com/doi/10.1002/cnr2.70311
Abstract
Background
Cancer of unknown primary remains a formidable diagnostic challenge due to the inability to pinpoint the primary tumor site, which restricts the use of targeted therapeutics. Although machine-learning methods that integrate transcriptomic approaches have provided valuable insights into tumor origins, they often face challenges in distinguishing biologically similar tumors and typically lack biological interpretability.
Aims
This study aims to develop a transparent and biologically interpretable machine learning framework that accurately classifies tissue-of-origin across diverse cancer types, thereby facilitating clinical diagnosis.
Methods
We designed OncoTrace-TOO, a novel tissue-of-origin classification model based on gene expression profiles. The model utilizes pan-cancer discriminative molecular features identified through one-vs-rest differential expression analysis and applies logistic regression as the classification algorithm.
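The two steps named above, one-vs-rest feature selection followed by logistic regression, can be sketched on synthetic data. This is a minimal illustration and not the authors' implementation: the toy expression matrix, the t-like score standing in for a real differential expression analysis, the softmax (multinomial) form of logistic regression, and every function name (`make_toy_expression`, `one_vs_rest_markers`, `fit_logistic_regression`, `predict`) are assumptions introduced here.

```python
import math
import random
from statistics import mean, pvariance

def make_toy_expression(n_per_class=20, n_genes=6, seed=0):
    """Synthetic data (assumption): gene i is elevated in class i, other genes are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for ci, label in enumerate(["A", "B", "C"]):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            row[ci] += 4.0  # marker gene for this class
            X.append(row)
            y.append(label)
    return X, y

def one_vs_rest_markers(X, y, top_k=1):
    """Per class, score each gene with a t-like statistic (class vs. all other
    samples) and keep the top_k most up-regulated genes."""
    selected = []
    for label in sorted(set(y)):
        scored = []
        for g in range(len(X[0])):
            inside = [row[g] for row, lab in zip(X, y) if lab == label]
            rest = [row[g] for row, lab in zip(X, y) if lab != label]
            se = math.sqrt(pvariance(inside) / len(inside)
                           + pvariance(rest) / len(rest)) + 1e-9
            scored.append(((mean(inside) - mean(rest)) / se, g))
        for _, g in sorted(scored, reverse=True)[:top_k]:
            if g not in selected:
                selected.append(g)
    return selected

def class_scores(W, row):
    """Linear score per class; the last entry of each weight vector is the bias."""
    return [sum(w * x for w, x in zip(Wc, row)) + Wc[-1] for Wc in W]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def fit_logistic_regression(X, y, classes, lr=0.1, epochs=200):
    """Multinomial logistic regression trained by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    W = [[0.0] * (d + 1) for _ in classes]
    for _ in range(epochs):
        grad = [[0.0] * (d + 1) for _ in classes]
        for row, lab in zip(X, y):
            p = softmax(class_scores(W, row))
            for c, cls in enumerate(classes):
                err = p[c] - (1.0 if cls == lab else 0.0)
                for j in range(d):
                    grad[c][j] += err * row[j]
                grad[c][d] += err
        for c in range(len(classes)):
            for j in range(d + 1):
                W[c][j] -= lr * grad[c][j] / n
    return W

def predict(W, classes, row):
    s = class_scores(W, row)
    return classes[s.index(max(s))]

X, y = make_toy_expression()
classes = sorted(set(y))
markers = one_vs_rest_markers(X, y, top_k=1)        # selected gene indices
X_sel = [[row[g] for g in markers] for row in X]    # keep only marker genes
W = fit_logistic_regression(X_sel, y, classes)
train_acc = sum(predict(W, classes, r) == lab for r, lab in zip(X_sel, y)) / len(y)
# Interpretability: per class, the selected gene with the largest coefficient.
top_gene = {cls: markers[max(range(len(markers)), key=lambda j: W[c][j])]
            for c, cls in enumerate(classes)}
print(markers, train_acc, top_gene)
```

Because the classifier is linear, each class's coefficients can be read directly against the selected genes, which is the kind of transparency the abstract emphasizes. On real expression data one would likely use an established differential expression tool and a library implementation of logistic regression instead of these hand-written stand-ins.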
Results
OncoTrace-TOO achieved an overall accuracy of 0.967, with perfect classification for seven cancer types (e.g., CHOL, DLBC, and LAML). The model demonstrated high predictive accuracy in both primary and metastatic cancers across TCGA and GEO validation datasets, with enhanced capability in resolving histologically related malignancies and in classifying rare cancer subtypes. When applied to independent clinical tumor samples, the model achieved a tissue-of-origin (TOO) prediction accuracy of 0.857, further supporting its robustness. Importantly, the framework offers biologically interpretable predictions by revealing tumor-specific molecular signatures, thus enhancing its clinical applicability.
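The distinction between overall accuracy and "perfect classification" for individual cancer types can be made concrete with a short sketch. The labels `T1` to `T3` and the predictions below are invented placeholders, not data from the study, and the helper names (`overall_accuracy`, `per_class_accuracy`) are hypothetical; per-class accuracy is taken here to mean the fraction of samples of a given true type that were predicted correctly, which is an assumption about the abstract's usage.

```python
from collections import Counter

def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted type matches the true type."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_accuracy(y_true, y_pred):
    """For each true type, the fraction of its samples predicted correctly."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: correct[cls] / total[cls] for cls in total}

# Invented toy labels (placeholders, not results from the paper).
y_true = ["T1", "T1", "T2", "T2", "T2", "T3"]
y_pred = ["T1", "T1", "T2", "T3", "T2", "T3"]

acc = overall_accuracy(y_true, y_pred)
per_class = per_class_accuracy(y_true, y_pred)
# Types with every sample classified correctly.
perfect = sorted(cls for cls, a in per_class.items() if a == 1.0)
print(acc, per_class, perfect)
```

A type counts as perfectly classified only when all of its own samples are correct, so a model can have several such types while its overall accuracy stays below 1.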
Conclusions
OncoTrace-TOO not only offers high predictive accuracy for tissue-of-origin classification, but also delivers biologically meaningful insights that support clinical decision-making. This framework holds promise for improving diagnostic precision and guiding personalized treatment in challenging cancer cases.