Retention time prediction of forensic compounds using ensemble machine learning and molecular descriptors
Authors: Asena Avci Akca, Sefa Akca
Journal: Journal of chromatography. B, Analytical technologies in the biomedical and life sciences, volume 1267, article 124812
Publication type: Journal Article
Publication date: 2025-10-02
DOI: 10.1016/j.jchromb.2025.124812 (https://doi.org/10.1016/j.jchromb.2025.124812)
Citations: 0
Abstract
Retention time (RT) prediction can greatly improve the efficiency of chromatographic workflows in forensic toxicology, especially in high-throughput or non-targeted analyses. In the present study, we compare the performance of four ensemble machine learning models, namely Random Forest (RF), Extra Trees, XGBoost, and LightGBM, in predicting the RTs of 229 structurally diverse forensic compounds. Each compound was represented in two ways: by a minimal set of RDKit-derived descriptors and by an extended feature space that combines Mordred descriptors and Morgan circular fingerprints. All RTs were experimentally measured under standardized reversed-phase liquid chromatographic conditions. Model performance was evaluated using the coefficient of determination (R²) and the root-mean-square error (RMSE). Results show that models trained on the extended descriptors (>2000 molecular features) outperformed those trained on the basic descriptors, with XGBoost showing the highest predictive power (R² = 0.718, RMSE = 1.23). Feature importance analysis showed that RTs are affected not only by global molecular properties such as hydrophobicity and size but also by topological and electronic features. These results highlight the value of ensemble learning in RT prediction and demonstrate its practical utility in compound screening and chromatographic method development in forensic toxicology.
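
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions, R² = 1 - SS_res/SS_tot and RMSE = sqrt(mean squared error); the abstract does not state which exact formulas or software the authors used. The function names `rmse` and `r_squared` and the retention times below are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as the inputs."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: error left unexplained by the model.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observed values around their mean.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical retention times for illustration only (not from the paper).
observed_rt = [2.0, 4.0, 6.0, 8.0]
predicted_rt = [3.0, 3.0, 7.0, 7.0]

print(f"RMSE = {rmse(observed_rt, predicted_rt):.3f}")
print(f"R^2  = {r_squared(observed_rt, predicted_rt):.3f}")
```

Under these definitions, RMSE is expressed in the same units as the retention times, while R² is unitless, so the two values reported for a model describe absolute error and explained variance respectively.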