Authors: Tugba Muhlise Okyay, Ibrahim Yilmaz, Macit Koldas
Journal: Photochemical & Photobiological Sciences (impact factor 2.7; JCR Q3, Biochemistry & Molecular Biology)
Published: 2025-05-15 (journal article)
DOI: https://doi.org/10.1007/s43630-025-00733-8
Machine learning-based bioactivity prediction of porphyrin derivatives: molecular descriptors, clustering, and model evaluation.
Understanding how molecular structure relates to bioactivity is crucial for optimizing porphyrin-based therapeutics. By integrating cheminformatics techniques with machine learning models, this work classifies compounds by molecular structure and by growth-inhibition potency, measured as the half-maximal inhibitory concentration (IC50). A dataset of 317 porphyrin derivatives was compiled, incorporating molecular descriptors and biological activity data. Descriptive statistics were used to examine compound distribution and key features. Hierarchical clustering on fingerprint similarity matrices grouped compounds by structural similarity. Lipinski's Rule of Five was applied to assess drug-likeness, Murcko scaffold analysis identified core structural patterns, and tumor response data were analyzed to evaluate therapeutic efficacy. Machine learning models were then implemented to predict bioactivity. Among the bioactive compounds, TMPyP4 and Temoporfin were the most studied. The quantitative estimate of drug-likeness (QED) and the number of aliphatic carboxylic acid groups were identified as the most influential descriptors for bioactivity. Hierarchical clustering segmented the porphyrins into nine structural groups. The analysis identified 168 compounds as active by pIC50 and 31 meeting Lipinski's criteria, with 11 overlapping as both effective and bioavailable. Tumor response analysis revealed three porphyrins achieving a 100% response. Logistic Regression was the best-performing model, achieving 83% accuracy and demonstrating robust predictive capability. Overall, the study characterizes porphyrin derivatives, examines the key molecular features influencing their bioactivity, evaluates their therapeutic potential, and highlights the potential of machine learning for predicting the biological activity status of porphyrin derivatives.
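The activity and drug-likeness counts in the abstract rest on two simple filters, which can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract gives neither the pIC50 cutoff nor the number of Lipinski violations tolerated, so `pic50_cutoff=6.0` and `max_violations=0` are assumptions, and the `Compound` class, the function names, and the three demo compounds are invented for the example. Only the definition pIC50 = -log10(IC50 in mol/L) and the four Rule of Five thresholds are standard.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical record; field names are illustrative, not from the paper."""
    name: str
    ic50_nm: float      # IC50 in nanomolar
    mol_weight: float   # molecular weight in g/mol
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)  # 1 nM = 1e-9 mol/L


def lipinski_violations(c: Compound) -> int:
    """Count how many of the four Rule of Five thresholds are exceeded."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def classify(compounds, pic50_cutoff=6.0, max_violations=0):
    """Return (active, drug_like, overlap) as sets of compound names.

    Both default thresholds are assumptions made for this sketch.
    """
    active = {c.name for c in compounds
              if pic50_from_nm(c.ic50_nm) >= pic50_cutoff}
    drug_like = {c.name for c in compounds
                 if lipinski_violations(c) <= max_violations}
    return active, drug_like, active & drug_like


# Invented demo values, not data from the study.
DEMO = [
    Compound("cpd_a", ic50_nm=100.0, mol_weight=450.0, logp=3.2,
             h_donors=2, h_acceptors=6),
    Compound("cpd_b", ic50_nm=50000.0, mol_weight=480.0, logp=4.0,
             h_donors=1, h_acceptors=5),
    Compound("cpd_c", ic50_nm=10.0, mol_weight=680.0, logp=6.1,
             h_donors=4, h_acceptors=8),
]

if __name__ == "__main__":
    active, drug_like, both = classify(DEMO)
    print("active:", sorted(active))
    print("drug-like:", sorted(drug_like))
    print("overlap:", sorted(both))
```

Keeping the two filters separate and intersecting the resulting sets mirrors the way the abstract reports an active count, a Lipinski count, and an overlap. In practice the descriptor values would come from a cheminformatics toolkit rather than being typed in by hand.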