Journal of Cheminformatics: Latest Articles

Prediction of Pt, Ir, Ru, and Rh complexes light absorption in the therapeutic window for phototherapy using machine learning
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2025-01-05 DOI: 10.1186/s13321-024-00939-5
V. Vigna, T. F. G. G. Cova, A. A. C. C. Pais, E. Sicilia
Abstract: Effective light-based cancer treatments, such as photodynamic therapy (PDT) and photoactivated chemotherapy (PACT), rely on compounds that are activated efficiently by light and absorb within the therapeutic window (600–850 nm). Traditional prediction methods for these light absorption properties, including Time-Dependent Density Functional Theory (TDDFT), are often computationally intensive and time-consuming. In this study, we explore a machine learning (ML) approach to predict the light absorption in the region of the therapeutic window of platinum, iridium, ruthenium, and rhodium complexes, aiming at streamlining the screening of potential photoactivatable prodrugs. By compiling a dataset of 9775 complexes from the Reaxys database, we trained six classification models, including random forests, support vector machines, and neural networks, utilizing various molecular descriptors. Our findings indicate that the Extreme Gradient Boosting Classifier (XGBC) paired with AtomPairs2D descriptors delivers the highest predictive accuracy and robustness. This ML-based method significantly accelerates the identification of suitable compounds, providing a valuable tool for the early-stage design and development of phototherapy drugs. The method also makes it possible to modify relevant structural characteristics of a base molecule using information from the supervised approach.
Scientific contribution: The proposed machine learning (ML) approach predicts the ability of transition metal-based complexes to absorb light in the UV–vis therapeutic window, a key trait for phototherapeutic agents. While ML models have been used to predict UV–vis properties of organic molecules, applying this to metal complexes is novel. The model is efficient, fast, and resource-light, using decision tree-based algorithms that provide interpretable results. This interpretability helps to understand classification rules and facilitates targeted structural modifications to convert inactive complexes into potentially active ones.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00939-5
Citations: 0
DeepTGIN: a novel hybrid multimodal approach using transformers and graph isomorphism networks for protein-ligand binding affinity prediction
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-12-29 DOI: 10.1186/s13321-024-00938-6
Guishen Wang, Hangchen Zhang, Mengting Shao, Yuncong Feng, Chen Cao, Xiaowen Hu
Abstract: Predicting protein-ligand binding affinity is essential for understanding protein-ligand interactions and advancing drug discovery. Recent research has demonstrated the advantages of sequence-based models and graph-based models. In this study, we present a novel hybrid multimodal approach, DeepTGIN, which integrates transformers and graph isomorphism networks to predict protein-ligand binding affinity. DeepTGIN is designed to learn sequence and graph features efficiently. The DeepTGIN model comprises three modules: the data representation module, the encoder module, and the prediction module. The transformer encoder learns sequential features from proteins and protein pockets separately, while the graph isomorphism network extracts graph features from the ligands. To evaluate the performance of DeepTGIN, we compared it with state-of-the-art models using the PDBbind 2016 core set and PDBbind 2013 core set. DeepTGIN outperforms these models in terms of R, RMSE, MAE, SD, and CI metrics. Ablation studies further demonstrate the effectiveness of the ligand features and the encoder module. The code is available at: https://github.com/zhc-moushang/DeepTGIN.
DeepTGIN is a novel hybrid multimodal deep learning model for predicting protein-ligand binding affinity. The model uses a Transformer encoder to extract sequence features from the protein and the protein pocket, and graph isomorphism networks to capture features from the ligand. This model addresses the limitations of existing methods in exploring protein pocket and ligand features.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00938-6
Citations: 0
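Reading aid for the DeepTGIN entry above: the abstract reports R, RMSE, MAE, SD, and CI without defining them. The sketch below shows how four of these are commonly computed for affinity regression, assuming that R is the Pearson correlation and CI is the pairwise concordance index. Those readings are assumptions rather than statements from the paper, SD is left out because the abstract does not say which definition is used, and all function names are illustrative.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between measured and predicted affinities."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of pairs with different true values that the predictions
    rank in the same order; tied predictions count as half."""
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    return concordant / comparable


if __name__ == "__main__":
    # Made-up affinities, for illustration only.
    measured = [4.2, 5.1, 6.3, 7.8]
    predicted = [4.5, 5.0, 6.9, 7.4]
    print(pearson_r(measured, predicted), rmse(measured, predicted),
          mae(measured, predicted), concordance_index(measured, predicted))
```

Higher R and CI and lower RMSE and MAE indicate better agreement with the measured affinities.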
STOUT V2.0: SMILES to IUPAC name conversion using transformer models
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-12-27 DOI: 10.1186/s13321-024-00941-x
Kohulan Rajan, Achim Zielesny, Christoph Steinbeck
Abstract: Naming chemical compounds systematically is a complex task governed by a set of rules established by the International Union of Pure and Applied Chemistry (IUPAC). These rules are universal and widely accepted by chemists worldwide, but their complexity makes it challenging for individuals to consistently apply them accurately. A translation method can be employed to address this challenge. Accurate translation of chemical compounds from SMILES notation into their corresponding IUPAC names is crucial, as it can significantly streamline the laborious process of naming chemical structures. Here, we present STOUT (SMILES-TO-IUPAC-name translator) V2, which addresses this challenge by introducing a transformer-based model that translates string representations of chemical structures into IUPAC names. Trained on a dataset of nearly 1 billion SMILES strings and their corresponding IUPAC names, STOUT V2 demonstrates exceptional accuracy in generating IUPAC names, even for complex chemical structures. The model's ability to capture intricate patterns and relationships within chemical structures enables it to generate precise and standardised IUPAC names. While established deterministic algorithms remain the gold standard for systematic chemical naming, our work, enabled by access to OpenEye’s Lexichem software through an academic license, demonstrates the potential of neural approaches to complement existing tools in chemical nomenclature.
Scientific contribution: STOUT V2, built upon transformer-based models, is a significant advancement from our previous work. The web application enhances its accessibility and utility. By making the model and source code fully open and well-documented, we aim to promote unrestricted use and encourage further development.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00941-x
Citations: 0
Comprehensive benchmarking of computational tools for predicting toxicokinetic and physicochemical properties of chemicals
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-12-26 DOI: 10.1186/s13321-024-00931-z
Domenico Gadaleta, Eva Serrano-Candelas, Rita Ortega-Vallbona, Erika Colombo, Marina Garcia de Lomana, Giada Biava, Pablo Aparicio-Sánchez, Alessandra Roncaglioni, Rafael Gozalbes, Emilio Benfenati
Abstract: Ensuring the safety of chemicals for environmental and human health involves assessing physicochemical (PC) and toxicokinetic (TK) properties, which are crucial for absorption, distribution, metabolism, excretion, and toxicity (ADMET). Computational methods play a vital role in predicting these properties, given the current trends in reducing experimental approaches, especially those that involve animal experimentation. In the present manuscript, twelve software tools implementing Quantitative Structure–Activity Relationship (QSAR) models were selected for the prediction of 17 relevant PC and TK properties. A total of 41 validation datasets were collected from the literature, curated and used for assessing the models’ external predictivity, emphasizing the performance of the models inside the applicability domain. Overall, the results confirmed the adequate predictive performance of the majority of the selected tools, with models for PC properties (R² average = 0.717) generally outperforming those for TK properties (R² average = 0.639 for regression, average balanced accuracy = 0.780 for classification). Notably, several of the tools evaluated exhibited good predictivity across different properties and were identified as recurring optimal choices. Moreover, a systematic analysis of the chemical space covered by the external validation datasets confirmed the validity of the collected results for relevant chemical categories (e.g., drugs and industrial chemicals), further increasing the confidence in the overall evaluation. The best performing models were ultimately suggested for each investigated property and proposed as robust computational tools for high-throughput assessment of highly relevant chemical properties.
The present manuscript provides an overview of the state-of-the-art available computational tools for predicting the PC and TK properties of chemicals. The results here offer valuable guidance to researchers, regulatory authorities, and the industry in identifying robust computational tools suitable for predicting relevant chemical properties in the context of chemical design, toxicity and environmental fate assessment.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00931-z
Citations: 0
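Reading aid for the benchmarking entry above: the abstract summarizes regression tools by R² and classification tools by balanced accuracy. The sketch below shows one common way to compute both, assuming R² means the coefficient of determination (the abstract does not say whether it is that or a squared correlation). The function names and toy values are illustrative and not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so that a majority class cannot
    dominate the score on imbalanced validation sets."""
    recalls = []
    for label in sorted(set(y_true)):
        members = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in members if y_pred[i] == label)
        recalls.append(hits / len(members))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy classification set: four positives, two negatives.
    labels = [1, 1, 1, 1, 0, 0]
    predictions = [1, 1, 1, 0, 0, 1]
    print(balanced_accuracy(labels, predictions))
    print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

Balanced accuracy averages the recall of each class, which is why it is often preferred over plain accuracy when active and inactive compounds are unevenly represented.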
AttenhERG: a reliable and interpretable graph neural network framework for predicting hERG channel blockers
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-12-23 DOI: 10.1186/s13321-024-00940-y
Tianbiao Yang, Xiaoyu Ding, Elizabeth McMichael, Frank W. Pun, Alex Aliper, Feng Ren, Alex Zhavoronkov, Xiao Ding
Abstract: Cardiotoxicity, particularly drug-induced arrhythmias, poses a significant challenge in drug development, highlighting the importance of early-stage prediction of human ether-a-go-go-related gene (hERG) toxicity. hERG encodes the pore-forming subunit of the cardiac potassium channel. Traditional methods are both costly and time-intensive, necessitating the development of computational approaches. In this study, we introduce AttenhERG, a novel graph neural network framework designed to predict hERG channel blockers reliably and interpretably. AttenhERG demonstrates improved performance compared to existing methods with an AUROC of 0.835, showcasing its efficacy in accurately predicting hERG activity across diverse datasets. Additionally, uncertainty evaluation analysis reveals the model's reliability, enhancing its utility in drug discovery and safety assessment. Case studies illustrate the practical application of AttenhERG in optimizing compounds for hERG toxicity, highlighting its potential in rational drug design.
Scientific contribution: AttenhERG is a breakthrough framework that significantly improves the interpretability and accuracy of predicting hERG channel blockers. By integrating uncertainty estimation, AttenhERG demonstrates superior reliability compared to benchmark models. Two case studies, involving APH1A and NMT1 inhibitors, further emphasize AttenhERG's practical application in compound optimization.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00940-y
Citations: 0
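Reading aid for the AttenhERG entry above: the headline figure is an AUROC of 0.835. The sketch below computes AUROC from its pairwise definition, the probability that a randomly chosen blocker receives a higher score than a randomly chosen non-blocker, with ties counted as half. It is a generic illustration with made-up scores, not the authors' evaluation code, and `auroc` is a hypothetical name.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of
    positive (label 1) and negative (label 0) scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up predicted blocker probabilities for four compounds.
    is_blocker = [1, 1, 0, 0]
    predicted = [0.9, 0.4, 0.5, 0.1]
    print(auroc(is_blocker, predicted))
```

The double loop is quadratic in the number of compounds; for large test sets a rank-based formulation gives the same value more cheaply.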
Correction: StreaMD: the toolkit for high-throughput molecular dynamics simulations
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-12-23 DOI: 10.1186/s13321-024-00942-w
Aleksandra Ivanova, Olena Mokshyna, Pavel Polishchuk
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00942-w
Citations: 0
Interface-aware molecular generative framework for protein–protein interaction modulators
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-12-20 DOI: 10.1186/s13321-024-00930-0
Jianmin Wang, Jiashun Mao, Chunyan Li, Hongxin Xiang, Xun Wang, Shuang Wang, Zixu Wang, Yangyang Chen, Yuquan Li, Kyoung Tai No, Tao Song, Xiangxiang Zeng
Abstract: Protein–protein interactions (PPIs) play a crucial role in numerous biochemical and biological processes. Although several structure-based molecular generative models have been developed, PPI interfaces and compounds targeting PPIs exhibit distinct physicochemical properties compared to traditional binding pockets and small-molecule drugs. As a result, generating compounds that effectively target PPIs, particularly by considering PPI complexes or interface hotspot residues, remains a significant challenge. In this work, we constructed a comprehensive dataset of PPI interfaces with active and inactive compound pairs. Based on this, we propose a novel molecular generative framework tailored to PPI interfaces, named GENiPPI. Our evaluation demonstrates that GENiPPI captures the implicit relationships between the PPI interfaces and the active molecules, and can generate novel compounds that target these interfaces. Moreover, GENiPPI can generate structurally diverse novel compounds with limited PPI interface modulators. To the best of our knowledge, this is the first exploration of a structure-based molecular generative model focused on PPI interfaces, which could facilitate the design of PPI modulators. The PPI interface-based molecular generative model enriches the existing landscape of structure-based (pocket/interface) molecular generative models.
This study introduces GENiPPI, a protein-protein interaction (PPI) interface-aware molecular generative framework. The framework first employs Graph Attention Networks to capture atomic-level interaction features at the protein complex interface. Subsequently, Convolutional Neural Networks extract compound representations in voxel and electron density spaces. These features are integrated into a Conditional Wasserstein Generative Adversarial Network, which trains the model to generate compound representations targeting PPI interfaces. GENiPPI effectively captures the relationship between PPI interfaces and active/inactive compounds. Furthermore, in few-shot molecular generation, GENiPPI successfully generates compounds comparable to known disruptors. GENiPPI provides an efficient tool for structure-based design of PPI modulators.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00930-0
Citations: 0
MolNexTR: a generalized deep learning model for molecular image recognition
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-12-18 DOI: 10.1186/s13321-024-00926-w
Yufan Chen, Ching Ting Leung, Yong Huang, Jianwei Sun, Hao Chen, Hanyu Gao
Abstract: In the field of chemical structure recognition, converting molecular images into machine-readable data formats such as SMILES strings is a significant challenge, primarily due to the varied drawing styles and conventions prevalent in chemical literature. To bridge this gap, we proposed MolNexTR, a novel image-to-graph deep learning model that fuses the strengths of ConvNext, a powerful Convolutional Neural Network variant, and Vision-TRansformer. This integration facilitates a more detailed extraction of both local and global features from molecular images. MolNexTR can predict atoms and bonds simultaneously and understand their layout rules. It also excels at flexibly integrating symbolic chemistry principles to discern chirality and decipher abbreviated structures. We further incorporate a series of advanced algorithms, including an improved data augmentation module, an image contamination module, and a post-processing module for getting the final SMILES output. These modules cooperate to enhance the model’s robustness to diverse styles of molecular images found in real literature. In our test sets, MolNexTR has demonstrated superior performance, achieving an accuracy rate of 81–97%, marking a significant advancement in the domain of molecular structure recognition.
Scientific contribution: MolNexTR is a novel image-to-graph model that incorporates a unique dual-stream encoder to extract complex molecular image features, and combines chemical rules to predict atoms and bonds while understanding atom and bond layout rules. In addition, it employs a series of novel augmentation algorithms to significantly enhance the robustness and performance of the model.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00926-w
Citations: 0
FlavorMiner: a machine learning platform for extracting molecular flavor profiles from structural data
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-12-10 DOI: 10.1186/s13321-024-00935-9
Fabio Herrera-Rocha, Miguel Fernández-Niño, Jorge Duitama, Mónica P. Cala, María José Chica, Ludger A. Wessjohann, Mehdi D. Davari, Andrés Fernando González Barrios
Abstract: Flavor is the main factor driving consumer acceptance of food products. However, tracking the biochemistry of flavor is a formidable challenge due to the complexity of food composition. Current methodologies for linking individual molecules to flavor in foods and beverages are expensive and time-consuming. Predictive models based on machine learning (ML) are emerging as an alternative to speed up this process. Nonetheless, the optimal approach to predict flavor features of molecules remains elusive. In this work we present FlavorMiner, an ML-based multilabel flavor predictor. FlavorMiner seamlessly integrates different combinations of algorithms and mathematical representations, augmented with class balance strategies to address the inherent class imbalance of the input dataset. Notably, Random Forest and K-Nearest Neighbors combined with Extended Connectivity Fingerprint and RDKit molecular descriptors consistently outperform other combinations in most cases. Resampling strategies surpass weight balance methods in mitigating bias associated with class imbalance. FlavorMiner exhibits remarkable accuracy, with an average ROC AUC score of 0.88. This algorithm was used to analyze cocoa metabolomics data, unveiling its profound potential to help extract valuable insights from intricate food metabolomics data. FlavorMiner can be used for flavor mining in any food product, drawing from a diverse training dataset that spans over 934 distinct food products.
Scientific contribution: FlavorMiner is an advanced machine learning (ML)-based tool designed to predict molecular flavor features with high accuracy and efficiency, addressing the complexity of food metabolomics. By leveraging robust algorithmic combinations paired with mathematical representations, FlavorMiner achieves high predictive performance. Applied to cocoa metabolomics, FlavorMiner demonstrated its capacity to extract meaningful insights, showcasing its versatility for flavor analysis across diverse food products. This study underscores the transformative potential of ML in accelerating flavor biochemistry research, offering a scalable solution for the food and beverage industry.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00935-9
Citations: 0
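Reading aid for the FlavorMiner entry above: the abstract states that resampling strategies handled class imbalance better than weight-balancing methods, but it does not say which resampling methods were used. The sketch below shows random oversampling, one simple member of that family, purely as an illustration. The function name, the seed parameter, and the toy data are assumptions; for a multilabel predictor the same idea would be applied to each flavor label treated as its own binary problem.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    rows_by_class = {}
    for row, label in zip(features, labels):
        rows_by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in rows_by_class.values())

    out_features, out_labels = [], []
    for label, rows in rows_by_class.items():
        # Sample with replacement to fill the gap to the majority size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            out_features.append(row)
            out_labels.append(label)
    return out_features, out_labels


if __name__ == "__main__":
    # Toy descriptors: four molecules without the flavor, one with it.
    X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
    y = [0, 0, 0, 0, 1]
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))
```

Oversampling should be applied to the training split only, so that duplicated molecules do not leak into validation or test data.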
Human-in-the-loop active learning for goal-oriented molecule generation
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-12-09 DOI: 10.1186/s13321-024-00924-y
Yasmine Nahal, Janosch Menke, Julien Martinelli, Markus Heinonen, Mikhail Kabeshov, Jon Paul Janet, Eva Nittinger, Ola Engkvist, Samuel Kaski
Abstract: Machine learning (ML) systems have enabled the modelling of quantitative structure–property relationships (QSPR) and structure-activity relationships (QSAR) using existing experimental data to predict target properties for new molecules. These property predictors hold significant potential in accelerating drug discovery by guiding generative artificial intelligence (AI) agents to explore desired chemical spaces. However, they often struggle to generalize due to the limited scope of the training data. When optimized by generative agents, this limitation can result in the generation of molecules with artificially high predicted probabilities of satisfying target properties, which subsequently fail experimental validation. To address this challenge, we propose an adaptive approach that integrates active learning (AL) and iterative feedback to refine property predictors, thereby improving the outcomes of their optimization by generative AI agents. Our method leverages the Expected Predictive Information Gain (EPIG) criterion to select additional molecules for evaluation by an oracle. This process aims to provide the greatest reduction in predictive uncertainty, enabling more accurate model evaluations of subsequently generated molecules. Recognizing the impracticality of immediate wet-lab or physics-based experiments due to time and logistical constraints, we propose leveraging human experts for their cost-effectiveness and domain knowledge to effectively augment property predictors, bridging gaps in the limited training data. Empirical evaluations through both simulated and real human-in-the-loop experiments demonstrate that our approach refines property predictors to better align with oracle assessments. Additionally, we observe improved accuracy of predicted properties as well as improved drug-likeness among the top-ranking generated molecules.
We present an adaptable framework that integrates AL and human expertise to refine property predictors for goal-oriented molecule generation. This approach is robust to noise in human feedback and ensures that navigating chemical space with human-refined predictors leverages human insights to identify molecules that not only satisfy predicted property profiles but also score highly on oracle models. Additionally, it prioritizes practical characteristics such as drug-likeness, synthetic accessibility, and a favorable balance between exploring diverse chemical space and exploiting similarity to existing training data.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00924-y
Citations: 0
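Reading aid for the human-in-the-loop entry above: the abstract names the Expected Predictive Information Gain (EPIG) criterion for choosing which molecules to send to an oracle. The sketch below is a generic Monte Carlo estimator of EPIG for a binary property, assuming the model's uncertainty is represented by K sampled predictors (for example ensemble members). EPIG is taken here as the mutual information between the candidate's label and the label at a target input, averaged over target inputs. This is not the authors' implementation, and the function names and toy probabilities are hypothetical.

```python
import math


def _bernoulli(p, outcome):
    """Probability of a binary outcome under P(y = 1) = p."""
    return p if outcome == 1 else 1.0 - p


def epig_binary(candidate_probs, target_probs):
    """Estimate EPIG for one candidate molecule.

    candidate_probs: K values of P(y = 1 | candidate), one per sampled model.
    target_probs: list of M target inputs, each a list of K values of
        P(y* = 1 | target) from the same K sampled models.
    """
    k = len(candidate_probs)
    total = 0.0
    for target in target_probs:
        info = 0.0
        for y in (0, 1):
            p_y = sum(_bernoulli(p, y) for p in candidate_probs) / k
            for y_star in (0, 1):
                p_star = sum(_bernoulli(q, y_star) for q in target) / k
                # Joint predictive: average over models of the product.
                joint = sum(
                    _bernoulli(p, y) * _bernoulli(q, y_star)
                    for p, q in zip(candidate_probs, target)
                ) / k
                if joint > 0.0:
                    info += joint * math.log(joint / (p_y * p_star))
        total += info
    return total / len(target_probs)


if __name__ == "__main__":
    # Three sampled models, two target molecules (made-up probabilities).
    targets = [[0.2, 0.6, 0.9], [0.1, 0.5, 0.8]]
    candidates = {"mol_a": [0.5, 0.5, 0.5], "mol_b": [0.1, 0.5, 0.9]}
    scores = {name: epig_binary(p, targets) for name, p in candidates.items()}
    print(max(scores, key=scores.get), scores)
```

A candidate on which all sampled models agree carries no information about the target predictions and scores zero, while a candidate whose disagreement across models mirrors the disagreement on the targets scores higher and would be queried first.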