MolFCL: predicting molecular properties through chemistry-guided contrastive and prompt learning
Authors: Xiang Tang, Qichang Zhao, Jianxin Wang, Guihua Duan
Journal: Bioinformatics (Oxford, England)
Publication date: 2025-02-08
DOI: 10.1093/bioinformatics/btaf061 (https://doi.org/10.1093/bioinformatics/btaf061)
Abstract
Motivation: Accurately predicting molecular properties is a crucial task in molecular machine learning, and it depends on extracting effective molecular representations. Contrastive learning opens new avenues for representation learning, and large amounts of unlabeled data enable models to generalize across the vast chemical space. However, existing contrastive learning-based models face two challenges: (1) they destroy the original molecular environment and ignore chemical prior information, and (2) they lack prior knowledge to guide the prediction of molecular properties.
Results: In this work, we propose MolFCL, a molecular property prediction framework that consists of fragment-based contrastive learning and functional group-based prompt learning. Specifically, we introduce fragment-fragment interactions into the contrastive learning framework for the first time and design a fragment-based augmented molecular graph that integrates the original chemical environment and fragment reactions. Furthermore, we propose a novel functional group-based prompt learning method for fine-tuning, which is the first to incorporate functional group knowledge and the corresponding atomic signals, to improve molecular representations and provide interpretable analyses. The results show that MolFCL outperforms state-of-the-art baseline models on 23 molecular property prediction datasets. Moreover, visualizations show that MolFCL learns molecular representations that distinguish chemical properties. When predicting molecular properties, MolFCL assigns higher weights to functional groups in a way that is consistent with chemical knowledge, which makes the model interpretable. Overall, MolFCL is a practically useful tool for molecular property prediction that can help drug scientists design drugs more effectively.
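To make the contrastive pre-training idea concrete, the sketch below computes an NT-Xent-style loss over paired embeddings, where each molecule's original graph and its augmented graph form a positive pair and all other embeddings in the batch act as negatives. The abstract does not state which loss MolFCL uses, so NT-Xent is an assumption chosen because it is a common contrastive objective; the function names, the temperature value, and the toy embeddings are all hypothetical, and the real encoder and fragment-based augmentation are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.1):
    """Mean NT-Xent loss over a batch of paired embeddings.

    view_a[i] and view_b[i] are embeddings of two views of molecule i
    (assumed here: the original graph and an augmented graph).
    This is an illustrative loss, not necessarily the one MolFCL uses.
    """
    embeddings = view_a + view_b
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        # The positive partner of index i sits n positions away.
        positive = (i + n) % (2 * n)
        pos_logit = cosine(embeddings[i], embeddings[positive]) / temperature
        # The denominator covers every other embedding, positive included.
        logits = [
            cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(2 * n)
            if j != i
        ]
        # Log-sum-exp with max subtraction for numerical stability.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        total += log_denominator - pos_logit
    return total / (2 * n)


# Toy 2-D embeddings standing in for encoder outputs (hypothetical values).
originals = [[1.0, 0.0], [0.0, 1.0]]
aligned_views = [[0.9, 0.1], [0.1, 0.9]]
swapped_views = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = nt_xent_loss(originals, aligned_views)
loss_swapped = nt_xent_loss(originals, swapped_views)
```

In this toy setup the loss is lower when each augmented view stays close to its own molecule (`aligned_views`) than when the views are swapped, which is the behaviour a contrastive objective rewards during pre-training.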
Availability and implementation: MolFCL is available at https://github.com/tangxiangcsu/MolFCL.
Supplementary information: Supplementary data are available at Bioinformatics online.