Title: Enhancing molecular representation via fusion of multimodal transformers with integrated periodic local and global features
Authors: Jia Ao, Xiangsheng Huang, Wei Dai, Cancan Ji
Journal: Journal of Computer-Aided Molecular Design
Volume: 39 1
Publication date: 2025-09-13
Publication type: Journal Article
DOI: 10.1007/s10822-025-00658-5
URL: https://link.springer.com/article/10.1007/s10822-025-00658-5
Impact factor: 3.1 (JCR Q3, Biochemistry & Molecular Biology)
Open access: no
Citations: 0
Abstract
Because molecules are structurally complex, molecular representation learning requires large amounts of data. Labeled data, however, is typically limited, which makes self-supervised pretraining essential. Even so, current pretraining methods often fail to attend sufficiently to both local and global molecular information. In this study, we propose MoleTGL, a multi-modality self-supervised learning framework that captures local and global information simultaneously. Specifically, we encode SMILES sequences and molecular graphs separately and use a unified fusion approach to strengthen the interaction between the two modalities. In the molecular graph encoder, we capture global and local information independently and strengthen the attention paid to bond features through information fusion. We also introduce the FA-FFN module to aggregate periodic features of the molecule. Experimental results show that MoleTGL outperforms existing methods on seven classification tasks and six regression tasks in molecular property prediction. Ablation studies confirm the effectiveness of fusing local and global features and the advantage of the proposed methods for acquiring local and global information.
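To make the two input modalities concrete, the following minimal sketch derives both views of one molecule from a single SMILES string: a token sequence (the input a sequence encoder would see) and an atom/bond graph (the input a graph encoder would see). It is an illustration only and not the authors' implementation. The names `TOKEN_RE`, `tokenize`, and `smiles_to_graph` are hypothetical, and the parser assumes a small SMILES subset (unbracketed, non-aromatic atoms, single-digit ring closures, no charges or stereochemistry).

```python
import re

# Assumed token pattern for a small SMILES subset (hypothetical, not from the paper).
TOKEN_RE = re.compile(r"Br|Cl|[BCNOPSFI]|[-=#]|[()]|\d")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def tokenize(smiles):
    """Sequence view: split a SMILES string into tokens."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


def smiles_to_graph(smiles):
    """Graph view: return (atoms, bonds) with bonds as (i, j, order) tuples."""
    atoms, bonds = [], []
    prev = None        # index of the atom the next atom attaches to
    branch_stack = []  # attachment points saved at each "("
    ring_open = {}     # ring digit -> (atom index, bond order at opening)
    pending = 1        # order of the next bond; single unless a symbol says otherwise
    for tok in tokenize(smiles):
        if tok in BOND_ORDER:
            pending = BOND_ORDER[tok]
        elif tok == "(":
            branch_stack.append(prev)
        elif tok == ")":
            prev = branch_stack.pop()
        elif tok.isdigit():
            if tok in ring_open:
                # Second occurrence of the digit closes the ring.
                start, open_order = ring_open.pop(tok)
                bonds.append((start, prev, max(open_order, pending)))
            else:
                # First occurrence opens the ring at the current atom.
                ring_open[tok] = (prev, pending)
            pending = 1
        else:
            atoms.append(tok)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, pending))
            prev = idx
            pending = 1
    return atoms, bonds


if __name__ == "__main__":
    smiles = "CC(=O)O"  # acetic acid
    print(tokenize(smiles))
    print(smiles_to_graph(smiles))
```

The bond list carries the bond order explicitly, which is the kind of bond feature a graph encoder can attend to. The token sequence holds the same information only implicitly, through symbols such as `=` and the branch parentheses.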
About the journal:
The Journal of Computer-Aided Molecular Design provides a forum for disseminating information on both the theory and the application of computer-based methods in the analysis and design of molecules. The scope of the journal encompasses papers that report new and original research and applications in the following areas:
- theoretical chemistry;
- computational chemistry;
- computer and molecular graphics;
- molecular modeling;
- protein engineering;
- drug design;
- expert systems;
- general structure-property relationships;
- molecular dynamics;
- chemical database development and usage.