Molecular representation learning via multimodal fusion and decoupling
Authors: Xuan Zang, Junjie Zhang, Buzhou Tang
Journal: Information Fusion, volume 125, Article 103493
Publication date: 2025-07-14
Publication type: Journal Article
DOI: 10.1016/j.inffus.2025.103493
URL: https://www.sciencedirect.com/science/article/pii/S1566253525005664
Impact factor: 15.5
JCR: Q1 (Computer Science, Artificial Intelligence)
Open access: No
Citations: 0
Abstract
Recent years have seen growing attention to self-supervised learning in drug molecule research and discovery, and a series of methods has emerged that leverage both 2D and 3D structures for molecular representation learning. However, these methods focus only on the modal consistency between 2D and 3D molecular structures, relying on molecule-level or atom-level alignment, while ignoring modal complementarity. In this paper, we propose MolMFD, a multimodal fusion-then-decoupling method for self-supervised molecular representation learning. First, we use a unified encoder to fuse 2D and 3D molecular structural information by incorporating atomic relative distances from both the topological and the geometric view. Then, we design a learnable noise injection strategy to decouple modality-specific representations, which are subsequently fed into separate decoders that predict the structural information of the corresponding modality. Notably, we minimize mutual information to extract the 2D and 3D modality-specific characteristics, so that modality complementarity enriches the fused molecular representations. We provide a theoretical analysis of the optimization issues and the overlooked complementarity problems in existing 2D and 3D multimodal molecular pre-training methods. Extensive molecular property prediction experiments validate the effectiveness and superiority of the proposed MolMFD.
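To make the phrase "atomic relative distances from both topological and geometric views" concrete, here is a minimal sketch of the two kinds of pairwise distance for a toy molecule. It is an illustration only, not the MolMFD implementation: the abstract does not specify how distances are computed or encoded, so the function names, the use of shortest-path bond counts for the 2D view, and the made-up coordinates are all assumptions.

```python
import math
from itertools import product


def topological_distances(bonds, n_atoms):
    """2D view (assumed): shortest-path length in bonds between every atom pair."""
    dist = [[0.0 if i == j else math.inf for j in range(n_atoms)]
            for i in range(n_atoms)]
    for i, j in bonds:
        dist[i][j] = dist[j][i] = 1.0
    # Floyd-Warshall: product() varies k slowest, so k is the outer loop.
    for k, i, j in product(range(n_atoms), repeat=3):
        dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    return dist


def geometric_distances(coords):
    """3D view (assumed): Euclidean distance between every pair of atom positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]


# Hypothetical toy input: a three-atom chain 0-1-2 with invented coordinates.
bonds = [(0, 1), (1, 2)]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

topo = topological_distances(bonds, n_atoms=3)
geom = geometric_distances(coords)

print(topo[0][2])  # two bonds apart along the chain
print(geom[0][2])  # straight-line distance between atoms 0 and 2
```

In this toy case atoms 0 and 2 are two bonds apart in the graph but closer in space than the bond path suggests, which is the kind of information one view carries and the other does not.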
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.