A Topology-Enhanced Multi-Viewed Contrastive Approach for Molecular Graph Representation Learning and Classification.

IF 3.1 | Zone 4 (Medicine) | Q3 | CHEMISTRY, MEDICINAL

Molecular Informatics Pub Date : 2025-01-01 DOI:10.1002/minf.202400252

Phu Pham

Molecular Informatics, 44(1), e202400252. Publication type: Journal Article.

Citations: 0

Abstract

Graph representation learning has become an active research topic. Graph embeddings have applications across fields such as information and social network analysis, bioinformatics and cheminformatics, natural language processing (NLP), and recommendation systems. Among the deep learning (DL) architectures used for graph representation learning, graph neural networks (GNNs) have emerged as the dominant framework. Recent GNN-based methods have demonstrated state-of-the-art performance on complex supervised and unsupervised tasks at both the node and graph levels. To enhance multi-view and structured graph representations, contrastive learning techniques have more recently been developed, giving rise to graph contrastive learning (GCL) models. These GCL approaches use unsupervised contrastive methods to capture multi-view graph representations by comparing node and graph embeddings, yielding significant improvements in both graph-level representations and task-specific applications such as molecular embedding and classification. However, because most GCL techniques focus on the explicit graph structure through GNN-based encoders, they often overlook topological insights that topological data analysis (TDA) can provide. Given research indicating that topological features can benefit various graph learning tasks, we propose a novel topology-enhanced, multi-view graph contrastive learning model called TMGCL. TMGCL is designed to capture and use both multi-scale topological information and global structural information from graphs. This enhanced representation capability allows TMGCL to directly support applications such as molecular classification with improved accuracy and robustness. Extensive experiments on two real-world datasets demonstrate the effectiveness of TMGCL, which outperforms state-of-the-art GNN/GCL-based baselines.
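The abstract does not give implementation details, so the following is only a minimal sketch of the two ingredients it names, not the TMGCL method itself. It assumes NumPy, uses Betti numbers of the molecular graph as a simple stand-in for the richer multi-scale features a TDA pipeline would produce, and uses the standard InfoNCE objective as a generic example of a contrastive loss between two views. The function names `betti_numbers` and `info_nce` and the `temperature` parameter are hypothetical choices made for illustration.

```python
import numpy as np


def betti_numbers(n_nodes, edges):
    """Return (b0, b1) of an undirected graph.

    b0 counts connected components and b1 counts independent cycles.
    Assumes each edge is listed once as a (u, v) pair of node indices.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n_nodes
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1

    b0 = components
    b1 = len(edges) - n_nodes + components  # from the Euler characteristic
    return b0, b1


def info_nce(z1, z2, temperature=0.5):
    """InfoNCE loss between two views of the same batch of graphs.

    Row i of z1 and row i of z2 form the positive pair; every other
    row of z2 acts as a negative for row i of z1.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# A six-membered ring (the carbon skeleton of benzene): one component, one cycle.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
ring_betti = betti_numbers(6, ring)

# Aligned views should score a lower loss than mismatched views.
views = np.eye(4)
aligned_loss = info_nce(views, views)
mismatched_loss = info_nce(views, views[::-1])
```

In a multi-view setup of the kind the abstract describes, one view might come from a GNN encoder and another from topological descriptors; how TMGCL actually builds and combines its views is not stated here.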


Source journal

Molecular Informatics (CHEMISTRY, MEDICINAL; MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 7.30

Self-citation rate: 2.80%

Review time: 3 months

Journal description: Molecular Informatics is a peer-reviewed, international forum for publication of high-quality, interdisciplinary research on all molecular aspects of bio/cheminformatics and computer-assisted molecular design. Molecular Informatics succeeded QSAR & Combinatorial Science in 2010. Molecular Informatics presents methodological innovations that will lead to a deeper understanding of ligand-receptor interactions, macromolecular complexes, molecular networks, design concepts and processes that demonstrate how ideas and design concepts lead to molecules with a desired structure or function, preferably including experimental validation. The journal's scope includes but is not limited to the fields of drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling of macromolecular assemblies, molecular networks and systems, pharmaco- and chemogenomics, computer-assisted screening strategies, as well as novel technologies for the de novo design of biologically active molecules. As a unique feature, Molecular Informatics publishes so-called "Methods Corner" review-type articles which feature important technological concepts and advances within the scope of the journal.