TopoFormer: Multiscale Topology-enabled Structure-to-Sequence Transformer for Protein-Ligand Interaction Predictions

Research Square, Pub Date: 2024-02-09, DOI: 10.21203/rs.3.rs-3640878/v1

Guo-Wei Wei, Dong Chen, Jian Liu


Abstract

Pre-trained deep Transformers have had tremendous success in a wide variety of disciplines. In computational biology, however, essentially all Transformers are built on biological sequences, which ignores vital stereochemical information and may lead to crucial errors in downstream predictions. On the other hand, three-dimensional (3D) molecular structures are incompatible with the sequential architecture of Transformers and of natural language processing (NLP) models in general. This work addresses this foundational challenge with a topological Transformer (TopoFormer). TopoFormer is built by integrating NLP with a multiscale topology technique, the persistent topological hyperdigraph Laplacian (PTHL), which systematically converts intricate 3D protein-ligand complexes at various spatial scales into an NLP-admissible sequence of topological invariants and homotopic shapes. Element-specific PTHLs are further developed to embed crucial physical, chemical, and biological interactions into the topological sequences. TopoFormer outperforms conventional algorithms and recent deep learning variants, achieving exemplary scoring accuracy and superior performance in ranking, docking, and screening tasks on a number of benchmark datasets. The proposed topological sequences can be extracted from all kinds of structural data in data science to facilitate various NLP models, heralding a new era in AI-driven discovery.
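To make the structure-to-sequence idea concrete, here is a minimal sketch of how a 3D point cloud can be turned into an ordered sequence of topological features, one per spatial scale. It is not the PTHL described in the abstract: as a loudly stated simplification, it swaps in an ordinary graph Laplacian on a distance-threshold graph. The function names (`laplacian_at_scale`, `topological_sequence`), the choice of two features per scale, and the tolerance are all assumptions made for illustration.

```python
import numpy as np


def laplacian_at_scale(points, radius):
    """Graph Laplacian L = D - A of the graph joining points closer than `radius`.

    Simplifying assumption: a plain undirected graph stands in for the
    hyperdigraph structure used by the actual PTHL.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    adjacency = (dists < radius).astype(float)
    np.fill_diagonal(adjacency, 0.0)  # no self-loops
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def topological_sequence(points, radii, tol=1e-8):
    """Return one feature vector per scale, ordered by increasing radius.

    Features (hypothetical choice):
      - number of zero eigenvalues, i.e. the count of connected components
        (a topological invariant, Betti-0)
      - smallest nonzero eigenvalue, a spectral descriptor of shape
    """
    sequence = []
    for radius in radii:
        eigenvalues = np.linalg.eigvalsh(laplacian_at_scale(points, radius))
        nonzero = eigenvalues[eigenvalues > tol]
        betti0 = int((eigenvalues <= tol).sum())
        smallest_nonzero = float(nonzero.min()) if nonzero.size else 0.0
        sequence.append([betti0, smallest_nonzero])
    return np.array(sequence)


# Toy "complex": three nearby atoms and one distant atom (made-up coordinates).
toy_points = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [5.0, 5.0, 5.0],
])
toy_sequence = topological_sequence(toy_points, radii=[0.5, 1.2, 1.5, 10.0])
```

Each row of `toy_sequence` plays the role of one token: the rows are ordered by scale, so a sequence model can consume them much as it would consume word embeddings. An element-specific variant would, under the same assumptions, repeat this computation on subsets of atoms chosen by element type and concatenate the resulting features per scale.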
