TC-DTA: Predicting Drug-Target Binding Affinity With Transformer and Convolutional Neural Networks

IF 4.4, Zone 4 (Biology), JCR Q1 BIOCHEMICAL RESEARCH METHODS

IEEE Transactions on NanoBioscience Pub Date : 2024-08-12 DOI:10.1109/TNB.2024.3441590

Xiwei Tang;Yiqiang Zhou;Mengyun Yang;Wenjun Li


Citations: 0

Abstract


TC-DTA: Predicting Drug-Target Binding Affinity With Transformer and Convolutional Neural Networks

Bioinformatics is a rapidly evolving field that applies computational methods to analyze and interpret biological data. A key task in bioinformatics is identifying novel drug-target interactions (DTIs), which plays a crucial role in drug discovery. Most computational approaches treat DTI prediction as a binary classification problem, determining whether drug-target pairs interact. However, with the growing availability of drug-target binding affinity data, this binary task can be reframed as a regression problem focused on drug-target affinity (DTA). DTA quantifies the strength of drug-target binding, offering more detailed insights than DTI and serving as a valuable tool for virtual screening in drug discovery. Accurately predicting compound interactions with targets can accelerate the drug development process. In this study, we introduce a deep learning model named TC-DTA for DTA prediction, leveraging convolutional neural networks (CNN) and the encoder module of the transformer architecture. We begin by extracting raw drug SMILES strings and protein amino acid sequences from the dataset, which are then represented using various encoding methods. Subsequently, we employ CNN and the transformer’s encoder module to extract features from the drug SMILES strings and protein sequences, respectively. Finally, the feature information is concatenated and input into a multi-layer perceptron to predict binding affinity scores. We evaluated our model on two benchmark DTA datasets, Davis and KIBA, comparing it with methods such as KronRLS, SimBoost, and DeepDTA. Our model, TC-DTA, outperformed these baseline methods based on evaluation metrics like Mean Squared Error (MSE), Concordance Index (CI), and Regression towards the Mean Index ($r_m^2$). These results highlight the effectiveness of the Transformer’s encoder and CNN in extracting meaningful representations from sequences, thereby enhancing DTA prediction accuracy. This deep learning model can accelerate drug discovery by identifying drug candidates with high binding affinity to specific targets. Compared to traditional methods, machine learning technology offers a more effective and efficient approach to drug discovery.
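Two of the evaluation metrics named in the abstract, MSE and CI, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the tie handling (pairs with equal measured affinity are skipped, tied predictions count as 0.5), and the sample values are assumptions chosen for illustration.

```python
from itertools import combinations


def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Assumed convention: pairs with equal true affinity are skipped,
    and tied predictions contribute 0.5.
    """
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # no defined ordering for this pair
        comparable += 1
        if p_i == p_j:
            concordant += 0.5
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


# Hypothetical affinity values, for illustration only (not from Davis or KIBA).
measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.3, 6.0, 7.5, 7.4]
print(mean_squared_error(measured, predicted))
print(concordance_index(measured, predicted))
```

A lower MSE means predictions are closer to the measured affinities, while a CI closer to 1 means the model ranks drug-target pairs in nearly the same order as the measurements. The pairwise loop is quadratic in the number of samples, so a vectorized or sorted implementation would be preferable on full benchmark datasets.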


Source journal

IEEE Transactions on NanoBioscience (Engineering & Technology: Nanoscience & Nanotechnology)

CiteScore: 7.00

Self-citation rate: 5.10%

Articles published: 197

Review time: >12 weeks

About the journal: The IEEE Transactions on NanoBioscience reports on original, innovative, and interdisciplinary work on all aspects of molecular systems, cellular systems, and tissues (including molecular electronics). Topics covered in the journal span a broad spectrum, covering both foundations and applications. Specifically, the journal covers methods and techniques, experimental aspects, design and implementation, instrumentation and laboratory equipment, clinical aspects, hardware and software for data acquisition and analysis, and computer-based modelling (based on traditional or high-performance computing, such as parallel computers or computer networks).