Improvement of multi-task learning by data enrichment: application for drug discovery

IF 3 · CAS Zone 3 (Biology) · Q3 Biochemistry & Molecular Biology

Journal of Computer-Aided Molecular Design Pub Date : 2023-03-21 DOI:10.1007/s10822-023-00500-w

Ekaterina A. Sosnina, Sergey Sosnin, Maxim V. Fedorov

Volume 37, issue 4, pages 183-200. Full text: https://link.springer.com/article/10.1007/s10822-023-00500-w

Citations: 4

Abstract

Multi-task learning in deep neural networks has become a topic of growing importance in many research fields, including drug discovery. However, applying multi-task learning poses new challenges in improving prediction performance. This study investigated the potential of training data enrichment to enhance multi-task model prediction quality in drug discovery. The study evaluated four scenarios with varying degrees of information capacity of the training data and applied two types of test data to evaluate prediction performance. We used three datasets: ViralChEMBL, which consists of binary activities of compounds against viral species, was used for the classification task; pQSAR(159) and pQSAR(4267), which consist of bio-activities of compounds and assays from the research of the profile-QSAR method, were used for regression tasks. We built multi-task models based on feed-forward DNNs using the PyTorch framework. Our findings showed that training data enrichment can be an effective means of enhancing prediction performance in multi-task learning, but the degree of improvement depends on the quality of the training data. The more unique compounds and targets the training data included, the more new compound-target interactions were required for prediction improvement. We also found that, even with multi-task learning, one could not predict the interactions of compounds that are highly dissimilar from those used for model training. The study provides some recommendations for effectively employing multi-task learning in drug discovery to improve prediction accuracy and facilitate the discovery of novel drug candidates.
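The abstract does not say how the models handle compound-target pairs that were never measured, so the following is only an illustrative sketch of one common convention: the loss is averaged over measured entries, and enrichment fills in some of the missing ones. It is written in plain Python rather than PyTorch to stay self-contained. The names `masked_mse` and `label_density`, and the toy activity values, are hypothetical and are not taken from the paper.

```python
def masked_mse(predictions, targets):
    """Mean squared error over measured compound-target entries only.

    Rows are compounds, columns are targets (tasks). A target value of
    None marks an unmeasured interaction and is skipped (assumed convention).
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(predictions, targets):
        for pred, target in zip(pred_row, target_row):
            if target is None:
                continue  # unmeasured pair contributes nothing to the loss
            total += (pred - target) ** 2
            count += 1
    if count == 0:
        raise ValueError("no measured entries to score")
    return total / count


def label_density(targets):
    """Fraction of the compound-target matrix that has a measured value."""
    known = sum(t is not None for row in targets for t in row)
    size = sum(len(row) for row in targets)
    return known / size


# Toy activity matrix: 2 compounds x 3 targets, half of the entries measured.
targets = [[6.0, None, None],
           [None, 7.0, 5.0]]
predictions = [[5.0, 9.9, 9.9],
               [0.0, 7.0, 6.0]]

loss = masked_mse(predictions, targets)

# "Enrichment" in this sketch: one previously unmeasured pair gains a value.
enriched = [row[:] for row in targets]
enriched[0][1] = 6.5
```

In this toy setup the predictions for unmeasured pairs do not affect the loss, and enrichment raises the share of measured entries from 3 of 6 to 4 of 6. Whether that helps a real model depends, as the abstract notes, on the training data already available.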


Source journal

Journal of Computer-Aided Molecular Design (Biology; Computer Science, Interdisciplinary Applications)

CiteScore: 8.00

Self-citation rate: 8.60%

Review time: 3 months

Journal description: The Journal of Computer-Aided Molecular Design provides a forum for disseminating information on both the theory and the application of computer-based methods in the analysis and design of molecules. The scope of the journal encompasses papers which report new and original research and applications in the following areas: theoretical chemistry; computational chemistry; computer and molecular graphics; molecular modeling; protein engineering; drug design; expert systems; general structure-property relationships; molecular dynamics; chemical database development and usage.