Quantifying the Hardness of Bioactivity Prediction Tasks for Transfer Learning

IF 5.3 | Zone 2, Chemistry | JCR Q1: CHEMISTRY, MEDICINAL
Hosein Fooladi, Steffen Hirte, and Johannes Kirchmair*
Journal of Chemical Information and Modeling, 2024, 64 (10), 4031–4046
Published: 2024-05-13
DOI: 10.1021/acs.jcim.4c00160
https://pubs.acs.org/doi/10.1021/acs.jcim.4c00160

Abstract

Today, machine learning methods are widely employed in drug discovery. However, the chronic lack of data continues to hamper their further development, validation, and application. Several modern strategies aim to mitigate the challenges associated with data scarcity by learning from data on related tasks. These knowledge-sharing approaches encompass transfer learning, multitask learning, and meta-learning. A key question remaining to be answered for these approaches is about the extent to which their performance can benefit from the relatedness of available source (training) tasks; in other words, how difficult (“hard”) a test task is to a model, given the available source tasks. This study introduces a new method for quantifying and predicting the hardness of a bioactivity prediction task based on its relation to the available training tasks. The approach involves the generation of protein and chemical representations and the calculation of distances between the bioactivity prediction task and the available training tasks. In the example of meta-learning on the FS-Mol data set, we demonstrate that the proposed task hardness metric is inversely correlated with performance (Pearson’s correlation coefficient r = −0.72). The metric will be useful in estimating the task-specific gain in performance that can be achieved through meta-learning.
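
The abstract describes the pipeline only at a high level: compute distances between a test task and the available training tasks, derive a hardness score, and check how that score correlates with performance. The following is a minimal sketch of that idea, not the authors' implementation. The equal weighting of chemical and protein distances (`alpha`), the aggregation over the `k` nearest training tasks, and all numbers in the toy data are assumptions made purely for illustration.

```python
import numpy as np


def combine_distances(chem_dist, prot_dist, alpha=0.5):
    """Weighted sum of chemical and protein task distances.

    The weight alpha is a hypothetical choice; the abstract does not
    state how the two kinds of distance are combined.
    """
    chem = np.asarray(chem_dist, dtype=float)
    prot = np.asarray(prot_dist, dtype=float)
    return alpha * chem + (1.0 - alpha) * prot


def task_hardness(test_to_train_dist, k=2):
    """Mean distance from each test task to its k nearest training tasks.

    Rows are test tasks, columns are training tasks. Using the k nearest
    tasks is an assumed aggregation, chosen here for illustration.
    """
    d = np.asarray(test_to_train_dist, dtype=float)
    nearest = np.sort(d, axis=1)[:, :k]
    return nearest.mean(axis=1)


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


# Toy data (invented): 3 test tasks x 3 training tasks.
chem = np.array([[0.1, 0.2, 0.9],
                 [0.4, 0.5, 0.6],
                 [0.8, 0.9, 0.7]])
prot = np.array([[0.3, 0.2, 0.7],
                 [0.6, 0.5, 0.4],
                 [0.8, 0.9, 0.9]])
# Invented per-task performance scores of a meta-learned model.
performance = np.array([0.30, 0.20, 0.10])

hardness = task_hardness(combine_distances(chem, prot), k=2)
r = pearson_r(hardness, performance)
print(hardness, r)
```

In this toy setup, tasks that lie farther from their nearest training tasks receive a higher hardness score, and a negative `r` indicates that harder tasks tend to be predicted less well, which is the direction of the relationship reported in the abstract.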

Source journal
CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months
About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.