PREVAIL: Pre-trained Variational Adversarial Active Learning for Molecular Property Prediction

2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS) Pub Date : 2022-11-26 DOI:10.1109/CCIS57298.2022.10016422

Linjie Li, Yi Xiao, Dewei Ma, Kai Zheng



Abstract

Molecular property prediction is a fundamental task in drug discovery. Most current high-performing molecular property prediction methods are built on deep learning techniques, which rely on large amounts of labeled data. However, accurate molecular property annotation is time-consuming and expensive. Because different samples usually contribute unequally to model training, we propose pre-trained variational adversarial active learning, PREVAIL for short, which queries the most informative samples for annotation in order to reduce the annotation cost. Specifically, unlike previous active learning methods, whose initial set is sampled randomly, PREVAIL selects the most informative initial dataset using an autoencoder and the K-Center greedy algorithm, which can avoid biases that affect the accuracy of the early decision-making process. Furthermore, PREVAIL simultaneously adapts to the distribution of molecules and the information of the prediction task by incorporating the loss information of the molecular property prediction task into the latent space using task-aware variational adversarial active learning. Our benchmark experiments demonstrate that PREVAIL outperforms state-of-the-art active learning methods on molecular property prediction tasks.
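
The K-Center greedy selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance metric, how the first center is chosen, or the autoencoder architecture, so everything below is an assumption made for illustration: the function name `k_center_greedy`, the `first_index` parameter, Euclidean distance, and the toy 2-D vectors standing in for autoencoder latent codes are all hypothetical and are not taken from the paper.

```python
import math


def k_center_greedy(embeddings, budget, first_index=0):
    """Pick `budget` indices so that the chosen points cover the set well.

    Standard greedy k-center heuristic: repeatedly add the point that is
    farthest from its nearest already-selected center.
    Assumption: Euclidean distance on latent vectors (not stated in the paper).
    """
    n = len(embeddings)
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the number of points")
    if not 0 <= first_index < n:
        raise ValueError("first_index is out of range")

    selected = [first_index]
    # Distance from every point to its nearest selected center so far.
    min_dist = [math.dist(p, embeddings[first_index]) for p in embeddings]

    while len(selected) < budget:
        # The farthest point from the current centers becomes the next center.
        next_index = max(range(n), key=min_dist.__getitem__)
        selected.append(next_index)
        center = embeddings[next_index]
        min_dist = [
            min(d, math.dist(p, center)) for d, p in zip(min_dist, embeddings)
        ]
    return selected


# Toy stand-in for latent codes: two tight pairs and one isolated point.
latent = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0), (5.0, 5.0)]
initial_set = k_center_greedy(latent, budget=3)
print(initial_set)  # [0, 3, 4]
```

With a budget of three, the sketch picks one point from each tight pair plus the isolated point, which is the kind of spread-out initial set that random sampling does not guarantee.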

