Auxiliary Discriminator Sequence Generative Adversarial Networks (ADSeqGAN) for Few Sample Molecule Generation.

ArXiv Pub Date : 2025-09-11
Haocheng Tang, Jing Long, Beihong Ji, Junmei Wang
Journal: ArXiv
PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12440062/pdf/
Citations: 0

Abstract

In this work, we introduce Auxiliary Discriminator Sequence Generative Adversarial Networks (ADSeqGAN), a novel approach for molecular generation in small-sample datasets. Traditional generative models often struggle with limited training data, particularly in drug discovery, where molecular datasets for specific therapeutic targets, such as nucleic acid binders and central nervous system (CNS) drugs, are scarce. ADSeqGAN addresses this challenge by integrating an auxiliary random forest classifier into the GAN framework as an additional discriminator, which significantly improves molecular generation quality and class specificity. Our method incorporates a pretrained generator and the Wasserstein distance to enhance training stability and diversity. We evaluate ADSeqGAN across three representative cases. First, on nucleic acid- and protein-targeting molecules, ADSeqGAN shows superior capability in generating nucleic acid binders compared to baseline models. Second, through oversampling, it markedly improves CNS drug generation, achieving higher yields than traditional de novo models. Third, in cannabinoid receptor type 1 (CB1) ligand design, ADSeqGAN generates novel druglike molecules, of which 32.8% are predicted to be active, surpassing the hit rates of CB1-focused and general-purpose libraries when assessed by a target-specific LRIP-SF scoring function. Overall, ADSeqGAN offers a versatile framework for molecular design in data-scarce scenarios, with demonstrated applications in nucleic acid binders, CNS drugs, and CB1 ligands.
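
The abstract names two ingredients without giving their formulas: oversampling of the scarce class, and an auxiliary classifier acting as a second discriminator. The Python sketch below illustrates one way such pieces could fit together. It is not the authors' implementation. The function names `oversample` and `combined_reward`, the random-duplication strategy, and the linear blending weight are all assumptions made for illustration, and the classifier probability is passed in as a plain number rather than computed by a real random forest.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def oversample(
    samples: Sequence[str], labels: Sequence[Hashable], seed: int = 0
) -> List[Tuple[str, Hashable]]:
    """Balance classes by randomly duplicating minority-class samples.

    Assumption: simple random duplication; the abstract does not say
    which oversampling scheme was used.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    # Every class is padded up to the size of the largest class.
    target = max(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((s, label) for s in items + extra)
    rng.shuffle(balanced)
    return balanced


def combined_reward(d_score: float, class_prob: float, weight: float = 0.5) -> float:
    """Blend a discriminator realism score with an auxiliary classifier
    probability for the desired class.

    Assumption: a linear mix controlled by a hypothetical `weight`.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * d_score + weight * class_prob


if __name__ == "__main__":
    # Toy SMILES strings with made-up labels: 0 = majority, 1 = scarce class.
    data = oversample(["CCO", "CCN", "CCC", "c1ccccc1"], [0, 0, 0, 1])
    print(Counter(label for _, label in data))
    print(combined_reward(d_score=0.8, class_prob=0.4, weight=0.5))
```

In a sequence GAN, a blended score of this kind would typically serve as the reward signal for generated sequences, so that the generator is pushed toward molecules that are both realistic and assigned to the target class.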
