An Efficient Method Using Small-Sized Datasets Based upon Generative Adversarial Networks: Investigation of Cluster Configuration Space

IF 3.6 Q1 CHEMISTRY, MULTIDISCIPLINARY
Jintao Xie, Qing Lu, Wensheng Bian
Journal: Chemistry Methods: New Approaches to Solving Problems in Chemistry, 5(9). Published: 2025-05-30. DOI: 10.1002/cmtd.202500004
Full text: https://chemistry-europe.onlinelibrary.wiley.com/doi/10.1002/cmtd.202500004
Citations: 0

Abstract

Cluster research plays an important role in chemistry. Exploring the cluster configuration space usually requires a large-scale search. In some cases, however, the desired cluster configuration is counterintuitive, which makes it difficult to provide a reasonable initial guess from which to locate the corresponding minimum. Several routes exist for generating cluster configurations, but they may suffer from the dimension problem or require other prerequisites. In this work, generative adversarial networks (GANs) are employed to efficiently generate cluster configurations from small-sized datasets. A dynamic clamp function is introduced during training, and the dataset size is kept on the order of dozens to hundreds of samples. Furthermore, the convex hull volume and area are used as descriptors to assist in identifying unique structures. The proposed GAN architecture is found to be insensitive to the network structure and can be used effectively to generate novel cluster configurations. The dynamic clamp function significantly alleviates the mode collapse problem. Compared with previous studies, the dataset size is greatly reduced, which avoids large-scale training: datasets containing fewer than 200 samples already yield satisfactory results. The method performs better than the genetic algorithm and is expected to have a wide range of chemical application scenarios.
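
The abstract names the convex hull volume and area as aids for identifying unique structures but gives no implementation details. The following is a minimal sketch of one way such a filter could look, not the authors' code. It assumes each cluster is an (N, 3) array of atomic coordinates and uses `scipy.spatial.ConvexHull`; the function names and the `rel_tol` threshold are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.spatial import ConvexHull


def hull_descriptor(coords):
    """Return (volume, area) of the convex hull of an (N, 3) coordinate array.

    Both quantities are unchanged by translating or rotating the cluster
    and by reordering its atoms.
    """
    hull = ConvexHull(np.asarray(coords, dtype=float))
    return hull.volume, hull.area


def unique_structures(structures, rel_tol=1e-3):
    """Keep the first structure seen for each distinct (volume, area) pair.

    rel_tol is an illustrative threshold (an assumption, not a value
    taken from the paper).
    """
    kept, descriptors = [], []
    for coords in structures:
        vol, area = hull_descriptor(coords)
        is_new = all(
            not (np.isclose(vol, v, rtol=rel_tol) and np.isclose(area, a, rtol=rel_tol))
            for v, a in descriptors
        )
        if is_new:
            kept.append(coords)
            descriptors.append((vol, area))
    return kept
```

Two caveats apply to this sketch. `ConvexHull` needs at least four non-coplanar points in three dimensions, so planar or linear clusters would need a separate treatment. Atoms lying inside the hull do not affect either descriptor, so matching volume and area is best read as a signal that two structures may be duplicates rather than as proof.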

Source journal: CiteScore 7.30; self-citation rate 0.00%; article count 0