An Efficient Method Using Small-Sized Datasets Based upon Generative Adversarial Networks: Investigation of Cluster Configuration Space

Jintao Xie, Qing Lu, Wensheng Bian
Chemistry Methods: New Approaches to Solving Problems in Chemistry, vol. 5, no. 9, published 2025-05-30
DOI: 10.1002/cmtd.202500004
URL: https://chemistry-europe.onlinelibrary.wiley.com/doi/10.1002/cmtd.202500004
Cluster research plays an important role in chemistry. Exploring the cluster configuration space usually requires a large-scale search. Yet in some cases the desired cluster configuration may be counterintuitive, which makes it difficult to supply a reasonable initial guess from which to search for the corresponding minimum. Several routes exist for generating cluster configurations; however, they may scale poorly with dimensionality or require other prerequisites. In this work, generative adversarial networks (GANs) are employed to efficiently generate cluster configurations from small datasets, and a dynamic clamp function is introduced during training. Dataset sizes are kept on the order of tens to hundreds of samples. In addition, the convex hull volume and surface area are used as descriptors to help identify unique structures. The proposed GAN architecture is found to be insensitive to the choice of network structure and can effectively generate novel cluster configurations. The dynamic clamp function significantly alleviates the mode collapse problem. Compared with previous studies, the required dataset size is greatly reduced, avoiding large-scale training: datasets containing fewer than 200 samples already yield satisfactory results. The method outperforms the genetic algorithm and is expected to be applicable to a wide range of chemical problems.
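The abstract uses the convex hull volume and surface area to help identify unique structures. A minimal sketch of that idea, assuming Cartesian coordinates stored as an (N, 3) array, is shown below using SciPy's `ConvexHull`. Both descriptors are invariant to rotation, translation, and atom ordering, but they ignore interior atoms and can coincide for genuinely different clusters, so a match is only a hint of a duplicate. The helper names `hull_descriptors` and `unique_structures` and the tolerance `rtol=1e-3` are hypothetical and are not taken from the paper.

```python
import itertools

import numpy as np
from scipy.spatial import ConvexHull


def hull_descriptors(coords):
    """Return (volume, surface area) of the 3D convex hull of an (N, 3) coordinate array."""
    hull = ConvexHull(np.asarray(coords, dtype=float))
    return hull.volume, hull.area


def unique_structures(structures, rtol=1e-3):
    """Keep each structure whose (volume, area) pair differs from all kept ones by more than rtol.

    Matching descriptors are treated only as a hint of a duplicate: distinct
    clusters can share the same hull volume and area.
    """
    kept, kept_desc = [], []
    for coords in structures:
        desc = np.array(hull_descriptors(coords))
        if not any(np.allclose(desc, ref, rtol=rtol, atol=0.0) for ref in kept_desc):
            kept.append(coords)
            kept_desc.append(desc)
    return kept


# Usage: a unit cube, the same cube rotated and translated, and a cube scaled by 2.
cube = np.array(list(itertools.product([0.0, 1.0], repeat=3)))
theta = np.pi / 6
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = cube @ rot_z.T + np.array([3.0, -1.0, 2.0])
scaled = 2.0 * cube

unique = unique_structures([cube, moved, scaled])
print(len(unique))  # 2: the rotated copy is flagged as a duplicate of the cube
```

In practice such a filter would typically be followed by a stricter comparison (for example, aligned RMSD or energies) for any pair it flags.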
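The abstract does not say what the dynamic clamp function acts on or how it changes during training. One possible reading, assumed here purely for illustration, is WGAN-style weight clipping of the critic (discriminator), in which the clipping bound c varies with the training step instead of staying fixed. The sketch below uses plain NumPy arrays as stand-in critic weights; the linear schedule, the names `dynamic_clamp_bound` and `clamp_weights`, and the values `c_start=0.1` and `c_end=0.01` are all hypothetical and are not the authors' method.

```python
import numpy as np


def dynamic_clamp_bound(step, total_steps, c_start=0.1, c_end=0.01):
    """Hypothetical schedule: move the clamp bound linearly from c_start to c_end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return c_start + (c_end - c_start) * frac


def clamp_weights(weights, c):
    """Clip every critic weight array to [-c, c] in place (WGAN-style weight clipping)."""
    for w in weights:
        np.clip(w, -c, c, out=w)
    return weights


# Usage: after each (omitted) critic update, clip with the bound for that step.
rng = np.random.default_rng(0)
critic_weights = [rng.normal(size=(3, 16)), rng.normal(size=(16, 1))]
total_steps = 100
for step in range(total_steps + 1):
    # ... a critic gradient update would go here ...
    c = dynamic_clamp_bound(step, total_steps)
    clamp_weights(critic_weights, c)
```

Whether the bound should shrink, grow, or adapt to a training signal is a design choice the abstract leaves open; the linear decay here is only one option.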