{"title":"小数据集虚拟样本生成的自适应哈密顿电路","authors":"Totok Sutojo , Supriadi Rustad , Muhamad Akrom , Wahyu Aji Eko Prabowo , De Rosal Ignatius Moses Setiadi , Hermawan Kresno Dipojono , Yoshitada Morikawa","doi":"10.1016/j.jocs.2025.102711","DOIUrl":null,"url":null,"abstract":"<div><div>Small datasets often lead to poor performance of data-driven prediction models due to uneven data distribution and large data spacing. One popular approach to address this issue is to use virtual samples during machine learning (ML) model training. This study proposes a Hamiltonian Circuit Virtual Sample Generation (HCVSG) method to distribute virtual samples generated using interpolation techniques while integrating the K-Nearest Neighbors (KNN) algorithm in model development. The Hamiltonian circuit is chosen because it doesn’t depend on the distribution assumption and provides multiple circuits that allow adaptive sample distribution, allowing the selection of circuits that produce minimum errors. This method supports improving feature-target correlation, reducing the risk of overfitting, and stabilizing error values as model complexity increases. Applying this method to three datasets in material research (MLCC, PSH, and EFD) shows that HCVSG significantly improves prediction accuracy compared to conventional KNN and eight MTD-based methods. The distribution of virtual samples along the Hamiltonian circuit helps fill the information gap and makes the data distribution more even, ultimately improving the predictive model's performance.</div></div>","PeriodicalId":48907,"journal":{"name":"Journal of Computational Science","volume":"92 ","pages":"Article 102711"},"PeriodicalIF":3.7000,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An adaptive Hamiltonian circuit of virtual sample generation for a small dataset\",\"authors\":\"Totok Sutojo , Supriadi Rustad , Muhamad Akrom , Wahyu Aji Eko Prabowo , De Rosal Ignatius Moses Setiadi , Hermawan Kresno Dipojono , Yoshitada Morikawa\",\"doi\":\"10.1016/j.jocs.2025.102711\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Small datasets often lead to poor performance of data-driven prediction models due to uneven data distribution and large data spacing. One popular approach to address this issue is to use virtual samples during machine learning (ML) model training. This study proposes a Hamiltonian Circuit Virtual Sample Generation (HCVSG) method to distribute virtual samples generated using interpolation techniques while integrating the K-Nearest Neighbors (KNN) algorithm in model development. The Hamiltonian circuit is chosen because it doesn’t depend on the distribution assumption and provides multiple circuits that allow adaptive sample distribution, allowing the selection of circuits that produce minimum errors. This method supports improving feature-target correlation, reducing the risk of overfitting, and stabilizing error values as model complexity increases. Applying this method to three datasets in material research (MLCC, PSH, and EFD) shows that HCVSG significantly improves prediction accuracy compared to conventional KNN and eight MTD-based methods. 
The distribution of virtual samples along the Hamiltonian circuit helps fill the information gap and makes the data distribution more even, ultimately improving the predictive model's performance.</div></div>\",\"PeriodicalId\":48907,\"journal\":{\"name\":\"Journal of Computational Science\",\"volume\":\"92 \",\"pages\":\"Article 102711\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-08-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Computational Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1877750325001887\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computational Science","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1877750325001887","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
An adaptive Hamiltonian circuit of virtual sample generation for a small dataset
Small datasets often lead to poor performance of data-driven prediction models because of uneven data distribution and large gaps between data points. One popular approach to address this issue is to use virtual samples during machine learning (ML) model training. This study proposes a Hamiltonian Circuit Virtual Sample Generation (HCVSG) method that distributes virtual samples generated with interpolation techniques and integrates the K-Nearest Neighbors (KNN) algorithm in model development. The Hamiltonian circuit is chosen because it does not depend on any assumption about the data distribution and offers multiple candidate circuits, so virtual samples can be distributed adaptively by selecting the circuit that yields the minimum error. The method helps improve feature-target correlation, reduces the risk of overfitting, and stabilizes error values as model complexity increases. Applying it to three materials-research datasets (MLCC, PSH, and EFD) shows that HCVSG significantly improves prediction accuracy compared with conventional KNN and eight MTD-based methods. Distributing virtual samples along the Hamiltonian circuit helps fill the information gap and makes the data distribution more even, ultimately improving the predictive model's performance.
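To make the idea concrete, the sketch below is an assumption-heavy illustration, not the authors' published HCVSG implementation: it approximates a circuit over a small training set with a greedy nearest-neighbour ordering, creates virtual samples by linear interpolation between consecutive samples on that circuit, and trains a KNN regressor on the augmented data. All function names, the interpolation rule, and the toy data are assumptions introduced here for illustration only.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def greedy_hamiltonian_circuit(X):
    # Visit every sample exactly once via a greedy nearest-neighbour heuristic
    # (a crude stand-in for choosing an actual Hamiltonian circuit).
    remaining = set(range(1, len(X)))
    order = [0]
    while remaining:
        last = X[order[-1]]
        nxt = min(remaining, key=lambda i: np.linalg.norm(X[i] - last))
        order.append(nxt)
        remaining.remove(nxt)
    return order  # the path closes back to order[0] to form a circuit

def interpolate_virtual_samples(X, y, order, alpha=0.5):
    # Place one virtual sample between each pair of consecutive circuit nodes.
    Xv, yv = [], []
    for a, b in zip(order, order[1:] + order[:1]):  # wrap around to close the circuit
        Xv.append((1 - alpha) * X[a] + alpha * X[b])
        yv.append((1 - alpha) * y[a] + alpha * y[b])
    return np.vstack(Xv), np.asarray(yv)

# Toy usage: synthetic data standing in for a small materials dataset.
rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 3))
y = X @ np.array([1.5, -2.0, 0.7]) + rng.normal(scale=0.05, size=20)

order = greedy_hamiltonian_circuit(X)
Xv, yv = interpolate_virtual_samples(X, y, order)
X_aug, y_aug = np.vstack([X, Xv]), np.concatenate([y, yv])

model = KNeighborsRegressor(n_neighbors=3).fit(X_aug, y_aug)
print("Sample predictions:", model.predict(X[:2]))

In the method described by the abstract, the circuit itself would additionally be chosen adaptively among multiple candidates so as to minimize prediction error; the greedy ordering above is only a placeholder for that selection step.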
Journal introduction:
Computational Science is a rapidly growing multi- and interdisciplinary field that uses advanced computing and data analysis to understand and solve complex problems. It has reached a level of predictive capability that now firmly complements the traditional pillars of experimentation and theory.
Recent advances in experimental techniques such as detectors, on-line sensor networks and high-resolution imaging have opened up new windows into physical and biological processes at many levels of detail. The resulting data explosion allows for detailed data-driven modeling and simulation.
This new discipline in science combines computational thinking, modern computational methods, devices and collateral technologies to address problems far beyond the scope of traditional numerical methods.
Computational science typically unifies three distinct elements:
• Modeling, Algorithms and Simulations (e.g. numerical and non-numerical, discrete and continuous);
• Software developed to solve problems in science (e.g., the biological, physical, and social sciences), engineering, medicine, and the humanities;
• Computer and information science that develops and optimizes the advanced system hardware, software, networking, and data management components (e.g. problem solving environments).