On the efficacy of conditioned and progressive Latin hypercube sampling in supervised machine learning

Ioannis Iordanis, Christos Koukouvinos, Iliana Silou

Applied Numerical Mathematics, Volume 208, Pages 256-270 (February 2025)
DOI: 10.1016/j.apnum.2023.12.016
Citations: 0
Abstract
In this paper, the Latin Hypercube Sampling (LHS) method is assessed for its effectiveness in supervised machine learning procedures. LHS saves processing time and, owing to the Latin hypercube design properties and their space-filling ability, is considered one of the most advanced sampling mechanisms. Although more data usually deliver better results, LHS techniques can produce outputs of the same quality with less data, reducing both storage cost and training time. Conditioned Latin Hypercube Sampling (cLHS) is one such technique and performs well in supervised machine learning tasks. Unfortunately, the minimum sufficient training dataset size cannot be known in advance. In that case, progressive sampling is recommended: it begins with a small sample and progressively increases its size until model accuracy no longer improves. Combining Latin hypercube sampling with the idea of sequentially incremented sampling, we test Progressive Latin Hypercube Sampling (PLHS) while monitoring the performance of the sampling-based training as the sample size grows. The PLHS and cLHS algorithms are applied to datasets with discrete variables, ensuring that each sample satisfies the Latin hypercube design properties and preserves the principal space-filling ability of LHS, as illustrated in the respective sample projection diagrams. The performance of these LHS methods in supervised machine learning is evaluated by the degree of training of the model, certified through the accuracy of the confusion matrices produced on test files. The results obtained with these Latin Hypercube Sampling techniques, compared against a benchmark sampling method, empirically show that the machine learning training process becomes less costly while remaining reliable.
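To make the progressive-sampling idea concrete, below is a minimal sketch (not the authors' PLHS algorithm) of LHS-based progressive training, using SciPy's `qmc.LatinHypercube` and a scikit-learn classifier. The synthetic labelling function, the sample-size schedule, and the `plateau_tol` stopping threshold are illustrative assumptions. A true PLHS scheme would additionally keep the growing design nested, so that every intermediate sample retains the Latin hypercube properties; this sketch simply redraws a fresh LHS at each size.

```python
# Sketch of progressive training on Latin hypercube samples.
# Assumptions (not from the paper): the synthetic task, the size
# schedule, and the plateau_tol stopping rule are all illustrative.
import numpy as np
from scipy.stats import qmc
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic classification task on [0, 1]^2: label 1 above the diagonal.
def label(points):
    return (points[:, 1] > points[:, 0]).astype(int)

X_test = rng.random((500, 2))
y_test = label(X_test)

prev_acc, plateau_tol = 0.0, 0.005   # stop when the accuracy gain falls below tol
for n in (16, 32, 64, 128, 256):     # progressively larger LHS designs
    sampler = qmc.LatinHypercube(d=2, seed=0)
    X_train = sampler.random(n=n)    # one stratum per row in each dimension
    y_train = label(X_train)
    model = LogisticRegression().fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"n={n:4d}  test accuracy={acc:.3f}")
    if acc - prev_acc < plateau_tol:  # accuracy no longer improves: stop
        break
    prev_acc = acc
```

The stopping rule mirrors the progressive-sampling logic described in the abstract: training halts at the smallest sample size after which accuracy stops improving, rather than at a size fixed in advance.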
About the journal
The purpose of the journal is to provide a forum for the publication of high-quality research and tutorial papers in computational mathematics. In addition to the traditional issues and problems in numerical analysis, the journal also publishes papers describing relevant applications in such fields as physics, fluid dynamics, engineering and other branches of applied science with a computational mathematics component. The journal strives to be flexible in the type of papers it publishes and their format. Equally desirable are:
(i) Full papers, which should be complete and relatively self-contained original contributions with an introduction that can be understood by the broad computational mathematics community. Both rigorous and heuristic styles are acceptable. Of particular interest are papers about new areas of research, in which arguments other than strictly mathematical ones may be important in establishing a basis for further developments.
(ii) Tutorial review papers, covering some of the important issues in Numerical Mathematics, Scientific Computing and their Applications. The journal will occasionally publish contributions which are larger than the usual format for regular papers.
(iii) Short notes, which present specific new results and techniques in a brief communication.