Empirical Approach to Sample Size Estimation for Testing of AI Algorithms

IF 0.6, Zone 4 (Mathematics), JCR Q3 (Mathematics)

Doklady Mathematics. Pub Date: 2025-03-22. DOI: 10.1134/S1064562424602063

M. R. Kodenko, T. M. Bobrovskaya, R. V. Reshetnikov, K. M. Arzamasov, A. V. Vladzymyrskyy, O. V. Omelyanskaya, Yu. A. Vasilev

Doklady Mathematics, Vol. 110, Suppl. 1, pp. S62-S74. Journal article. Full text: https://link.springer.com/article/10.1134/S1064562424602063

Citations: 0

Abstract

Sample size calculation is one of the basic tasks in correct and objective testing of artificial intelligence (AI) algorithms. Existing approaches, despite their exhaustive theoretical justification, can give results that differ by an order of magnitude under the same initial conditions. Most input parameters for such methods are set by the researcher intuitively or taken from relevant literature in the subject area. Such uncertainty at the research planning stage carries a high risk of biased results, which is especially important when AI algorithms are used for medical diagnosis. This work presents an empirical study of the minimum sample size of radiology diagnostic studies required to obtain an objective value of the AUROC metric. An algorithm was developed and implemented in software that calculates the threshold sample size by the criterion that the metric value shows no statistically significant change when the sample size is increased further. Using datasets with the results of testing AI algorithms on mammographic and radiographic studies, with a total volume of more than 300 thousand studies, the empirical threshold was calculated for sample sizes from 30 to 25 thousand studies with a relative pathology content from 10 to 90%. The proposed algorithm yields results that are invariant to the class balance of the sample, the target AUROC value, the modality of the studies, and the AI algorithm. The empirical minimum sufficient sample size for testing an AI algorithm for binary classification, obtained by analyzing over 2 million estimated values, is 400 studies. The results can be used in the development and testing of diagnostic tools, including AI algorithms.
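The stopping criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the statistical test, so the sketch assumes a simpler stand-in rule (the central 95% interval of resampled AUROC values must be no wider than a tolerance `tol`). The function names, the synthetic Gaussian scores, and all parameter values are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC via the Mann-Whitney rank-sum statistic (ties get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def draw_subsample(pos, neg, n, prevalence, rng):
    """Draw n scores without replacement at a fixed share of pathology cases."""
    n_pos = max(1, round(n * prevalence))
    n_neg = max(1, n - n_pos)
    scores = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    labels = [1] * n_pos + [0] * n_neg
    return labels, scores


def interval_width(values, coverage=0.95):
    """Width of the central percentile interval of the given values."""
    ordered = sorted(values)
    lo = ordered[int((1 - coverage) / 2 * (len(ordered) - 1))]
    hi = ordered[int((1 + coverage) / 2 * (len(ordered) - 1))]
    return hi - lo


def threshold_sample_size(pos, neg, sizes, prevalence=0.5,
                          repeats=200, tol=0.05, seed=0):
    """Smallest candidate size whose resampled AUROC spread is within tol.

    ASSUMPTION: this spread rule is a stand-in for the paper's criterion of
    no statistically significant change in AUROC as the size increases.
    Returns None if no candidate size satisfies the rule.
    """
    rng = random.Random(seed)
    for n in sorted(sizes):
        aucs = [auroc(*draw_subsample(pos, neg, n, prevalence, rng))
                for _ in range(repeats)]
        if interval_width(aucs) <= tol:
            return n
    return None


# Synthetic stand-in for AI output scores (hypothetical data, not from the paper).
rng = random.Random(42)
pos = [rng.gauss(1.0, 1.0) for _ in range(1000)]  # studies with pathology
neg = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # studies without pathology
print(threshold_sample_size(pos, neg, sizes=[30, 100, 400, 1000],
                            repeats=100, tol=0.1))
```

On real data, `pos` and `neg` would hold the algorithm's output scores for studies with and without pathology, and `prevalence` could be varied (for example from 0.1 to 0.9) to check whether the returned size depends on class balance.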

Source journal

Doklady Mathematics (Mathematics)

CiteScore: 1.00

Self-citation rate: 16.70%

Review time: 3-6 weeks

Journal description: Doklady Mathematics is a journal of the Presidium of the Russian Academy of Sciences (RAS). It contains English translations of papers published in Doklady Akademii Nauk (Proceedings of the Russian Academy of Sciences), which was founded in 1933 and is published 36 times a year. Doklady Mathematics covers the following areas: mathematics, mathematical physics, computer science, control theory, and computers. It publishes brief scientific reports on previously unpublished, significant new research in mathematics and its applications. The main contributors to the journal are Members of the RAS, Corresponding Members of the RAS, and scientists from the former Soviet Union and other foreign countries, including outstanding Russian mathematicians.