Robust Method for Confidence Interval Estimation in Outlier-Prone Datasets: Application to Molecular and Biophysical Data
Author: Victor V Golovko
Journal: Biomolecules, volume 15, issue 5 (published 2025-05-12)
DOI: 10.3390/biom15050704 (https://doi.org/10.3390/biom15050704)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12109080/pdf/
Impact factor: 4.8; JCR: Q1 (Biochemistry & Molecular Biology)
Estimating confidence intervals in small or noisy datasets is a recurring challenge in biomolecular research, particularly when data contain outliers or exhibit high variability. This study introduces a robust statistical method that combines a hybrid bootstrap procedure with Steiner's most frequent value (MFV) approach to estimate confidence intervals without removing outliers or altering the original dataset. The MFV technique identifies the most representative value while minimizing information loss, making it well suited for datasets with limited sample sizes or non-Gaussian distributions. To demonstrate the method's robustness, we intentionally selected a dataset from outside the biomolecular domain: a fast-neutron activation cross-section of the ¹⁰⁹Ag(n, 2n)¹⁰⁸ᵐAg reaction from nuclear physics. This dataset presents large uncertainties, inconsistencies, and known evaluation difficulties. Confidence intervals for the cross-section were determined using the MFV-hybrid parametric bootstrapping (MFV-HPB) framework. In this approach, the original data points were repeatedly resampled, and new values were simulated based on their uncertainties before the MFV was calculated. Despite the dataset's complexity, the method yielded a stable MFV estimate of 709 mb with a 68.27% confidence interval of [691, 744] mb, illustrating the method's ability to provide interpretable results in challenging scenarios. Although the example is from nuclear science, the same statistical issues commonly arise in biomolecular fields, such as enzymatic kinetics, molecular assays, and diagnostic biomarker studies. The MFV-HPB framework provides a reliable and generalizable approach for extracting central estimates and confidence intervals in situations where data are difficult to collect, replicate, or interpret. Its resilience to outliers, independence from distributional assumptions, and compatibility with small-sample scenarios make it particularly valuable in molecular medicine, bioengineering, and biophysics.
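The resample, simulate, and recompute loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation. The abstract does not specify the details, so several things here are assumptions: the MFV is computed with the fixed-point iteration commonly given for Steiner's estimator (median start, initial scale of sqrt(3)/2 times the data range), the simulated values are drawn from a Gaussian centred on each resampled point with its reported uncertainty as the standard deviation, and the interval is taken from the percentiles of the bootstrap MFV values. The function names mfv and mfv_hpb_interval and the toy numbers are hypothetical and are not the paper's cross-section data.

```python
import numpy as np


def mfv(x, tol=1e-9, max_iter=500):
    """Most frequent value via a fixed-point iteration (assumed form)."""
    x = np.asarray(x, dtype=float)
    spread = x.max() - x.min()
    m = np.median(x)                      # starting location
    if spread == 0.0:
        return float(m)                   # all points identical
    eps2 = 0.75 * spread ** 2             # starting scale: (sqrt(3)/2 * range)^2
    for _ in range(max_iter):
        d2 = (x - m) ** 2
        denom = (eps2 + d2) ** 2
        w = eps2 / (eps2 + d2)            # weights shrink toward 0 for distant points
        m_new = np.sum(w * x) / np.sum(w)
        eps2_new = 3.0 * np.sum(d2 / denom) / np.sum(1.0 / denom)
        done = (abs(m_new - m) <= tol * spread
                and abs(np.sqrt(eps2_new) - np.sqrt(eps2)) <= tol * spread)
        m, eps2 = m_new, eps2_new
        if done or eps2 == 0.0:
            break
    return float(m)


def mfv_hpb_interval(values, sigmas, n_boot=1000, level=0.6827, seed=0):
    """Hybrid bootstrap: resample points, perturb by their uncertainty, take MFV."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    n = values.size
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                  # nonparametric step
        simulated = rng.normal(values[idx], sigmas[idx])  # parametric step (Gaussian assumed)
        estimates[b] = mfv(simulated)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(estimates, [alpha, 1.0 - alpha])
    return mfv(values), float(lo), float(hi)


# Synthetic toy measurements with one gross outlier (illustrative only).
toy_values = np.array([10.0, 10.2, 9.9, 10.1, 10.3, 9.8, 50.0])
toy_sigmas = np.array([0.2, 0.3, 0.2, 0.25, 0.3, 0.2, 5.0])
center, lower, upper = mfv_hpb_interval(toy_values, toy_sigmas)
print(f"MFV = {center:.2f}, 68.27% interval = [{lower:.2f}, {upper:.2f}]")
```

No point is discarded in this sketch: the outlier stays in the dataset, and the weights inside mfv reduce its influence on the location estimate. The fixed seed only makes the toy run repeatable.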
Journal: Biomolecules (Biochemistry, Genetics and Molecular Biology: Molecular Biology)
CiteScore: 9.40
Self-citation rate: 3.60%
Articles published: 1640
Review time: 18.28 days
About the journal:
Biomolecules (ISSN 2218-273X) is an international, peer-reviewed open access journal focusing on biogenic substances and their biological functions, structures, interactions with other molecules, and their microenvironment as well as biological systems. Biomolecules publishes reviews, regular research papers and short communications. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced.