Robust Method for Confidence Interval Estimation in Outlier-Prone Datasets: Application to Molecular and Biophysical Data.

IF 4.8 · Zone 2 (Biology) · JCR Q1 (BIOCHEMISTRY & MOLECULAR BIOLOGY)
Biomolecules · Pub Date: 2025-05-12 · DOI: 10.3390/biom15050704
Victor V Golovko
Volume 15, Issue 5 · Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12109080/pdf/
Citations: 0

Abstract


Estimating confidence intervals in small or noisy datasets is a recurring challenge in biomolecular research, particularly when data contain outliers or exhibit high variability. This study introduces a robust statistical method that combines a hybrid bootstrap procedure with Steiner's most frequent value (MFV) approach to estimate confidence intervals without removing outliers or altering the original dataset. The MFV technique identifies the most representative value while minimizing information loss, making it well suited to datasets with limited sample sizes or non-Gaussian distributions. To demonstrate the method's robustness, we intentionally selected a dataset from outside the biomolecular domain: the fast-neutron activation cross-section of the ¹⁰⁹Ag(n, 2n)¹⁰⁸ᵐAg reaction from nuclear physics. This dataset presents large uncertainties, inconsistencies, and known evaluation difficulties. Confidence intervals for the cross-section were determined with the MFV hybrid parametric bootstrapping (MFV-HPB) framework. In this approach, the original data points were repeatedly resampled, new values were simulated from each point's reported uncertainty, and the MFV was then calculated for each simulated sample. Despite the dataset's complexity, the method yielded a stable MFV estimate of 709 mb with a 68.27% confidence interval of [691, 744] mb, illustrating its ability to provide interpretable results in challenging scenarios. Although the example comes from nuclear science, the same statistical issues commonly arise in biomolecular fields such as enzyme kinetics, molecular assays, and diagnostic biomarker studies. The MFV-HPB framework provides a reliable and generalizable approach for extracting central estimates and confidence intervals when data are difficult to collect, replicate, or interpret. Its resilience to outliers, independence from distributional assumptions, and compatibility with small samples make it particularly valuable in molecular medicine, bioengineering, and biophysics.
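The abstract outlines MFV-HPB in one sentence: resample the measured points, simulate new values from each point's uncertainty, and compute the MFV. The Python sketch below is one possible reading of that outline, not the author's implementation. Its assumptions, none of which the abstract confirms: uncertainties are treated as Gaussian 1σ errors; the MFV uses a common textbook form of Steiner's iteration (weights ε²/(ε² + (xᵢ − M)²), dihesion ε² re-estimated at every step, starting from the median and ε = (√3/2)(max − min)); the 68.27% interval is read from the 15.865th and 84.135th percentiles of the bootstrap MFVs; and the point estimate is the MFV of the unperturbed data. The names `steiner_mfv` and `mfv_hpb` are hypothetical, and the toy numbers are invented rather than taken from the ¹⁰⁹Ag dataset.

```python
import math
import random
import statistics


def steiner_mfv(values, tol=1e-10, max_iter=1000):
    """Most frequent value (MFV) by a Steiner-style fixed-point iteration.

    Assumed form: M is a weighted mean with weights eps^2 / (eps^2 + (x - M)^2),
    and the dihesion eps^2 is re-estimated from the data after each M update.
    """
    xs = [float(v) for v in values]
    if not xs:
        raise ValueError("need at least one value")
    spread = max(xs) - min(xs)
    if spread == 0.0:
        return xs[0]  # all values identical: nothing to weight
    m = statistics.median(xs)  # robust starting location
    eps2 = (math.sqrt(3.0) / 2.0 * spread) ** 2  # assumed initial dihesion
    for _ in range(max_iter):
        # Location step: points far from m (relative to eps) get small weights.
        w = [eps2 / (eps2 + (x - m) ** 2) for x in xs]
        m_new = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        # Dihesion step: eps^2 = 3 * sum(r^2 / d^2) / sum(1 / d^2), d = eps^2 + r^2.
        d = [eps2 + (x - m_new) ** 2 for x in xs]
        num = sum((x - m_new) ** 2 / di ** 2 for x, di in zip(xs, d))
        den = sum(1.0 / di ** 2 for di in d)
        eps2_new = 3.0 * num / den
        if not (eps2_new > 0.0 and math.isfinite(eps2_new)):
            return m_new  # dihesion collapsed (e.g. many exact ties): stop early
        done = (abs(m_new - m) <= tol * max(1.0, abs(m_new))
                and abs(eps2_new - eps2) <= tol * max(1.0, eps2))
        m, eps2 = m_new, eps2_new
        if done:
            break
    return m


def _percentile(sorted_xs, p):
    """Linear-interpolation percentile of an already sorted list (0 <= p <= 1)."""
    k = (len(sorted_xs) - 1) * p
    f = math.floor(k)
    c = min(f + 1, len(sorted_xs) - 1)
    return sorted_xs[f] + (sorted_xs[c] - sorted_xs[f]) * (k - f)


def mfv_hpb(values, sigmas, n_boot=2000, level=0.6827, seed=None):
    """Hybrid parametric bootstrap of the MFV (illustrative sketch).

    Returns (MFV of the original values, (lower, upper) percentile interval).
    """
    if len(values) != len(sigmas) or not values:
        raise ValueError("values and sigmas must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(values)
    boot = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample points with replacement
        sample = [rng.gauss(values[i], sigmas[i]) for i in idx]  # redraw from N(x_i, sigma_i)
        boot.append(steiner_mfv(sample))
    boot.sort()
    tail = (1.0 - level) / 2.0  # 0.15865 in each tail for level = 0.6827
    return steiner_mfv(values), (_percentile(boot, tail), _percentile(boot, 1.0 - tail))


if __name__ == "__main__":
    # Invented toy data (value, 1-sigma); NOT the 109Ag(n,2n) measurements.
    xs = [700.0, 712.0, 695.0, 720.0, 708.0, 850.0]
    ss = [15.0, 20.0, 18.0, 25.0, 12.0, 40.0]
    est, (lo, hi) = mfv_hpb(xs, ss, n_boot=500, seed=1)
    print(f"MFV = {est:.1f}, 68.27% interval = [{lo:.1f}, {hi:.1f}]")
```

Resampling indices captures the scatter between measurements, while the Gaussian redraw propagates each point's own reported uncertainty; combining the two is what makes the bootstrap "hybrid" in this sketch. Because no point is discarded, an outlier such as the 850 entry still enters every replicate it is drawn into, but its MFV weight shrinks as ε narrows around the bulk of the data.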

Source journal: Biomolecules (Biochemistry, Genetics and Molecular Biology / Molecular Biology)
CiteScore: 9.40
Self-citation rate: 3.60%
Articles published: 1640
Review time: 18.28 days
About the journal: Biomolecules (ISSN 2218-273X) is an international, peer-reviewed open access journal focusing on biogenic substances and their biological functions, structures, interactions with other molecules, and their microenvironment as well as biological systems. Biomolecules publishes reviews, regular research papers and short communications. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced.