Overcoming methodological barriers in electronic nose clinical studies, a simulation data-based approach.

IF 3.7 | Zone 4 | Medicine | Q1 | BIOCHEMICAL RESEARCH METHODS
Milou L M van Riswijk, Bastiaan F M van Tintelen, Ruben H Lucas, Job van der Palen, Peter D Siersema
Journal of breath research, volume 19, issue 3. Published 2025-05-09. Journal Article.
DOI: 10.1088/1752-7163/add291 (https://doi.org/10.1088/1752-7163/add291)
Citations: 0

Abstract

Overcoming methodological barriers in electronic nose clinical studies, a simulation data-based approach.

Analysis of volatile organic compounds by electronic nose (e-nose) may address gaps in non-invasive screening for neoplasia. Machine learning impacts study design and sample size requirements, but guidance on clinical study design is limited. This study evaluates how neoplasia prevalence, augmented data, and the number of e-nose devices impact sample size requirements. Simulated e-nose breath test data were created using real-world study data. We examined the effect of varying neoplasia prevalence (50%-5%) and data augmentation on model performance, as well as the impact of using multiple devices. Prediction models were developed using singular value decomposition and random forest, and convolutional neural networks. Model performance was reported as area under the receiver operating characteristic curve and F1-score. Stable model performance was defined as the phase where additional data no longer increases model performance. We found that lower neoplasia prevalence significantly increased sample size requirements, with low-prevalence settings (5%) requiring up to five times more data than high-prevalence settings (50%) for stable model performance. Model performance varied between devices, and integrating data from multiple devices required larger sample sizes. Approximately 400 data points per device at 50% prevalence, and 2100 data points at 5% prevalence, were necessary to reach stable model performance. In conclusion, sample size requirements for e-nose studies are heavily influenced by disease prevalence and the number of devices used. Limiting device variability and ensuring sufficient case and control samples per device are crucial for achieving reliable predictive performance. Specific requirements will vary based on sensor and disease characteristics. ClinicalTrials.gov identifiers: NCT03346005 (model study) and NCT04357158 (validation study).
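To make the sample size figures concrete, the sketch below works out how many cases and controls the quoted totals imply, and shows one possible way to read "stable model performance" off a learning curve. It is an illustration only: the abstract does not state the authors' plateau criterion, so the `plateau_size` helper, its `tol` threshold, and the example AUC values are all assumptions, and the 2100 figure is read here as a per-device total.

```python
def expected_counts(n_samples, prevalence):
    """Expected number of cases and controls in a sample at a given prevalence."""
    cases = round(n_samples * prevalence)
    return cases, n_samples - cases


def plateau_size(sizes, scores, tol=0.01):
    """Return the smallest sample size after which no later score
    exceeds the score at that size by more than `tol`.

    `tol` is an illustrative threshold, not a value from the study.
    """
    for i, size in enumerate(sizes):
        if all(later - scores[i] <= tol for later in scores[i + 1:]):
            return size
    return None  # only reached when the input is empty


# Totals quoted in the abstract, split into (cases, controls).
print(expected_counts(400, 0.50))   # (200, 200)
print(expected_counts(2100, 0.05))  # (105, 1995)

# Made-up learning curve (sample size vs. AUC), for illustration only.
sizes = [100, 200, 400, 800, 1600]
aucs = [0.62, 0.70, 0.78, 0.785, 0.787]
print(plateau_size(sizes, aucs))    # 400
```

Under these assumptions, the 5% prevalence total of 2100 data points contains only about 105 cases, roughly half the 200 cases in the 400-point total at 50% prevalence, which illustrates why the abstract stresses sufficient case and control samples per device.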

Source journal: Journal of breath research (BIOCHEMICAL RESEARCH METHODS-RESPIRATORY SYSTEM)
CiteScore: 7.60
Self-citation rate: 21.10%
Articles published: 49
Review time: >12 weeks
About the journal: Journal of Breath Research is dedicated to all aspects of scientific breath research. The traditional focus is on analysis of volatile compounds and aerosols in exhaled breath for the investigation of exogenous exposures, metabolism, toxicology, health status and the diagnosis of disease and breath odours. The journal also welcomes other breath-related topics. Typical areas of interest include: Big laboratory instrumentation: describing new state-of-the-art analytical instrumentation capable of performing high-resolution discovery and targeted breath research; exploiting complex technologies drawn from other areas of biochemistry and genetics for breath research. Engineering solutions: developing new breath sampling technologies for condensate and aerosols, for chemical and optical sensors, for extraction and sample preparation methods, for automation and standardization, and for multiplex analyses to preserve the breath matrix and facilitate analytical throughput. Measure exhaled constituents (e.g. CO2, acetone, isoprene) as markers of human presence or mitigate such contaminants in enclosed environments. Human and animal in vivo studies: decoding the "breath exposome", implementing exposure and intervention studies, performing cross-sectional and case-control research, assaying immune and inflammatory response, and testing mammalian host response to infections and exogenous exposures to develop information directly applicable to systems biology. Studying inhalation toxicology; inhaled breath as a source of internal dose; resultant blood, breath and urinary biomarkers linked to inhalation pathway. Cellular and molecular level in vitro studies. Clinical, pharmacological and forensic applications. Mathematical, statistical and graphical data interpretation.