Overcoming methodological barriers in electronic nose clinical studies, a simulation data-based approach.

IF 3.4, Zone 4 (Medicine), JCR Q1 (BIOCHEMICAL RESEARCH METHODS)

Journal of Breath Research, Pub Date: 2025-05-09, DOI: 10.1088/1752-7163/add291

Milou L M van Riswijk, Bastiaan F M van Tintelen, Ruben H Lucas, Job van der Palen, Peter D Siersema


Citations: 0

Abstract

Overcoming methodological barriers in electronic nose clinical studies, a simulation data-based approach.

Analysis of volatile organic compounds by electronic nose (e-nose) may address gaps in non-invasive screening for neoplasia. Machine learning affects study design and sample size requirements, but guidance on clinical study design is limited. This study evaluates how neoplasia prevalence, augmented data, and the number of e-nose devices affect sample size requirements. Simulated e-nose breath test data were created using real-world study data. We examined the effect of varying neoplasia prevalence (from 50% down to 5%) and of data augmentation on model performance, as well as the impact of using multiple devices. Prediction models were developed using singular value decomposition with random forest, and using convolutional neural networks. Model performance was reported as area under the receiver operating characteristic curve and F1-score. Stable model performance was defined as the phase where additional data no longer increases model performance. We found that lower neoplasia prevalence significantly increased sample size requirements, with low-prevalence settings (5%) requiring up to five times more data than high-prevalence settings (50%) to reach stable model performance. Model performance varied between devices, and integrating data from multiple devices required larger sample sizes. Approximately 400 data points per device at 50% prevalence, and 2100 data points at 5% prevalence, were necessary to reach stable model performance. In conclusion, sample size requirements for e-nose studies are heavily influenced by disease prevalence and the number of devices used. Limiting device variability and ensuring sufficient case and control samples per device are crucial for achieving reliable predictive performance. Specific requirements will vary based on sensor and disease characteristics. ClinicalTrials.gov Identifier: NCT03346005 (model study) and NCT04357158 (validation study).
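The two sample size figures in the abstract can be made more concrete with a little arithmetic, and the "stable performance" definition can be sketched as a simple plateau check on a learning curve. The Python below is only an illustration under stated assumptions: the function names, the tolerance `tol`, and the example learning-curve scores are hypothetical and are not taken from the study, and the case counts assume the reported totals split exactly according to prevalence.

```python
from typing import Optional, Sequence, Tuple


def class_counts(n_total: int, prevalence: float) -> Tuple[int, int]:
    """Split a total sample size into expected (cases, controls)."""
    cases = round(n_total * prevalence)
    return cases, n_total - cases


def plateau_start(sizes: Sequence[int], scores: Sequence[float],
                  tol: float = 0.01) -> Optional[int]:
    """Return the smallest sample size after which no later score
    exceeds the score at that size by more than `tol`.

    `tol` is a hypothetical threshold; the abstract does not state one.
    Returns None only when the inputs are empty.
    """
    for i, (size, score) in enumerate(zip(sizes, scores)):
        if all(later - score <= tol for later in scores[i + 1:]):
            return size
    return None


# Totals quoted in the abstract, split by the stated prevalence.
print(class_counts(400, 0.50))   # (200, 200)
print(class_counts(2100, 0.05))  # (105, 1995)

# Made-up learning curve (NOT study data) to show the plateau check.
example_sizes = [50, 100, 200, 300, 500]
example_auc = [0.55, 0.63, 0.70, 0.705, 0.707]
print(plateau_start(example_sizes, example_auc))  # 200
```

Read this way, the 2100 data points at 5% prevalence correspond to roughly 105 cases per device, compared with about 200 cases among the 400 data points at 50% prevalence, so the larger total at low prevalence is mostly made up of controls.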

Source journal

Journal of breath research BIOCHEMICAL RESEARCH METHODS-RESPIRATORY SYSTEM

CiteScore

7.60

Self-citation rate

21.10%

Articles published

Review time

>12 weeks

About the journal: Journal of Breath Research is dedicated to all aspects of scientific breath research. The traditional focus is on analysis of volatile compounds and aerosols in exhaled breath for the investigation of exogenous exposures, metabolism, toxicology, health status and the diagnosis of disease and breath odours. The journal also welcomes other breath-related topics. Typical areas of interest include: Big laboratory instrumentation: describing new state-of-the-art analytical instrumentation capable of performing high-resolution discovery and targeted breath research; exploiting complex technologies drawn from other areas of biochemistry and genetics for breath research. Engineering solutions: developing new breath sampling technologies for condensate and aerosols, for chemical and optical sensors, for extraction and sample preparation methods, for automation and standardization, and for multiplex analyses to preserve the breath matrix and facilitate analytical throughput; measuring exhaled constituents (e.g. CO2, acetone, isoprene) as markers of human presence, or mitigating such contaminants in enclosed environments. Human and animal in vivo studies: decoding the "breath exposome", implementing exposure and intervention studies, performing cross-sectional and case-control research, assaying immune and inflammatory response, and testing mammalian host response to infections and exogenous exposures to develop information directly applicable to systems biology; studying inhalation toxicology, inhaled breath as a source of internal dose, and resultant blood, breath and urinary biomarkers linked to the inhalation pathway. Cellular and molecular level in vitro studies. Clinical, pharmacological and forensic applications. Mathematical, statistical and graphical data interpretation.