Milou L M van Riswijk, Bastiaan F M van Tintelen, Ruben H Lucas, Job van der Palen, Peter D Siersema
Journal of Breath Research, 19(3). Published 2025-05-09. DOI: 10.1088/1752-7163/add291 (https://doi.org/10.1088/1752-7163/add291)
Overcoming methodological barriers in electronic nose clinical studies, a simulation data-based approach.
Analysis of volatile organic compounds by electronic nose (e-nose) may address gaps in non-invasive screening for neoplasia. Machine learning affects study design and sample size requirements, but guidance on clinical study design is limited. This study evaluates how neoplasia prevalence, data augmentation, and the number of e-nose devices affect sample size requirements. Simulated e-nose breath test data were generated from real-world study data. We examined the effect of varying neoplasia prevalence (from 50% down to 5%) and of data augmentation on model performance, as well as the impact of using multiple devices. Prediction models were developed using singular value decomposition and random forest, and convolutional neural networks. Model performance was reported as area under the receiver operating characteristic curve and F1-score. Stable model performance was defined as the phase in which additional data no longer increase model performance. We found that lower neoplasia prevalence significantly increased sample size requirements, with low-prevalence settings (5%) requiring up to five times more data than high-prevalence settings (50%) to reach stable model performance. Model performance varied between devices, and integrating data from multiple devices required larger sample sizes. Approximately 400 data points per device at 50% prevalence, and 2100 data points at 5% prevalence, were necessary to reach stable model performance. In conclusion, sample size requirements for e-nose studies are heavily influenced by disease prevalence and the number of devices used. Limiting device variability and ensuring sufficient case and control samples per device are crucial for achieving reliable predictive performance. Specific requirements will vary based on sensor and disease characteristics.
ClinicalTrials.gov Identifier: NCT03346005 (model study) and NCT04357158 (validation study).
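To make the reported sample sizes concrete, the following is a minimal Python sketch, not the authors' code. It converts a total sample size and a prevalence into expected case and control counts, and it shows one possible way to operationalise "stable model performance" as the first point on a learning curve after which no later score improves by more than a tolerance. The function names, the tolerance of 0.01, and the learning-curve values are all hypothetical and are not taken from the study; the sketch also assumes that the reported sizes are totals of cases plus controls.

```python
def expected_counts(n_samples, prevalence):
    """Return the expected (cases, controls) for n_samples at a given prevalence."""
    cases = round(n_samples * prevalence)
    return cases, n_samples - cases


def plateau_start(sizes, scores, tol=0.01):
    """Return the smallest sample size after which no later score is more
    than `tol` above the score at that size; None if no such point exists.

    This is an assumed, simplified stability criterion, not the study's.
    """
    for i, (size, score) in enumerate(zip(sizes, scores)):
        later = scores[i + 1:]
        # A plateau needs at least one later point to compare against.
        if later and max(later) - score <= tol:
            return size
    return None


# Sample sizes reported in the abstract, read as cases plus controls.
print(expected_counts(400, 0.50))   # (200, 200)
print(expected_counts(2100, 0.05))  # (105, 1995)

# Made-up learning curve (e.g. AUC per training size), for illustration only.
sizes = [100, 200, 300, 400, 500, 600]
scores = [0.61, 0.70, 0.76, 0.765, 0.768, 0.766]
print(plateau_start(sizes, scores))
```

Read this way, the 5% prevalence setting yields fewer expected cases (105) than the 50% setting (200) despite the larger total, which is consistent with the abstract's emphasis on ensuring sufficient case and control samples per device.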
About the journal:
Journal of Breath Research is dedicated to all aspects of scientific breath research. The traditional focus is on analysis of volatile compounds and aerosols in exhaled breath for the investigation of exogenous exposures, metabolism, toxicology, health status and the diagnosis of disease and breath odours. The journal also welcomes other breath-related topics.
Typical areas of interest include:
Big laboratory instrumentation: describing new state-of-the-art analytical instrumentation capable of performing high-resolution discovery and targeted breath research; exploiting complex technologies drawn from other areas of biochemistry and genetics for breath research.
Engineering solutions: developing new breath sampling technologies for condensate and aerosols, for chemical and optical sensors, for extraction and sample preparation methods, for automation and standardization, and for multiplex analyses to preserve the breath matrix and facilitate analytical throughput. Measuring exhaled constituents (e.g. CO2, acetone, isoprene) as markers of human presence, or mitigating such contaminants in enclosed environments.
Human and animal in vivo studies: decoding the "breath exposome", implementing exposure and intervention studies, performing cross-sectional and case-control research, assaying immune and inflammatory response, and testing mammalian host response to infections and exogenous exposures to develop information directly applicable to systems biology. Studying inhalation toxicology; inhaled breath as a source of internal dose; resultant blood, breath and urinary biomarkers linked to the inhalation pathway.
Cellular and molecular level in vitro studies.
Clinical, pharmacological and forensic applications.
Mathematical, statistical and graphical data interpretation.