Dylan Spicker, Amir Nazemi, Joy Hutchinson, Paul Fieguth, Sharon Kirkpatrick, Michael Wallace, Kevin W Dodd
Challenges for Predictive Modeling With Neural Network Techniques Using Error-Prone Dietary Intake Data
Statistics in Medicine, 44(5): e70013. Published 2025-02-28. DOI: 10.1002/sim.70013 (https://doi.org/10.1002/sim.70013)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11806516/pdf/
Abstract
Dietary intake data are routinely drawn upon to explore diet-health relationships and to inform clinical practice and public health. However, these data are almost always subject to measurement error, which distorts the true diet-health relationships. Beyond measurement error, there are likely complex synergistic, and sometimes antagonistic, interactions between dietary components, complicating the relationships between diet and health outcomes. Flexible models are required to capture the nuance that these interactions introduce. This complexity makes research on diet-health relationships an appealing candidate for the application of modern machine learning techniques, in particular neural networks. Neural networks are computational models that can capture highly complex, nonlinear relationships, provided sufficient data are available. While these models have been applied in many domains, the impact of measurement error on the performance of these predictive models has not been widely investigated. In this work, we demonstrate the ways in which measurement error erodes the performance of neural networks and illustrate the care required to leverage these models in the presence of error. We demonstrate the role that sample size and replicate measurements play in model performance, motivate the investigation of transformations to additivity, and illustrate the caution required to prevent model overfitting. While the past performance of neural networks across various domains makes them an attractive candidate for examining diet-health relationships, our work demonstrates that substantial care and further methodological development are both required to achieve improved predictive performance with these techniques relative to more traditional statistical procedures.
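A minimal sketch can make concrete how measurement error distorts an estimated diet-health relationship and why replicate measurements help. It rests on several assumptions that are not taken from the paper:

- It uses a classical additive error model. True intake is X, each observed replicate is W_j = X + U_j, and the errors are independent and normal.
- The outcome depends linearly on X.
- The fit is ordinary least squares rather than a neural network.

Under these assumptions, regressing Y on the mean of k replicates shrinks the slope toward zero by the factor λ = σ_X² / (σ_X² + σ_U²/k). The function names and parameter values below are hypothetical, and this is not the authors' simulation design.

```python
import numpy as np


def attenuated_slope(n, sigma_x, sigma_u, k, beta=1.0, seed=0):
    """Naive OLS slope of Y on the mean of k error-prone replicates.

    Hypothetical illustration under a classical additive error model:
    X ~ N(0, sigma_x^2) is true intake, Y = beta * X + e with e ~ N(0, 1),
    and each replicate is W_j = X + U_j with U_j ~ N(0, sigma_u^2).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma_x, size=n)          # true (unobserved) intake
    y = beta * x + rng.normal(0.0, 1.0, size=n)   # outcome driven by true intake
    u = rng.normal(0.0, sigma_u, size=(n, k))     # independent error per replicate
    w = (x[:, None] + u).mean(axis=1)             # observed: average of k replicates
    # OLS slope of Y on W: cov(W, Y) / var(W), both with ddof=1
    return np.cov(w, y)[0, 1] / np.var(w, ddof=1)


def theoretical_attenuation(sigma_x, sigma_u, k):
    """Attenuation factor var(X) / (var(X) + var(U)/k) for the naive slope."""
    return sigma_x**2 / (sigma_x**2 + sigma_u**2 / k)


if __name__ == "__main__":
    # Equal true-intake and error variances; vary the number of replicates.
    for k in (1, 2, 4, 8):
        est = attenuated_slope(n=100_000, sigma_x=1.0, sigma_u=1.0, k=k)
        lam = theoretical_attenuation(1.0, 1.0, k)
        print(f"k={k}: estimated slope {est:.3f}, theoretical {lam:.3f}")
```

With the true slope set to 1, the estimated slopes should sit close to the theoretical values 0.500, 0.667, 0.800 and 0.889 for k = 1, 2, 4 and 8. Averaging replicates reduces the attenuation but does not remove it. The closed-form factor applies only to this linear setting and is not a description of how the authors' neural networks behave.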
About the journal:
The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies where creative use or technical generalizations of established methodology are directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.