Challenges for Predictive Modeling With Neural Network Techniques Using Error-Prone Dietary Intake Data.

IF 1.8 | Zone 4, Medicine | JCR Q3, Mathematical & Computational Biology
Dylan Spicker, Amir Nazemi, Joy Hutchinson, Paul Fieguth, Sharon Kirkpatrick, Michael Wallace, Kevin W Dodd
Statistics in Medicine, vol. 44, no. 5, e70013. Published 2025-02-28. DOI: 10.1002/sim.70013 (https://doi.org/10.1002/sim.70013). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11806516/pdf/

Abstract

Dietary intake data are routinely drawn upon to explore diet-health relationships, and inform clinical practice and public health. However, these data are almost always subject to measurement error, distorting true diet-health relationships. Beyond measurement error, there are likely complex synergistic and sometimes antagonistic interactions between different dietary components, complicating the relationships between diet and health outcomes. Flexible models are required to capture the nuance that these complex interactions introduce. This complexity makes research on diet-health relationships an appealing candidate for the application of modern machine learning techniques, and in particular, neural networks. Neural networks are computational models that can capture highly complex, nonlinear relationships, so long as sufficient data are available. While these models have been applied in many domains, the impacts of measurement error on the performance of predictive modeling have not been widely investigated. In this work, we demonstrate the ways in which measurement error erodes the performance of neural networks and illustrate the care that is required for leveraging these models in the presence of error. We demonstrate the role that sample size and replicate measurements play in model performance, indicate a motivation for the investigation of transformations to additivity, and illustrate the caution required to prevent model overfitting. While the past performance of neural networks across various domains makes them an attractive candidate for examining diet-health relationships, our work demonstrates that substantial care and further methodological development are both required to observe increased predictive performance when applying these techniques compared to more traditional statistical procedures.
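
The distortion from measurement error, and the role of replicate measurements, can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions: it uses a linear outcome model with classical additive error rather than the neural networks studied in the paper, and every name and parameter value (`simulate_attenuation`, `expected_slope`, `beta`, `sigma_u`, the noise scales) is hypothetical. Under these assumptions, regressing the outcome on the mean of k error-prone replicates shrinks the fitted slope toward zero by the factor var(X) / (var(X) + sigma_u^2 / k), so more replicates recover more of the true relationship.

```python
import numpy as np


def expected_slope(beta, sigma_u, k, var_x=1.0):
    """Theoretical attenuated slope under classical additive error (assumed model)."""
    return beta * var_x / (var_x + sigma_u**2 / k)


def simulate_attenuation(n=20000, beta=2.0, sigma_u=1.0, replicates=(1, 2, 4), seed=0):
    """Return {k: fitted slope} when regressing y on the mean of k noisy replicates."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=n)             # true (unobserved) intake, variance 1
    y = beta * x + rng.normal(0.0, 0.5, size=n)  # outcome depends on the true intake
    slopes = {}
    for k in replicates:
        # k replicate measurements, each with independent additive error
        w = x[:, None] + rng.normal(0.0, sigma_u, size=(n, k))
        w_bar = w.mean(axis=1)
        # ordinary least-squares slope of y on the replicate mean
        slopes[k] = float(np.cov(w_bar, y)[0, 1] / np.var(w_bar, ddof=1))
    return slopes


slopes = simulate_attenuation()
for k, fitted in slopes.items():
    print(f"replicates={k}: fitted slope={fitted:.3f}, "
          f"theory={expected_slope(2.0, 1.0, k):.3f}, true slope=2.0")
```

With the hypothetical settings above, the theoretical slopes are 1.0, about 1.33, and 1.6 for one, two, and four replicates, all below the true value of 2.0. A flexible model such as a neural network trained on the same error-prone inputs faces the same loss of signal, which is one reason replicates and sample size matter for predictive performance.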

Source journal: Statistics in Medicine (Medicine: Public, Environmental & Occupational Health)
CiteScore: 3.40
Self-citation rate: 10.00%
Articles published: 334
Review time: 2-4 weeks
Journal description: The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies where creative use or technical generalizations of established methodology are directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.