Nate C Carnes, Claire A Kolaja, Crystal L Lewis, Sheila F Castañeda, Rudolph P Rull
BMC Medical Research Methodology, 25(1):174. Published 2025-07-14. DOI: 10.1186/s12874-025-02620-3
Characterizing individual and methodological risk factors for survey non-completion using machine learning: findings from the U.S. Millennium Cohort Study.
Background: Missing survey data can threaten the validity and generalizability of findings from longitudinal cohort studies. Respondent characteristics and survey attributes may contribute to patterns of survey non-completion (a form of missing data in which respondents begin but do not finish a survey), which can lead to biased conclusions. The objectives of the present research are to demonstrate how machine learning can identify survey non-completion and to characterize the individual and methodological factors associated with this form of missing data.
Methods: The present study developed a novel machine learning algorithm to characterize survey non-completion in the Millennium Cohort Study during the 2019-2021 data collection cycle. This cycle included a 30- to 45-min paper- or web-based follow-up survey for previously enrolled panels (Panels 1-4, n = 80,986) and a 30- to 45-min web-based baseline survey for new enrollees (Panel 5, n = 58,609). We then examined the effects of individual characteristics and survey attributes on survey non-completion.
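The abstract does not describe how the machine learning algorithm works. As a point of reference only, the definition of non-completion given in the Background (respondents who begin but do not finish a survey) can be turned into a simple rule. In the Python sketch below, the data layout (a count of answered items per section, in survey order) and the function names `classify_respondent` and `breakoff_section` are hypothetical assumptions, and the rule is not the study's algorithm.

```python
from typing import Optional, Sequence


def classify_respondent(answered_per_section: Sequence[int]) -> str:
    """Label one respondent as 'non-starter', 'non-completer', or 'completer'.

    Hypothetical input: number of items answered in each section, in the
    order the sections appear. Assumed rule (not the study's ML algorithm):
    answering at least one item but nothing in the final section means the
    respondent began but did not finish the survey.
    """
    if not any(n > 0 for n in answered_per_section):
        return "non-starter"      # never began the survey
    if answered_per_section[-1] == 0:
        return "non-completer"    # began, but the final section is empty
    return "completer"


def breakoff_section(answered_per_section: Sequence[int]) -> Optional[int]:
    """Return the index of the last section with any answered item, or None."""
    last = None
    for i, n in enumerate(answered_per_section):
        if n > 0:
            last = i
    return last


if __name__ == "__main__":
    respondents = {"A": [5, 3, 4], "B": [5, 0, 0], "C": [0, 0, 0]}
    for rid, counts in respondents.items():
        print(rid, classify_respondent(counts), breakoff_section(counts))
```

A rule this simple treats anyone with an empty final section as a non-completer, so it can mislabel respondents who legitimately skip that section through routing or item non-response; cases like these are where a learned classifier might add value.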
Results: The algorithm achieved 99% accuracy and showed that 0.29% of follow-up respondents and 15.43% of new enrollees were survey non-completers. Our findings suggest that certain military and sociodemographic characteristics (e.g., enlisted pay grades) were associated with increased survey non-completion in the 2019-2021 cycle. Survey attributes explained a large proportion of the variability in survey non-completion, with our analyses indicating a higher likelihood of survey non-completion in sections (1) located toward the beginning of the survey, (2) containing sensitive questions, and (3) containing fewer questions.
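The Results relate non-completion to attributes of individual survey sections. The abstract does not say which models were used for this, so the following is only an illustrative assumption: a descriptive per-section breakoff rate, defined as the share of respondents who reach a section and stop in it. The function name `section_breakoff_rates` and its input format are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def section_breakoff_rates(
    breakoffs: Iterable[Optional[int]], n_sections: int
) -> Dict[int, float]:
    """Discrete-time breakoff rate for each section.

    Hypothetical input: for every respondent who started the survey, the
    index of the section in which they stopped, or None if they completed it.
    Rate for section s = (# who stopped in s) / (# who reached s).
    """
    breakoffs = list(breakoffs)
    stopped = Counter(b for b in breakoffs if b is not None)
    at_risk = len(breakoffs)  # every starter reaches section 0
    rates: Dict[int, float] = {}
    for s in range(n_sections):
        rates[s] = stopped[s] / at_risk if at_risk else 0.0
        at_risk -= stopped[s]  # those who stopped here do not reach s + 1
    return rates


if __name__ == "__main__":
    # Six hypothetical starters in a three-section survey.
    print(section_breakoff_rates([None, None, 0, 1, None, 0], n_sections=3))
```

Rates like these are purely descriptive; relating them to section position, sensitivity, or length would require a further model, which this sketch does not attempt.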
Conclusion: This research highlights the importance of accounting for potential respondent bias due to survey non-completion and identifies factors associated with this type of missing data.