Supervised machine learning for exploratory analysis in family research

IF 2.7 | Zone 1, Sociology | JCR Q1, Family Studies
Xiaoran Sun
Journal of Marriage and Family, 86(5), 1468-1494. Published 2024-02-14. DOI: 10.1111/jomf.12973
https://onlinelibrary.wiley.com/doi/10.1111/jomf.12973

Abstract

Objective

This article introduces supervised machine learning (ML) for conducting exploratory, discovery-oriented family research in a transparent and systematic way.

Background

Supervised ML can examine large numbers of variables simultaneously, identify key predictors, and explore patterns among predictors. This approach may help address concerns in family research about a lack of theoretical specificity and the prevalence of unguided exploratory analysis.

Method

Following an overview of supervised ML, example analyses drew on the National Longitudinal Study of Adolescent Health (Add Health) dataset across Waves I–IV (N = 5114 adolescents, 50.53% female, M_age = 15.94, SD = 1.77 at Wave I). From 143 articles using Add Health data from Waves I through IV, 62 adolescent family variables from eight domains (e.g., socioeconomics, parenting, health) were identified as predictors of young adult (ages 24–32) educational attainment. Following benchmark regression models, ML models were trained using Lasso regression, decision tree, random forest, and extreme gradient boosting; these were evaluated on test data held out from training and interpreted through SHapley Additive exPlanations (SHAP).
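The held-out evaluation described in the Method can be illustrated with a minimal sketch. It is not the article's pipeline: the data are synthetic, a one-predictor least-squares fit stands in for the Lasso, tree, random forest, and gradient boosting models, and the helper names (`r_squared`, `fit_simple_ols`) and the 80/20 split are assumptions made for illustration. The point is only that the model is fit on one part of the sample and scored on another.

```python
import random


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def fit_simple_ols(x, y):
    """Closed-form least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic data (hypothetical): outcome = 2 + 1 * predictor + noise.
rng = random.Random(0)  # fixed seed so the split is reproducible
n = 500
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [2.0 + 1.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Shuffle the row indices, then hold out 20% of rows as the test set.
indices = list(range(n))
rng.shuffle(indices)
train_idx, test_idx = indices[:400], indices[400:]
x_train = [x[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
x_test = [x[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

# Fit on training rows only; score on both parts for comparison.
intercept, slope = fit_simple_ols(x_train, y_train)
train_r2 = r_squared(y_train, [intercept + slope * xi for xi in x_train])
test_r2 = r_squared(y_test, [intercept + slope * xi for xi in x_test])
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

For flexible models such as random forests, the gap between training and test R² is usually larger than in this linear toy case, which is why the test-set figure is the one worth reporting.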

Results

The random forest model performed best (R² = .382 for the model with all predictors), and 14 variables were identified as key predictors of educational attainment. Patterns among these predictors emerged, including directionality, nonlinearity, and interactions.
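To make the idea behind Shapley-based interpretation concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every coalition of features. It is an illustration under stated assumptions, not the article's analysis: the toy model, the three feature names, and the background rows are all hypothetical, and applied work would normally rely on the `shap` package, which uses faster algorithms for tree models instead of brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values for one instance x.

    A coalition's value is the mean model output when the features in the
    coalition are fixed at x's values and the remaining features are taken
    from each background row in turn.
    """
    n = len(x)

    def coalition_value(subset):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                gain = coalition_value(members | {i}) - coalition_value(members)
                contribution += weight * gain
        phi.append(contribution)
    return phi


# Hypothetical toy model: two main effects plus one interaction term.
def toy_model(features):
    parent_education, family_conflict, school_support = features
    return 2.0 * parent_education - family_conflict + parent_education * school_support


feature_names = ["parent_education", "family_conflict", "school_support"]
background = [[0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]]  # hypothetical reference rows
instance = [1, 1, 1]

phi = shapley_values(toy_model, instance, background)
baseline = sum(toy_model(row) for row in background) / len(background)
for name, value in zip(feature_names, phi):
    print(f"{name}: {value:+.3f}")
print(f"baseline + sum(phi) = {baseline + sum(phi):.3f}, prediction = {toy_model(instance):.3f}")
```

The sign of each value indicates direction (here `family_conflict` lowers the prediction relative to the baseline), and the values sum to the difference between the prediction and the average background prediction, which is the additivity property that SHAP takes its name from.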

Conclusions

Supervised ML research can be used to inform further confirmatory analyses and advance theory.

Source journal metrics: CiteScore 12.20; self-citation rate 6.70%; articles published: 81.
About the journal: For more than 70 years, Journal of Marriage and Family (JMF) has been a leading research journal in the family field. JMF features original research and theory, research interpretation and reviews, and critical discussion concerning all aspects of marriage, other forms of close relationships, and families. In 2009, an institutional subscription to Journal of Marriage and Family includes a subscription to Family Relations and Journal of Family Theory & Review.