Supervised machine learning for exploratory analysis in family research
Xiaoran Sun
Journal of Marriage and Family, 86(5), 1468-1494. Publication date: 2024-02-14. DOI: 10.1111/jomf.12973
Abstract
Objective
This article introduces supervised machine learning (ML) for conducting exploratory, discovery-oriented family research in a transparent and systematic way.
Background
Supervised ML can examine large numbers of variables simultaneously, identify key predictors, and explore patterns among those predictors. This approach may help address two concerns in family research: a lack of theoretical specificity and the prevalence of unguided exploratory analysis.
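The idea that a penalized model can screen many candidate predictors at once and set most of them aside can be illustrated with Lasso regression, one of the learners the Method applies. The sketch below is a minimal pure-Python version under stated assumptions: simulated data (10 standardized predictors, of which only the first two truly affect the outcome), an arbitrary penalty `lam = 0.2`, and cyclic coordinate descent as the fitting algorithm. The helper names (`soft_threshold`, `standardize`, `lasso_coordinate_descent`) are hypothetical; this is not the article's code or data.

```python
import random


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def standardize(column):
    """Rescale a column to mean 0 and (population) standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / sd for v in column]


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardized and y is centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # y - Xb, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of predictor j with the residual after adding back its own contribution
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep the residual in sync with b
            b[j] = new_bj
    return b


# Hypothetical simulated data: 10 candidate predictors, only the first two matter.
rng = random.Random(0)
n, p = 500, 10
raw = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y_raw = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 1) for row in raw]

columns = [standardize([row[j] for row in raw]) for j in range(p)]
X = [[columns[j][i] for j in range(p)] for i in range(n)]
y_mean = sum(y_raw) / n
y = [v - y_mean for v in y_raw]

coefs = lasso_coordinate_descent(X, y, lam=0.2)  # lam = 0.2 is an illustrative choice
print([round(c, 3) for c in coefs])
```

With this setup the eight irrelevant predictors receive coefficients of exactly zero while the two true predictors keep large, slightly shrunken coefficients. In applied work the penalty would usually be chosen by cross-validation (for example with scikit-learn's `LassoCV`) rather than fixed by hand.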
Method
Following an overview of supervised ML, example analyses drew on the National Longitudinal Study of Adolescent Health (Add Health) dataset across Waves I–IV (N = 5,114 adolescents; 50.53% female; mean age = 15.94 years, SD = 1.77 at Wave I). Based on 143 articles that used Add Health Waves I through IV, 62 adolescent family variables from eight domains (e.g., socioeconomics, parenting, health) were identified as predictors of young adult (ages 24–32) educational attainment. After benchmark regression models were fit, ML models were trained using Lasso regression, a decision tree, random forest, and extreme gradient boosting; these models were evaluated on test data held out from training and interpreted with SHapley Additive exPlanations (SHAP).
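The Method contrasts benchmark regression with more flexible learners and scores each model on data held out from training. A minimal sketch of that logic follows, assuming a single simulated predictor with a U-shaped (quadratic) effect, a 70/30 train/test split, and a hand-written depth-limited regression tree standing in for the tree-based learners. All names (`fit_line`, `fit_tree`, `predict_tree`, `r_squared`) and settings are hypothetical illustrations, not choices reported in the article.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: the linear benchmark."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_tree(xs, ys, depth, min_leaf=10):
    """Grow a one-predictor regression tree with greedy splits that minimize squared error.

    Returns a leaf mean (float) or a node tuple (threshold, left_subtree, right_subtree).
    """
    mean = sum(ys) / len(ys)
    if depth == 0 or len(ys) < 2 * min_leaf:
        return mean
    pairs = sorted(zip(xs, ys))
    xs_s = [a for a, _ in pairs]
    ys_s = [b for _, b in pairs]
    n = len(ys_s)
    # Prefix sums of y and y^2 let each candidate split be scored in constant time.
    s, sq = [0.0], [0.0]
    for v in ys_s:
        s.append(s[-1] + v)
        sq.append(sq[-1] + v * v)
    best_sse, best_k = None, None
    for k in range(min_leaf, n - min_leaf + 1):
        if xs_s[k - 1] == xs_s[k]:
            continue  # no threshold separates tied values
        left = sq[k] - s[k] ** 2 / k
        right = (sq[n] - sq[k]) - (s[n] - s[k]) ** 2 / (n - k)
        if best_sse is None or left + right < best_sse:
            best_sse, best_k = left + right, k
    if best_k is None:
        return mean
    threshold = (xs_s[best_k - 1] + xs_s[best_k]) / 2
    return (threshold,
            fit_tree(xs_s[:best_k], ys_s[:best_k], depth - 1, min_leaf),
            fit_tree(xs_s[best_k:], ys_s[best_k:], depth - 1, min_leaf))


def predict_tree(node, x):
    """Walk down the tree until a leaf mean is reached."""
    while isinstance(node, tuple):
        threshold, left, right = node
        node = left if x <= threshold else right
    return node


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed here on held-out data."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical data with a nonlinear (U-shaped) effect, split 70/30 into train and test.
rng = random.Random(42)
xs = [rng.uniform(-3, 3) for _ in range(1000)]
ys = [x * x + rng.gauss(0, 1) for x in xs]
x_train, y_train = xs[:700], ys[:700]
x_test, y_test = xs[700:], ys[700:]

intercept, slope = fit_line(x_train, y_train)
linear_r2 = r_squared(y_test, [intercept + slope * x for x in x_test])

tree = fit_tree(x_train, y_train, depth=4)
tree_r2 = r_squared(y_test, [predict_tree(tree, x) for x in x_test])
print(f"benchmark linear R^2 = {linear_r2:.3f}; tree R^2 = {tree_r2:.3f}")
```

Because both models are scored only on the 300 test cases, the comparison reflects out-of-sample performance: the linear benchmark misses the U-shaped effect almost entirely, while the tree recovers most of it. A random forest, like the best-performing model in the article, averages many such trees grown on bootstrap samples with random subsets of predictors, which this sketch does not implement.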
Results
The random forest model performed best (R² = .382 for the model with all predictors), and 14 variables were identified as key predictors of educational attainment. Patterns among these predictors emerged, including directionality, nonlinearity, and interactions.
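SHAP values, which the study used to interpret its models, are Shapley values from cooperative game theory: a predictor's contribution to one prediction is its average marginal effect over all orders in which predictors could be added to the model. The sketch below computes them exactly for a hypothetical three-predictor model with a linear term, a quadratic (nonlinear) term, and an interaction, to show how such patterns appear in the attributions. It assumes that predictors absent from a coalition are set to a single reference point (all zeros). The SHAP library instead averages over a background dataset and uses approximations or tree-specific algorithms, and exact enumeration grows exponentially with the number of predictors, so this is a conceptual sketch only.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Predictors outside a coalition take their baseline value (an assumption of this sketch).
    """
    p = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(p)]
        return model(z)

    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for size in range(p):
            # Shapley weight: |S|! (p - |S| - 1)! / p!
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical model: linear effect of z0, nonlinear effect of z1, z0-by-z2 interaction."""
    return 2 * z[0] + z[1] ** 2 + 3 * z[0] * z[2]


x_obs = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x_obs, baseline)
print([round(v, 6) for v in phi])
```

For this input the attributions are 3.5, 4.0, and 1.5. They sum to toy_model(x_obs) − toy_model(baseline) = 9 (the "additive" property in the name), and the interaction term's contribution of 3 is split equally between the first and third predictors.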
Conclusions
Supervised ML can be used to inform subsequent confirmatory analyses and to advance theory.