An overview of modern machine learning methods for effect measure modification analyses in high-dimensional settings
Authors: Michael Cheung, Anna Dimitrova, Tarik Benmarhnia
Journal: SSM - Population Health, volume 29, Article 101764
Publication date: 2025-02-13
Publication type: Journal Article
DOI: 10.1016/j.ssmph.2025.101764
URL: https://www.sciencedirect.com/science/article/pii/S2352827325000187
Journal metrics: impact factor 3.6; JCR Q1 (Public, Environmental & Occupational Health)
Abstract
A primary concern of public health researchers is identifying and quantifying heterogeneous exposure effects across population subgroups. Understanding the magnitude and direction of these effects on a given scale allows researchers to recommend policy prescriptions and assess the external validity of findings. Traditional methods for effect measure modification analyses require manual model specification, which is often impractical or infeasible in high-dimensional settings. Recent developments in machine learning aim to solve this issue by using data-driven approaches to estimate heterogeneous exposure effects. However, these methods do not directly identify effect modifiers or estimate the corresponding subgroup effects. Consequently, additional analysis techniques are required to use these methods for effect measure modification analyses. Although no data-driven method or technique can identify effect modifiers and domain expertise is still required, these methods may play an important role in discovering vulnerable subgroups when prior knowledge is not available. We summarize these machine learning methods, provide the intuition behind them, and discuss how they may be employed for effect measure modification analyses, to serve as a reference for public health researchers. We discuss their implementation in R with annotated syntax and demonstrate their application in a case study assessing the heterogeneous effects of drought on stunting among children in the Demographic and Health Surveys data set.
About the journal:
SSM - Population Health is an online-only, open access, peer-reviewed journal covering all areas relating social science research to population health. It shares the same Editors-in-Chief and general approach to manuscripts as its sister journal, Social Science & Medicine. The journal takes a broad approach to the field, especially welcoming interdisciplinary papers from across the social sciences and allied areas. SSM - Population Health offers an alternative outlet for work that might not be considered, or is classed as "out of scope", elsewhere, and prioritizes fast peer review and publication to the benefit of authors and readers. The journal welcomes all types of papers, including traditional primary research articles, replication studies, short communications, methodological studies, instrument validation, opinion pieces, and literature reviews. SSM - Population Health also offers the opportunity to publish special issues or sections to reflect current interest and research in topical or developing areas. The journal fully supports authors wanting to present their research in an innovative fashion through the use of multimedia formats.