Machine learning to investigate policy-relevant social determinants of health and suicide rates in the United States
Yunyu Xiao, Yuan Meng, Timothy T. Brown, Alexander C. Tsai, Lonnie R. Snowden, Julian Chun-Chung Chow, Jyotishman Pathak, J. John Mann
Nature Mental Health, 3(6), 675–684. Published 12 May 2025. DOI: 10.1038/s44220-025-00424-4
Abstract
This study aimed to categorize county clusters of multidimensional social determinants of health (SDOH) using unsupervised machine learning and to analyze their association with county-level suicide rates, considering temporal, geographic and demographic variation. We analyzed aggregated SDOH data across 3,018 US counties for 2009, 2014 and 2019, which were linked to county-level suicide rates from the National Vital Statistics System. We identified three distinct SDOH clusters: ‘REMOTE’ (rural, elderly, marginalized environments, old housing, traditional systems, empty houses), ‘COPE’ (complex family dynamics, high consumption of health services, poverty, extreme heat) and ‘DIVERSE’ (dense, immigrant rich, environmentally challenged, economically unequal, racial/ethnic diversity, saturated health care, expensive housing). After identifying the clusters, we used negative binomial regression to estimate the associations between county-level SDOH clusters and suicide rates. Compared with the other clusters, REMOTE was associated with higher overall suicide rates, particularly among men; COPE showed elevated suicide rates among White populations; and DIVERSE exhibited increased rates among women and Black and Hispanic populations. The distribution of suicide rates across US states corresponded to variations in the distribution of SDOH clusters within each state. These findings provide a foundation for designing more effective, data-driven suicide prevention strategies tailored to specific regional and demographic contexts.
This study examines variations in suicide rates across the United States and their association with three county-level clusters of social determinants of health. Using unsupervised machine learning on county data from 2009, 2014 and 2019, the authors found that remote areas, characterized by rurality and older populations, had the highest suicide rates, highlighting the need for targeted interventions to address these disparities.
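The abstract describes a two-stage pipeline. First, counties are clustered on SDOH indicators. Then suicide counts are modelled by cluster with negative binomial regression. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The abstract does not name the unsupervised algorithm, so k-means with k = 3 is used here as a stand-in. The county records, the feature names (pct_rural, median_age, pct_foreign_born), the dispersion value ALPHA and all function names are hypothetical. The regression step is simplified to a cluster-only NB2 model with a log-population offset and a fixed, assumed dispersion. It has no covariates, no year effects and no dispersion estimation.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so no single indicator dominates the distances."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero for constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's k-means (a stand-in clustering method, assumed) with farthest-point init."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each county joins its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its member counties.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [mean(col) for col in zip(*members)]
    return labels


def nb_rate(deaths, pops, alpha, iters=100):
    """MLE of a shared rate r when deaths_i ~ NB2(mean = pop_i * r, dispersion alpha).

    With a log link and log(population) offset, the score for log r is
    sum_i (y_i - mu_i) / (1 + alpha * mu_i). It is strictly decreasing in r,
    so bisection finds the unique root.
    """
    def score(r):
        return sum((y - n * r) / (1 + alpha * n * r) for y, n in zip(deaths, pops))

    lo, hi = 0.0, 1.0
    while score(hi) > 0:  # widen the bracket until the score changes sign
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical, made-up counties:
# (pct_rural, median_age, pct_foreign_born, suicide_deaths, population)
counties = [
    (90, 48, 2, 9, 40_000), (85, 50, 3, 7, 30_000), (95, 47, 1, 6, 25_000),
    (40, 38, 4, 20, 150_000), (35, 37, 5, 18, 140_000), (45, 39, 3, 22, 160_000),
    (5, 34, 30, 60, 900_000), (8, 35, 28, 55, 800_000), (3, 33, 35, 70, 1_000_000),
]
features = standardize([c[:3] for c in counties])
labels = kmeans(features, k=3)

ALPHA = 0.05  # assumed dispersion; a real analysis would estimate it
rates = {}
for g in sorted(set(labels)):
    deaths = [c[3] for c, lab in zip(counties, labels) if lab == g]
    pops = [c[4] for c, lab in zip(counties, labels) if lab == g]
    rates[g] = nb_rate(deaths, pops, ALPHA)

ref = labels[-1]  # reference category: the cluster of the urban, high-immigration county
for g, r in rates.items():
    print(f"cluster {g}: {r * 1e5:.1f} per 100,000, IRR vs ref = {r / rates[ref]:.2f}")
```

In a cluster-only model each cluster gets its own rate. The NB2 estimate then reduces to a weighted pooled rate, sum(w_i y_i) / sum(w_i n_i) with w_i = 1/(1 + alpha n_i r). As ALPHA approaches 0, it recovers the Poisson estimate, which is total deaths divided by total population. Ratios of these cluster rates correspond to the incidence rate ratios, exp(beta), that a log-link model reports for cluster indicators.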