A comparison of methods for coding race in linear and logistic regression models

IF 3 · Zone 3 (Medicine) · Q1, PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH
Melody S. Goodman, Ariana Lopez, Anarina L. Murillo, Kristyn A. Pierce
Annals of Epidemiology, volume 112, pages 15-22. Journal Article. Published 2025-10-16.
DOI: 10.1016/j.annepidem.2025.10.005
URL: https://www.sciencedirect.com/science/article/pii/S1047279725002923
Citations: 0

Abstract

In many public health and clinical research studies that use regression models, race is considered a confounder and "controlled" for with simple indicator variables, using non-Hispanic White as the reference group, without much introspection from the data analyst. From a health equity perspective, this approach raises multiple issues. We examine and compare several methods for coding race in linear and logistic regression models, using a sample of 8097 participants (≥18 years old) from the 2020 New York City Community Health Survey. To illustrate the importance of the coding method, we conducted regression analyses comparing the results from six coding approaches: dummy, simple effect, difference (forward and backward), deviation, and analyst-defined coding. The outcome variables were body mass index, measured continuously, in the linear regression models and diabetes status, measured dichotomously, in the logistic regression models. Results showed that the choice of coding method has implications for identifying racial health inequities, and that reference group selection is critical to measuring racial inequities in health outcomes. This study emphasizes the need to consider the impact of coding techniques in research study design, particularly when racial health inequities are the research focus.
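To make the named coding schemes concrete, the sketch below shows what the coefficients estimate under dummy, simple effect, deviation, and forward and backward difference coding. It is not the authors' analysis: it assumes an ordinary linear model with one categorical predictor and no covariates, in which case the coefficients reduce to contrasts of group means. The group labels `A` to `D`, the mean values, and the function name `contrasts` are all hypothetical, and analyst-defined coding is omitted because it depends on contrasts the analyst chooses.

```python
# Hypothetical group means of a continuous outcome (e.g. BMI); invented values,
# not results from the study. Group "A" plays the role of the reference group.
group_means = {"A": 26.0, "B": 28.0, "C": 24.0, "D": 30.0}


def contrasts(means, reference):
    """Return what each coding scheme's coefficients estimate in a
    one-predictor linear model, expressed as contrasts of group means."""
    levels = list(means)
    grand = sum(means.values()) / len(means)  # unweighted mean of group means
    others = [g for g in levels if g != reference]
    pairs = list(zip(levels[:-1], levels[1:]))  # adjacent levels, in order
    return {
        # Dummy: intercept is the reference mean; slopes compare to reference.
        "dummy": {"intercept": means[reference],
                  **{g: means[g] - means[reference] for g in others}},
        # Simple effect: same comparisons, but the intercept is the grand mean.
        "simple": {"intercept": grand,
                   **{g: means[g] - means[reference] for g in others}},
        # Deviation: each non-omitted level is compared with the grand mean.
        "deviation": {"intercept": grand,
                      **{g: means[g] - grand for g in others}},
        # Forward difference: each level minus the next level.
        "forward_difference": {
            "intercept": grand,
            **{f"{a}-{b}": means[a] - means[b] for a, b in pairs}},
        # Backward difference: each level minus the previous level.
        "backward_difference": {
            "intercept": grand,
            **{f"{b}-{a}": means[b] - means[a] for a, b in pairs}},
    }


result = contrasts(group_means, reference="A")
for scheme, coefs in result.items():
    print(scheme, coefs)
```

Under dummy coding every comparison is anchored to the reference group, so changing `reference` changes every slope, whereas deviation coding compares each group with the unweighted grand mean and privileges no single group. In a logistic model the same contrasts would apply on the log-odds scale, and adding covariates would turn them into adjusted contrasts.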
Source journal
Annals of Epidemiology (Medicine: Public, Environmental & Occupational Health)
CiteScore: 7.40
Self-citation rate: 1.80%
Articles published: 207
Review time: 59 days
About the journal: The journal emphasizes the application of epidemiologic methods to issues that affect the distribution and determinants of human illness in diverse contexts. Its primary focus is on chronic and acute conditions of diverse etiologies and of major importance to clinical medicine, public health, and health care delivery.