Melody S. Goodman, Ariana Lopez, Anarina L. Murillo, Kristyn A. Pierce
Annals of Epidemiology, Volume 112, Pages 15-22. Published 2025-10-16. DOI: 10.1016/j.annepidem.2025.10.005. https://www.sciencedirect.com/science/article/pii/S1047279725002923
A comparison of methods for coding race in linear and logistic regression models
In many public health and clinical research studies that use regression models, race is often treated as a confounder and "controlled" for with simple indicator variables, using non-Hispanic White as the reference group, with little scrutiny from the data analyst. From a health equity perspective, this approach raises multiple issues. We examine and compare several methods for coding race in linear and logistic regression models, using a sample of 8097 participants (≥18 years old) from the 2020 New York City Community Health Survey. To illustrate the importance of the coding method, we conducted regression analyses comparing six coding approaches: dummy, simple effect, forward difference, backward difference, deviation, and analyst-defined coding. The outcome variables were body mass index, measured continuously, and diabetes status, measured dichotomously, in the linear and logistic regression models, respectively. The results showed that the choice of coding method has implications for identifying racial health inequities, and that the selection of the reference group is critical to measuring racial inequities in health outcomes. This study emphasizes the need to consider the impact of coding techniques in research study design, particularly when racial health inequities are the research focus.
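As a rough illustration of why the coding method matters, the sketch below fits the same linear model under three of the coding schemes named above (dummy, simple effect, and deviation). It is not the authors' analysis. The data are made-up toy values for three unnamed groups, not the Community Health Survey, and the choice of group 0 as the reference (and as the group coded -1 under deviation coding) is an assumption made only for the example.

```python
import numpy as np

# Hypothetical toy data: three groups labeled 0, 1, 2 with two observations each.
# The outcome values are invented for illustration only.
groups = np.array([0, 0, 1, 1, 2, 2])
y = np.array([24.0, 26.0, 27.0, 29.0, 30.0, 32.0])  # group means: 25, 28, 31

k = 3  # number of groups

# Each contrast matrix has one row per group and k - 1 columns.
# Dummy coding: group 0 is the reference (all zeros).
dummy = np.array([[0.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])
# Simple effect coding: same comparisons as dummy coding, but centered
# so the intercept becomes the unweighted mean of the group means.
simple = dummy - 1.0 / k
# Deviation coding: each coefficient compares a group to the unweighted
# mean of the group means; group 0 is coded -1 (an illustrative choice).
deviation = np.array([[-1.0, -1.0],
                      [1.0, 0.0],
                      [0.0, 1.0]])


def fit(contrast):
    """Ordinary least squares with an intercept and the given contrast codes."""
    X = np.column_stack([np.ones(len(y)), contrast[groups]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


coefs = {name: fit(c) for name, c in
         [("dummy", dummy), ("simple", simple), ("deviation", deviation)]}

for name, beta in coefs.items():
    print(name, np.round(beta, 3))
```

With these toy values, dummy coding yields an intercept equal to the reference group mean (25) and coefficients equal to differences from the reference (3 and 6). Simple effect coding keeps the same differences but moves the intercept to the mean of the group means (28). Deviation coding reports each coded group's difference from that mean (0 and 3). The fitted values are identical in all three cases; only the interpretation of the coefficients, and therefore which comparisons are visible, changes.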
About the journal:
The journal emphasizes the application of epidemiologic methods to issues that affect the distribution and determinants of human illness in diverse contexts. Its primary focus is on chronic and acute conditions of diverse etiologies and of major importance to clinical medicine, public health, and health care delivery.