Multiple Imputation of Race and Hispanic Ethnicity in National Surveillance Data for Chlamydia, Gonorrhea, and Syphilis
Authors: Tracy Pondo, Elizabeth Torrone, Melissa Pagaoa
Journal: Sexually Transmitted Diseases, pp. 719-727. Published 2024-11-01 (Epub 2024-06-17).
DOI: https://doi.org/10.1097/OLQ.0000000000002047
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11560705/pdf/
Abstract
Background: Disease burden of sexually transmitted infections such as chlamydia, gonorrhea, and syphilis is often compared across age categories, sex categories, and race and ethnicity categories. Missing data may prevent researchers from accurately characterizing health disparities between populations. This article describes the methods used to impute race and Hispanic ethnicity in a large national surveillance data set.
Methods: All US cases of chlamydia, gonorrhea, and syphilis (excluding congenital syphilis) reported through the National Notifiable Diseases Surveillance System in 2019 were included in the analyses. We used fully conditional specification to impute missing race and Hispanic ethnicity data. After imputation, reported case rates were calculated, by disease, for each race and Hispanic ethnicity category using Vintage 2019 Population and Housing Unit Estimates from the US Census. We then used case counts from subsets that contained only complete race and Hispanic ethnicity information to investigate whether the confidence intervals from the multiply imputed data included the observed number of cases in each race and Hispanic ethnicity category.
Results: Among the 2,553,038 cases reported in 2019, race and Hispanic ethnicity were multiply imputed for 9% of syphilis cases, 22% of gonorrhea cases, and 33% of chlamydia cases. In the subset analyses, every nonzero rate of reported cases was contained within the confidence intervals that were calculated from multiply imputed data.
Conclusions: Confidence intervals that account for the uncertainty of the predictions are an advantage of multiple imputation over complete-case analysis because a realistic variance estimate allows for valid hypothesis testing results.
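The abstract does not say how estimates from the imputed data sets were combined. A common approach in multiple imputation is Rubin's rules, where the total variance is the average within-imputation variance plus an inflated between-imputation variance. The sketch below illustrates that idea only; the function name `pool_rubin`, the input numbers, and the normal approximation for the interval (Rubin's rules ordinarily use a t reference distribution) are assumptions for illustration, not the authors' implementation.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pool_rubin(estimates, within_variances, level=0.95):
    """Pool per-imputation estimates with Rubin's rules (illustrative sketch).

    estimates        -- one point estimate per imputed data set
    within_variances -- the sampling variance of each of those estimates
    Returns (pooled_estimate, total_variance, (ci_low, ci_high)).
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two imputations with matching variances")

    pooled = mean(estimates)            # pooled point estimate
    within = mean(within_variances)     # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between

    # Assumption: normal approximation instead of Rubin's t reference distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(total)
    return pooled, total, (pooled - half_width, pooled + half_width)


# Made-up case rates per 100,000 from three hypothetical imputed data sets.
pooled, total, ci = pool_rubin([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
```

Because the between-imputation term grows when the imputed data sets disagree, the interval widens in categories where race and Hispanic ethnicity were harder to predict. A complete-case analysis has no such term.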
About the journal:
Sexually Transmitted Diseases, the official journal of the American Sexually Transmitted Diseases Association, publishes peer-reviewed, original articles on clinical, laboratory, immunologic, epidemiologic, behavioral, public health, and historical topics pertaining to sexually transmitted diseases and related fields. Reports from the CDC and NIH provide up-to-the-minute information. A highly respected editorial board is composed of prominent scientists who are leaders in this rapidly changing field. Included in each issue are studies and developments from around the world.