Matching for Several Sparse Nominal Variables in a Case-Control Study of Readmission Following Surgery.

IF 1.8 | Zone 4, Mathematics | Q1 STATISTICS & PROBABILITY

American Statistician | Pub Date: 2011-10-01 | DOI: 10.1198/tas.2011.11072

José R Zubizarreta, Caroline E Reinke, Rachel R Kelz, Jeffrey H Silber, Paul R Rosenbaum

American Statistician, volume 65, issue 4, pages 229-238. Publication type: Journal Article.

Citations: 42

Abstract

Matching for several nominal covariates with many levels has usually been thought to be difficult because these covariates combine to form an enormous number of interaction categories with few if any people in most such categories. Moreover, because nominal variables are not ordered, there is often no notion of a "close substitute" when an exact match is unavailable. In a case-control study of the risk factors for readmission within 30 days of surgery in the Medicare population, we wished to match for 47 hospitals, 15 surgical procedures grouped or nested within 5 procedure groups, two genders, or 47 × 15 × 2 = 1410 categories. In addition, we wished to match as closely as possible for the continuous variable age (65-80 years). There were 1380 readmitted patients or cases. A fractional factorial experiment may balance main effects and low-order interactions without achieving balance for high-order interactions. In an analogous fashion, we balance certain main effects and low-order interactions among the covariates; moreover, we use as many exactly matched pairs as possible. This is done by creating a match that is exact for several variables, with a close match for age, and both a "near-exact match" and a "finely balanced match" for another nominal variable, in this case a 47 × 5 = 235 category variable representing the interaction of the 47 hospitals and the five surgical procedure groups. The method is easily implemented in R.
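
The "near-exact match" idea in the abstract can be illustrated with a small sketch. This is not the authors' R implementation. It assumes near-exact matching is approximated by adding a large penalty to the age distance whenever a case and a control differ on the nominal category. The penalty value, the toy records, and the function names are hypothetical. The brute-force search is only workable for toy sizes, and fine balance is not enforced here. The two category counts are the ones stated in the abstract.

```python
from itertools import permutations

# Category counts stated in the abstract.
N_HOSPITALS, N_PROCEDURES, N_GROUPS, N_GENDERS = 47, 15, 5, 2
full_cells = N_HOSPITALS * N_PROCEDURES * N_GENDERS  # 47 x 15 x 2
coarse_cells = N_HOSPITALS * N_GROUPS                # 47 x 5

# Hypothetical penalty, chosen to exceed any possible age difference.
PENALTY = 1000


def distance(case, control):
    """Absolute age difference, plus a penalty if the nominal cells differ."""
    d = abs(case["age"] - control["age"])
    if case["cell"] != control["cell"]:
        d += PENALTY
    return d


def best_pairing(cases, controls):
    """Minimum total distance pairing by exhaustive search (toy sizes only)."""
    best_cost, best = float("inf"), None
    for perm in permutations(range(len(controls)), len(cases)):
        cost = sum(distance(c, controls[j]) for c, j in zip(cases, perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return best, best_cost


# Invented toy data: "cell" stands for a (hospital, procedure group) category.
cases = [
    {"age": 70, "cell": ("H1", "G1")},
    {"age": 66, "cell": ("H2", "G3")},
]
controls = [
    {"age": 67, "cell": ("H2", "G3")},
    {"age": 78, "cell": ("H1", "G1")},
    {"age": 70, "cell": ("H3", "G1")},
    {"age": 71, "cell": ("H1", "G1")},
]

pairing, cost = best_pairing(cases, controls)
print(full_cells, coarse_cells)  # 1410 235
print(pairing, cost)
```

In the toy data the third control has exactly the first case's age but a different cell, so the penalty steers the match to the same-cell control that is one year older. Mismatches on the nominal variable are tolerated only when no same-cell control is available. A real problem of this size would need a proper assignment or network-flow solver rather than enumeration.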


Source journal: American Statistician (Mathematics: Statistics & Probability)

CiteScore: 3.50

Self-citation rate: 5.60%

Articles published:

Review time: >12 weeks

Journal description: Are you looking for general-interest articles about current national and international statistical problems and programs; interesting and fun articles of a general nature about statistics and its applications; or the teaching of statistics? Then you are looking for The American Statistician (TAS), published quarterly by the American Statistical Association. TAS contains timely articles organized into the following sections: Statistical Practice, General, Teacher's Corner, History Corner, Interdisciplinary, Statistical Computing and Graphics, Reviews of Books and Teaching Materials, and Letters to the Editor.