Bounds for selection bias using outcome probabilities

Q3 Mathematics

Epidemiologic Methods Pub Date : 2024-01-01 DOI:10.1515/em-2023-0033

Stina Zetterstrom


Abstract

Determining the causal relationship between exposure and outcome is the goal of many observational studies. However, the selection of subjects into the study population, either voluntary or involuntary, may result in estimates that suffer from selection bias. To assess the robustness of the estimates as well as the magnitude of the bias, bounds for the bias can be calculated. Previous bounds for selection bias often require the specification of unknown relative risks, which might be difficult to provide. Here, alternative bounds based on observed data and unknown outcome probabilities are proposed. These unknown probabilities may be easier to specify than unknown relative risks.

I derive alternative bounds from the definitions of the causal estimands using the potential outcomes framework, under specific assumptions. The bounds are expressed using observed data and unobserved outcome probabilities. The bounds are compared to previously reported bounds in a simulation study. Furthermore, a study of perinatal risk factors for type 1 diabetes is provided as a motivating example.

I show that the proposed bounds are often informative when the exposure and outcome are sufficiently common, especially for the risk difference in the total population. It is also noted that the proposed bounds can be uninformative when the exposure and outcome are rare. Furthermore, it is noted that previously proposed assumption-free bounds are special cases of the new bounds when the sensitivity parameters are set to their most conservative values.

Depending on the data-generating process and causal estimand of interest, the proposed bounds can be tighter or wider than the reference bounds. Importantly, in cases with sufficiently common outcome and exposure, the proposed bounds are often informative, especially for the risk difference in the total population. It is also noted that, in some cases, the new bounds can be wider than the reference bounds. However, the proposed bounds based on unobserved probabilities may in some cases be easier to specify than the reference bounds based on unknown relative risks.
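To make the idea of bounding with unobserved outcome probabilities concrete, the following is a minimal sketch and not the paper's derivation. It assumes only the law of total probability, P(Y=1 | T=t) = P(Y=1 | T=t, S=1) P(S=1 | T=t) + P(Y=1 | T=t, S=0) P(S=0 | T=t), treats the selection probabilities P(S=1 | T=t) as known, and lets the analyst supply a range `[lo, hi]` for the unobserved outcome probability among the unselected. The function names and parameters are hypothetical, and the resulting interval is for an associational risk difference; a causal reading would need further assumptions that are not modeled here. Setting `lo=0` and `hi=1` gives the most conservative interval.

```python
def risk_bounds(p_obs, p_sel, lo=0.0, hi=1.0):
    """Bound P(Y=1 | T=t) in the total population.

    p_obs : observed risk among the selected, P(Y=1 | T=t, S=1)
    p_sel : selection probability, P(S=1 | T=t) (assumed known here)
    lo, hi: analyst-specified range for the unobserved risk
            among the unselected, P(Y=1 | T=t, S=0)
    """
    if not (0.0 <= lo <= hi <= 1.0):
        raise ValueError("require 0 <= lo <= hi <= 1")
    if not (0.0 <= p_obs <= 1.0 and 0.0 <= p_sel <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    # Law of total probability with the unobserved term set to its extremes.
    lower = p_obs * p_sel + lo * (1.0 - p_sel)
    upper = p_obs * p_sel + hi * (1.0 - p_sel)
    return lower, upper


def risk_difference_bounds(p1_obs, p0_obs, s1, s0, lo=0.0, hi=1.0):
    """Bound the (associational) risk difference between exposed and unexposed.

    p1_obs, p0_obs: observed risks among the selected exposed / unexposed
    s1, s0        : selection probabilities for exposed / unexposed
    """
    l1, u1 = risk_bounds(p1_obs, s1, lo, hi)
    l0, u0 = risk_bounds(p0_obs, s0, lo, hi)
    # Smallest difference pairs the exposed lower bound with the unexposed
    # upper bound; the largest difference does the reverse.
    return l1 - u0, u1 - l0


if __name__ == "__main__":
    # Most conservative range for the unobserved probabilities.
    print(risk_difference_bounds(0.5, 0.25, 0.5, 0.5))
    # A narrower, analyst-specified range tightens the interval.
    print(risk_difference_bounds(0.5, 0.25, 0.5, 0.5, lo=0.25, hi=0.5))
```

In this toy setting the interval has width (hi - lo) times the sum of the two non-selection probabilities, so it narrows when the plausible range for the unobserved probabilities is tighter or when a larger share of each exposure group is selected.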


Source journal

Epidemiologic Methods (Mathematics, Applied Mathematics)

CiteScore: 2.10

Self-citation rate: 0.00%

Journal description: Epidemiologic Methods (EM) seeks contributions comparable to those of the leading epidemiologic journals, but also invites papers that may be more technical or of greater length than what has traditionally been allowed by journals in epidemiology. Applications and examples with real data to illustrate methodology are strongly encouraged but not required. Topics: genetic epidemiology, infectious disease, pharmaco-epidemiology, ecologic studies, environmental exposures, screening, surveillance, social networks, comparative effectiveness, statistical modeling, causal inference, measurement error, study design, meta-analysis.