Towards a Unified Theory for Semiparametric Data Fusion with Individual-Level Data

arXiv - STAT - Statistics Theory · Pub Date: 2024-09-16 · DOI: arxiv-2409.09973

Ellen Graham (University of Washington), Marco Carone (University of Washington), Andrea Rotnitzky (University of Washington)

Citations: 0

Abstract

We address the goal of conducting inference about a smooth finite-dimensional parameter by utilizing individual-level data from various independent sources. Recent advancements have led to the development of a comprehensive theory capable of handling scenarios where different data sources align with, possibly distinct subsets of, conditional distributions of a single factorization of the joint target distribution. While this theory proves effective in many significant contexts, it falls short in certain common data fusion problems, such as two-sample instrumental variable analysis, settings that integrate data from epidemiological studies with diverse designs (e.g., prospective cohorts and retrospective case-control studies), and studies with variables prone to measurement error that are supplemented by validation studies. In this paper, we extend the aforementioned comprehensive theory to allow for the fusion of individual-level data from sources aligned with conditional distributions that do not correspond to a single factorization of the target distribution. Assuming conditional and marginal distribution alignments, we provide universal results that characterize the class of all influence functions of regular asymptotically linear estimators and the efficient influence function of any pathwise differentiable parameter, irrespective of the number of data sources, the specific parameter of interest, or the statistical model for the target distribution. This theory paves the way for machine-learning debiased, semiparametric efficient estimation.
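As a hedged illustration only, and not the authors' estimator, the sketch below works through one of the simplest fusion settings the abstract alludes to. It is a case that the earlier single-factorization theory already covers, so it shows the general recipe (influence function, then a debiased one-step estimator) rather than the paper's extension. Assumed setting: source A supplies (X, Y) pairs whose Y | X matches the target distribution (conditional alignment); source B supplies X alone, drawn from the target marginal (marginal alignment); the parameter is θ = E[Y] under the target. Writing m(x) = E[Y | X = x] in source A, w(x) = p_B(x)/p_A(x), S for the source label and λ = n_A/n, the pooled-sample influence function for this setting is φ = 1{S=B}(m(X) − θ)/(1 − λ) + 1{S=A} w(X)(Y − m(X))/λ. The function names (`fit_linear`, `fit_logistic`, `fused_one_step`), the simulated data, and the parametric nuisance fits (stand-ins for the machine-learning learners the abstract mentions; cross-fitting is omitted) are all assumptions made for this example.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit of y on [1, X]; a stand-in for a flexible ML regressor."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def fit_logistic(X, s, n_iter=25):
    """Newton-Raphson logistic regression of s on [1, X]; returns x -> P(s=1 | x)."""
    design = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        grad = design.T @ (s - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return lambda Xn: 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(Xn)), Xn]) @ beta))


def fused_one_step(X_a, y_a, X_b):
    """Hypothetical one-step estimate of theta = E_target[Y] fusing two sources.

    Source A: (X, Y) pairs with Y | X aligned to the target (conditional alignment).
    Source B: X only, drawn from the target marginal (marginal alignment).
    Returns (estimate, standard error from the estimated influence function).
    """
    n_a, n_b = len(X_a), len(X_b)
    m_hat = fit_linear(X_a, y_a)                        # outcome regression E[Y | X] from A
    X_all = np.concatenate([X_a, X_b])
    s = np.concatenate([np.zeros(n_a), np.ones(n_b)])   # s = 1 marks source B
    pi_hat = fit_logistic(X_all, s)
    # Density ratio p_B(x) / p_A(x) via Bayes' rule on the source indicator.
    w_a = pi_hat(X_a) / (1.0 - pi_hat(X_a)) * (n_a / n_b)
    plug_in = m_hat(X_b).mean()                         # average of m over the target X's
    correction = np.mean(w_a * (y_a - m_hat(X_a)))      # debiasing term from source A
    theta = plug_in + correction
    # Estimated influence-function contributions, one term per source.
    phi_b = m_hat(X_b) - theta
    phi_a = w_a * (y_a - m_hat(X_a))
    se = np.sqrt(phi_b.var(ddof=1) / n_b + phi_a.var(ddof=1) / n_a)
    return theta, se


if __name__ == "__main__":
    # Simulated covariate shift: X ~ N(0, 1) in A, X ~ N(1, 1) in the target (B).
    # With Y = 1 + 2X + noise, the target mean is E[Y] = 1 + 2 * 1 = 3.
    rng = np.random.default_rng(0)
    X_a = rng.normal(0.0, 1.0, 2000)
    y_a = 1.0 + 2.0 * X_a + rng.normal(0.0, 1.0, 2000)
    X_b = rng.normal(1.0, 1.0, 2000)
    theta, se = fused_one_step(X_a, y_a, X_b)
    print(f"theta = {theta:.3f} (se {se:.3f}); true value 3.0")
```

The correction term is what makes the estimator first-order insensitive to errors in m_hat; with flexible learners one would also cross-fit the nuisances. The settings the paper targets (for example two-sample instrumental variables or case-control plus cohort designs) have alignments that do not fit a single factorization, so their influence functions take a different form from the one used here.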
