Ellen Graham (University of Washington), Marco Carone (University of Washington), Andrea Rotnitzky (University of Washington)
arXiv - STAT - Statistics Theory, 2024-09-16. DOI: arxiv-2409.09973 (https://doi.org/arxiv-2409.09973)
Towards a Unified Theory for Semiparametric Data Fusion with Individual-Level Data
We address the goal of conducting inference about a smooth finite-dimensional
parameter using individual-level data from various independent sources.
Recent advancements have led to the development of a comprehensive theory
capable of handling scenarios where different data sources align with (possibly
distinct subsets of) the conditional distributions of a single factorization of the
joint target distribution. While this theory proves effective in many
significant contexts, it falls short in certain common data fusion problems,
such as two-sample instrumental variable analysis, settings that integrate data
from epidemiological studies with diverse designs (e.g., prospective cohorts
and retrospective case-control studies), and studies with variables prone to
measurement error that are supplemented by validation studies. In this paper,
we extend the aforementioned comprehensive theory to allow for the fusion of
individual-level data from sources aligned with conditional distributions that
do not correspond to a single factorization of the target distribution.
Assuming conditional and marginal distribution alignments, we provide universal
results that characterize the class of all influence functions of regular
asymptotically linear estimators and the efficient influence function of any
pathwise differentiable parameter, irrespective of the number of data sources,
the specific parameter of interest, or the statistical model for the target
distribution. This theory paves the way for machine-learning debiased,
semiparametric efficient estimation.
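
To make the two-sample instrumental variable setting mentioned above concrete, here is a minimal simulation sketch. It is not the estimator developed in the paper: it uses the textbook Wald ratio under an assumed linear model, and every name in it (`simulate`, `slope`, `source_a`, `source_b`, the coefficient values) is hypothetical. One source records the instrument Z and exposure X, the other records Z and outcome Y, so the sources inform p(x | z) and p(y | z). On one reading, this is why the setting falls outside a single-factorization theory: p(z) p(x | z) p(y | z, x) does not contain p(y | z), and p(z) p(y | z) p(x | z, y) does not contain p(x | z).

```python
import random


def slope(u, v):
    """Least-squares slope from regressing v on u (with intercept)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var = sum((a - mean_u) ** 2 for a in u)
    return cov / var


def simulate(n, beta, rng):
    """Draw (z, x, y) triples from an assumed linear model with confounding."""
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)               # instrument
        u = rng.gauss(0, 1)               # unmeasured confounder of x and y
        x = z + u + rng.gauss(0, 1)       # exposure depends on z and u
        y = beta * x + u + rng.gauss(0, 1)  # outcome depends on x and u
        rows.append((z, x, y))
    return rows


rng = random.Random(0)
beta_true = 2.0

# Two independent sources: neither observes (z, x, y) jointly.
source_a = [(z, x) for z, x, _ in simulate(20000, beta_true, rng)]  # (Z, X) only
source_b = [(z, y) for z, _, y in simulate(20000, beta_true, rng)]  # (Z, Y) only

# First stage (X on Z) from source A, reduced form (Y on Z) from source B.
first_stage = slope([z for z, _ in source_a], [x for _, x in source_a])
reduced_form = slope([z for z, _ in source_b], [y for _, y in source_b])

# Two-sample Wald ratio: combines quantities learned from different sources.
beta_hat = reduced_form / first_stage
print(f"first stage: {first_stage:.3f}, reduced form: {reduced_form:.3f}, "
      f"beta_hat: {beta_hat:.3f}")
```

In this simulated design the ratio should land near `beta_true` even though no single source identifies it alone. The sketch says nothing about efficiency, which is the question the influence-function results in the abstract address.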