Jinyuan Liu, Xinlian Zhang, Tuo Lin, Ruohui Chen, Yuan Zhong, Tian Chen, Tsungchin Wu, Chenyu Liu, Anna Huang, Tanya T. Nguyen, Ellen E. Lee, Dilip V. Jeste, Xin M. Tu
Title: A New Paradigm for High‐dimensional Data: Distance‐Based Semiparametric Feature Aggregation Framework via Between‐Subject Attributes
Journal: Scandinavian Journal of Statistics (JCR category: Statistics & Probability, Q3)
Publication date: 2023-11-08
DOI: 10.1111/sjos.12695 (https://doi.org/10.1111/sjos.12695)
Citation count: 0
Abstract
This article proposes a distance‐based framework motivated by the paradigm shift towards feature aggregation for high‐dimensional data, which relies on neither the sparse‐feature assumption nor permutation‐based inference. Focusing on distance‐based outcomes that preserve information without truncating any features, we develop a class of semiparametric regression models that encapsulates multiple sources of high‐dimensional variables using pairwise outcomes of between‐subject attributes. Further, we propose a strategy to address the interlocking correlations among pairs via U‐statistics‐based estimating equations (UGEE), which correspond to their unique efficient influence function (EIF). Hence, the resulting semiparametric estimators are robust to distributional misspecification while enjoying root‐n consistency and asymptotic optimality to facilitate inference. In essence, the proposed approach not only circumvents information loss due to feature selection but also improves the model's interpretability and computational feasibility. Simulation studies and applications to human microbiome and wearables data are provided, where the feature dimensions are in the tens of thousands.
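To make the idea of "pairwise outcomes of between‐subject attributes" concrete, here is a minimal sketch, not the authors' implementation. It assumes Euclidean distance as the between‐subject measure and uses hypothetical toy data; the function name `pairwise_distances` is also an illustrative choice. The abstract does not specify which distance the paper uses.

```python
import math
from itertools import combinations


def pairwise_distances(features):
    """Return {(i, j): distance} for every subject pair i < j.

    Assumption: Euclidean distance stands in for whatever
    between-subject distance the paper actually employs.
    """
    return {
        (i, j): math.dist(features[i], features[j])
        for i, j in combinations(range(len(features)), 2)
    }


# Hypothetical toy data: 4 subjects, 3 features each.
features = [
    [0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]

d = pairwise_distances(features)

# n subjects yield n * (n - 1) / 2 pairwise outcomes.
for pair, dist in sorted(d.items()):
    print(pair, round(dist, 4))
```

Each subject's full feature vector collapses into one scalar per pair, so no feature is dropped. Pairs that share a subject, such as (0, 1) and (0, 2), are not independent, which is the kind of interlocking correlation the abstract says UGEE is designed to handle.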
About the journal:
The Scandinavian Journal of Statistics is internationally recognised as one of the leading statistical journals in the world. It was founded in 1974 by four Scandinavian statistical societies. Today more than eighty per cent of the manuscripts are submitted from outside Scandinavia.
It is an international journal devoted to reporting significant and innovative original contributions to statistical methodology, both theory and applications.
The journal specializes in statistical modelling showing particular appreciation of the underlying substantive research problems.
The emergence of specialized methods for analysing longitudinal and spatial data is just one example of an area of important methodological development in which the Scandinavian Journal of Statistics has a particular niche.