Tingting Cai , Jianbo Li , Qin Zhou , Songlou Yin , Riquan Zhang
Title: Subgroup detection based on partially linear additive individualized model with missing data in response
DOI: 10.1016/j.csda.2023.107910
Journal: Computational Statistics & Data Analysis
Publication date: 2023-12-20
Publication type: Journal Article
Impact factor: 1.5 (JCR Q3, Computer Science, Interdisciplinary Applications)
Open access: no
URL: https://www.sciencedirect.com/science/article/pii/S0167947323002219
Abstract
Based on a partially linear additive individualized model, a fusion-penalized inverse probability weighted least squares method is proposed to detect subgroups when the response has missing data. First, B-splines are used to approximate the unknown additive individualized functions, and an inverse probability weighted quadratic loss function is constructed with a fusion penalty on the differences of subject-wise B-spline coefficients. Second, minimizing this penalized loss function yields estimates of the linear regression parameters and the individualized B-spline coefficients. With a proper tuning parameter, some differences in the penalty term are shrunk to zero, so the corresponding subjects are clustered into the same subgroup. Third, a clustering method is developed to automatically determine subgroup membership for the subjects with missing data. Fourth, large-sample properties of the resulting estimates are given under some regularity conditions. Finally, numerical studies are presented to illustrate the performance of the proposed subgroup detection method.
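To make the structure of the criterion concrete, the following is a minimal sketch of an inverse probability weighted quadratic loss with a pairwise fusion penalty on subject-wise coefficients. It is not the authors' implementation. The function names, the argument layout, the use of an unsquared L2 norm as the fusion penalty, and the tolerance-based grouping rule are all assumptions made for illustration; the abstract does not specify the penalty function, the spline basis construction, or how observation probabilities are estimated, so the basis matrix `B` and the probabilities `pi_hat` are taken as given inputs.

```python
import numpy as np


def ipw_fused_objective(y, observed, pi_hat, X, B, beta, gamma, lam):
    """Evaluate an IPW quadratic loss plus a pairwise fusion penalty.

    All names are hypothetical. Shapes: y, observed, pi_hat are (n,);
    X is (n, p); B and gamma are (n, q); beta is (p,).
    """
    # Inverse probability weights: zero for subjects with a missing response.
    weights = observed / pi_hat
    # Linear part plus subject-specific spline part B_i^T gamma_i.
    fitted = X @ beta + np.sum(B * gamma, axis=1)
    # Missing responses (possibly NaN) are replaced by a zero residual.
    resid = np.where(observed == 1, y - fitted, 0.0)
    loss = 0.5 * np.sum(weights * resid ** 2)
    # Assumed penalty: L2 norm of every pairwise coefficient difference.
    diffs = gamma[:, None, :] - gamma[None, :, :]
    norms = np.linalg.norm(diffs, axis=2)
    upper = np.triu_indices(gamma.shape[0], k=1)
    return loss + lam * norms[upper].sum()


def subgroups_from_coefficients(gamma, tol=1e-8):
    """Label subjects whose coefficient vectors coincide (up to tol)."""
    n = gamma.shape[0]
    labels = -np.ones(n, dtype=int)
    next_label = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:
            j = stack.pop()
            close = np.linalg.norm(gamma - gamma[j], axis=1) <= tol
            for k in np.where(close & (labels < 0))[0]:
                labels[k] = next_label
                stack.append(k)
        next_label += 1
    return labels
```

In this sketch, subjects whose coefficient vectors have been fused by the penalty receive the same label from `subgroups_from_coefficients`. A subject with a missing response contributes nothing to the loss term and is tied to the others only through the penalty, which is one way to see why the abstract describes a separate step for assigning those subjects to subgroups.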
About the journal:
Computational Statistics and Data Analysis (CSDA), an Official Publication of the network Computational and Methodological Statistics (CMStatistics) and of the International Association for Statistical Computing (IASC), is an international journal dedicated to the dissemination of methodological research and applications in the areas of computational statistics and data analysis. The journal consists of four refereed sections which are divided into the following subject areas:
I) Computational Statistics - Manuscripts dealing with: 1) the explicit impact of computers on statistical methodology (e.g., Bayesian computing, bioinformatics, computer graphics, computer intensive inferential methods, data exploration, data mining, expert systems, heuristics, knowledge based systems, machine learning, neural networks, numerical and optimization methods, parallel computing, statistical databases, statistical systems), and 2) the development, evaluation and validation of statistical software and algorithms. Software and algorithms can be submitted with manuscripts and will be stored together with the online article.
II) Statistical Methodology for Data Analysis - Manuscripts dealing with novel and original data analytical strategies and methodologies applied in biostatistics (design and analytic methods for clinical trials, epidemiological studies, statistical genetics, or genetic/environmental interactions), chemometrics, classification, data exploration, density estimation, design of experiments, environmetrics, education, image analysis, marketing, model free data exploration, pattern recognition, psychometrics, statistical physics, image processing, robust procedures.
[...]
III) Special Applications - [...]
IV) Annals of Statistical Data Science [...]