Simultaneous variable selection and estimation in semiparametric regression of mixed panel count data
Authors: Lei Ge, Tao Hu, Yang Li
Journal: Biometrics
Publication date: 2024-01-29
DOI: https://doi.org/10.1093/biomtc/ujad041
Abstract
Mixed panel count data are a common complex data structure in longitudinal survey studies. A major challenge in analyzing such data is performing variable selection and estimation while efficiently incorporating both the panel count and panel binary data components. Analyses in the medical literature have often ignored the panel binary component, treating it as missing along with the unknown panel counts; such a simplification clearly does not make effective use of the information in the original data. In this research, we put forward a penalized likelihood variable selection and estimation procedure under the proportional mean model. A computationally efficient EM algorithm is developed that ensures sparse estimation for variable selection, and the resulting estimator is shown to have the desirable oracle property. Simulation studies confirm the good finite-sample properties of the proposed method, and the method is applied to a motivating dataset from the Health and Retirement Study.
About the journal:
The International Biometric Society is an international society promoting the development and application of statistical and mathematical theory and methods in the biosciences, including agriculture, biomedical science and public health, ecology, environmental sciences, forestry, and allied disciplines. The Society welcomes as members statisticians, mathematicians, biological scientists, and others devoted to interdisciplinary efforts in advancing the collection and interpretation of information in the biosciences. The Society sponsors the biennial International Biometric Conference, held at sites throughout the world; through its National Groups and Regions, it also sponsors regional and local meetings.