Functional Principal Component Analysis for Continuous Non-Gaussian, Truncated, and Discrete Functional Data.

IF 1.8 · CAS Zone 4 (Medicine) · JCR Q3 Mathematical & Computational Biology

Statistics in Medicine Pub Date : 2024-12-10 Epub Date: 2024-10-23 DOI:10.1002/sim.10240

Debangan Dey, Rahul Ghosal, Kathleen Merikangas, Vadim Zipunnikov

Statistics in Medicine, pages 5431-5445. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11586909/pdf/ · Publisher page: https://doi.org/10.1002/sim.10240

Citations: 0

Abstract


Functional Principal Component Analysis for Continuous Non-Gaussian, Truncated, and Discrete Functional Data.

Mobile health studies often collect multiple within-day self-reported assessments of participants' behavior and well-being on different scales such as physical activity (continuous scale), pain levels (truncated scale), mood states (ordinal scale), and the occurrence of daily life events (binary scale). These assessments, when indexed by time of day, can be treated and analyzed as functional data corresponding to their respective types: continuous, truncated, ordinal, and binary. Motivated by these examples, we develop a functional principal component analysis that deals with all four types of functional data in a unified manner. It employs a semiparametric Gaussian copula model, assuming a generalized latent non-paranormal process as the underlying generating mechanism for these four types of functional data. We specify latent temporal dependence using a covariance estimated through Kendall's $τ$ bridging method, incorporating smoothness in the bridging process. The approach is then extended with methods for handling both dense and sparse sampling designs, calculating subject-specific latent representations of observed data, latent principal components and principal component scores. Simulation studies demonstrate the method's competitive performance under both dense and sparse sampling designs. The method is applied to data from 497 participants in the National Institute of Mental Health Family Study of Mood Spectrum Disorders to characterize differences in within-day temporal patterns of mood in individuals with the major mood disorder subtypes, including Major Depressive Disorder and Type 1 and 2 Bipolar Disorder. Software implementation of the proposed method is provided in an R-package.
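To make the Kendall's $τ$ bridging idea concrete, here is a minimal sketch covering only the continuous case. It relies on the standard Gaussian copula identity $r = \sin(\pi\tau/2)$, which links Kendall's $τ$ to the latent correlation $r$ when both variables are continuous. The function names (`kendall_tau`, `bridge_continuous`, `latent_correlation`) and the toy data are illustrative assumptions, not the API of the authors' R package. The bridge functions for truncated, ordinal, and binary data, and the smoothing step the abstract mentions, are not shown.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a for two equal-length sequences (assumes no ties)."""
    n = len(x)
    total = 0
    for i, j in combinations(range(n), 2):
        d = (x[i] - x[j]) * (y[i] - y[j])
        total += (d > 0) - (d < 0)  # +1 concordant, -1 discordant
    return total / (n * (n - 1) / 2)


def bridge_continuous(tau):
    """Map Kendall's tau to the latent Gaussian correlation.

    Continuous-continuous case only: r = sin(pi * tau / 2).
    """
    return math.sin(math.pi * tau / 2)


def latent_correlation(curves):
    """Estimate the latent correlation matrix on a common time grid.

    `curves` is a list of subjects, each a list of values observed at
    the same time points (a dense design is assumed for this sketch).
    """
    m = len(curves[0])
    cols = [[c[t] for c in curves] for t in range(m)]
    R = [[1.0] * m for _ in range(m)]
    for s, t in combinations(range(m), 2):
        R[s][t] = R[t][s] = bridge_continuous(kendall_tau(cols[s], cols[t]))
    return R


# Toy data: five hypothetical subjects observed at three time points.
latent = [
    [0.1, 0.3, 0.2],
    [1.2, 0.9, 1.5],
    [-0.7, -0.2, -0.9],
    [0.5, 1.1, 0.4],
    [-1.4, -1.0, -0.3],
]
# A strictly monotone transform makes the margins non-Gaussian.
observed = [[math.exp(v) for v in row] for row in latent]

R_latent = latent_correlation(latent)
R_observed = latent_correlation(observed)
```

Because Kendall's $τ$ depends only on ranks, `R_observed` equals `R_latent` even though the observed margins are no longer Gaussian. This rank invariance is what lets a copula-based approach handle continuous non-Gaussian data. Under these assumptions, the latent principal components would then come from an eigendecomposition of the resulting correlation matrix.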


Source journal

Statistics in Medicine (Medicine: Public, Environmental & Occupational Health)

CiteScore

3.40

Self-citation rate

10.00%

Articles published

334

Review time

2-4 weeks

About the journal: The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies where creative use or technical generalizations of established methodology are directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.