Functional Principal Component Analysis for Continuous Non-Gaussian, Truncated, and Discrete Functional Data.
Authors: Debangan Dey, Rahul Ghosal, Kathleen Merikangas, Vadim Zipunnikov
Journal: Statistics in Medicine, pages 5431-5445 (impact factor 1.8; JCR Q3, Mathematical & Computational Biology)
Published: 2024-12-10 (Epub 2024-10-23)
DOI: https://doi.org/10.1002/sim.10240
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11586909/pdf/
Mobile health studies often collect multiple within-day self-reported assessments of participants' behavior and well-being on different scales such as physical activity (continuous scale), pain levels (truncated scale), mood states (ordinal scale), and the occurrence of daily life events (binary scale). These assessments, when indexed by time of day, can be treated and analyzed as functional data corresponding to their respective types: continuous, truncated, ordinal, and binary. Motivated by these examples, we develop a functional principal component analysis that deals with all four types of functional data in a unified manner. It employs a semiparametric Gaussian copula model, assuming a generalized latent non-paranormal process as the underlying generating mechanism for these four types of functional data. We specify latent temporal dependence using a covariance estimated through Kendall's τ bridging method, incorporating smoothness in the bridging process. The approach is then extended with methods for handling both dense and sparse sampling designs, calculating subject-specific latent representations of observed data, latent principal components, and principal component scores. Simulation studies demonstrate the method's competitive performance under both dense and sparse sampling designs. The method is applied to data from 497 participants in the National Institute of Mental Health Family Study of Mood Spectrum Disorders to characterize differences in within-day temporal patterns of mood in individuals with the major mood disorder subtypes, including Major Depressive Disorder and Type 1 and 2 Bipolar Disorder. Software implementation of the proposed method is provided in an R package.
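To make the bridging idea concrete, here is a minimal Python sketch of the simplest case only: two continuous variables under a Gaussian copula, where the latent correlation r and Kendall's τ satisfy the standard identity r = sin(πτ/2). It is not the authors' R package and does not reproduce the paper's bridging functions for truncated, ordinal, or binary data, nor its smoothing step. All function names (`kendall_tau`, `bridge_continuous`, `simulate_pair`) and the simulation settings are assumptions chosen for illustration.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau by direct pair counting, O(n^2); assumes no ties."""
    n = len(x)
    net = 0  # concordant pairs minus discordant pairs
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                net += 1
            elif s < 0:
                net -= 1
    return net / (n * (n - 1) / 2)


def bridge_continuous(tau):
    """Continuous-continuous bridge under a Gaussian copula."""
    return math.sin(math.pi / 2 * tau)


def simulate_pair(n, rho, seed=0):
    """Draw latent Gaussian pairs with correlation rho, then apply
    monotone transforms so the observed data are non-Gaussian."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(math.exp(z1))  # skewed, strictly increasing in z1
        ys.append(z2 ** 3)       # heavy-tailed, strictly increasing in z2
    return xs, ys


rho_true = 0.6
x, y = simulate_pair(800, rho_true)
tau_hat = kendall_tau(x, y)
rho_hat = bridge_continuous(tau_hat)
print(f"tau_hat={tau_hat:.3f}, bridged latent correlation={rho_hat:.3f}")
```

Because Kendall's τ depends only on ranks, the monotone transforms leave it unchanged, which is why the latent correlation can be recovered without estimating the marginal transformations. In a functional setting, the same calculation would be repeated for every pair of time points to fill in a latent covariance surface.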
About the journal:
The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies where creative use or technical generalizations of established methodology are directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.