{"title":"From the Editor","authors":"J. Wagner","doi":"10.1177/87560879221131869","DOIUrl":null,"url":null,"abstract":"Accelerometry data enables scientists to extract personal digital features that can benefit precision health decision making. Existing methods in accelerometry data analysis typically begin with discretizing summary single-axis counts by certain fixed cutoffs into several activity categories, such as Vigorous, Moderate, Light, and Sedentary. One well-known limitation is that the chosen cutoffs have often been validated with restricted settings, and thus they cannot be generalizable across populations, devices, or studies. In this paper, we develop a data-driven approach to overcome this bottleneck in the analysis of activity data, in which we holistically summarize a subject's activity profile using Occupation-Time curves (OTCs). Being a functional predictor, OTC describes the percentage of time spent at or above a continuum of activity count levels. We develop multi-step adaptive learning algorithms to perform a supervised learning via a scale-functional regression model that contains OTC as the functional predictor of interest as well as other covariates. Our learning algorithm first incorporates a hybrid approach of fused lasso for grouping and Hidden Markov Model for change-point detection, and then executes a few refinement learning steps to yield activity windows of interest. We demonstrate good performances of this learning algorithm using simulations as well as real world data analysis to assess the influence of physical activity on biological aging. Abstract: The of Abstract: Principal component analysis (PCA) is one of the most popular methods for dimension reduction. In light of the rapidly increasing large-scale data in federated ecosystems, the traditional PCA method is often not applicable due to privacy protection considerations and large computational burden. 
Fast PCA algorithms have been proposed to lower the computational cost but cannot handle federated data. Distributed PCA algorithms have been developed to handle federated data but are not computationally efficient when data at each site are very large. In this paper, we propose the FAst DIstributed (FADI) PCA method which applies fast PCA to site specific data using multiple random sketches and aggregates the results across sites. We perform a non-asymptotic We perform studies and show that We apply Abstract: Sequential process monitoring has broad applications. In practice, process characteristics to monitor often have a high dimensionality, partly due to the fast progress in data acquisition techniques. Thus, statistical process control (SPC) research for monitoring high dimensional processes is in rapid development in recent years. Most existing SPC charts for monitoring high-dimensional processes are designed for conventional cases in which the in-control (IC) process observations at different time points are assumed to be independent and identically distributed. In practice, however, serial correlation almost always exists in the observed sequential data, and the longitudinal pattern of the process to monitor could be dynamic in the sense that its IC distribution would vary over time (e.g., seasonality). In this paper, we develop a novel SPC chart for monitoring high-dimensional dynamic processes. The new method is based on nonparametric longitudinal modeling for describing the longitudinal pattern of the process under monitoring, principal component analysis for dimension reduction, and a sequential learning algorithm for developing an effective decision rule. It can well accommodate time-varying IC process distribution, serial data correlation, and nonparametric data distribution. The proposed method has been shown effective for air pollution surveillance. 
Abstract: Estimating treatment effects is of great importance for many biomedical applications with observational data. Particularly, interpretability of the treatment effects is preferable for many biomedical researchers. In this paper, we first give a theoretical analysis and propose an upper bound for the bias of average treatment effect estimation under the strong ignorability assumption. The proposed upper bound consists of two parts: training error for factual outcomes, and the distance between treated and control distributions. We use the Weighted Energy Distance (WED) ABSTRACT Epidemiology, biostatistics, and data science are broad disciplines that incorporate a variety of substantive areas. Common among them is a focus on quantitative approaches for solving intricate problems. When the substantive area is health and health care, the overlap is further cemented. Researchers in these disciplines are fluent in statistics, data management and analysis, and health and medicine, to name but a few competencies. Yet there are important and perhaps mutually exclusive attributes of these fields that warrant a tighter integration. For example, epidemiologists receive substantial training in the science of study design, measurement, and the art of causal inference. Biostatisticians are well versed in the theory and application of methodological techniques, as well as the design and conduct of public health research. Data scientists receive equivalently rigorous training in computational and visualization approaches for high-dimensional data. Compared to data scientists, epidemiologists and biostatisticians may have less expertise in computer science and informatics, while data scientists may benefit from a working knowledge of study design and causal inference. 
Collaboration and cross-training offer the opportunity to share and learn of the constructs, frameworks, theories, and methods of these fields with the goal of offering fresh and innovative perspectives for tackling challenging problems in health and health care. In this article, we first describe the evolution of these fields focusing on their convergence in the era of electronic health data, notably electronic medical records (EMRs). Next we present how a collaborative team may design, analyze, and implement an EMR-based study. Finally, we review the curricula at leading epidemiology, biostatistics, and data science training programs, identifying gaps and offering suggestions for the fields moving forward.","PeriodicalId":16823,"journal":{"name":"Journal of Plastic Film & Sheeting","volume":"26 1","pages":"491 - 492"},"PeriodicalIF":2.1000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Plastic Film & Sheeting","FirstCategoryId":"88","ListUrlMain":"https://doi.org/10.1177/87560879221131869","RegionNum":4,"RegionCategory":"Materials Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATERIALS SCIENCE, COATINGS & FILMS","Score":null,"Total":0}
Citations: 0
Abstract
Accelerometry data enables scientists to extract personal digital features that can benefit precision health decision making. Existing methods in accelerometry data analysis typically begin by discretizing summary single-axis counts, using fixed cutoffs, into several activity categories such as Vigorous, Moderate, Light, and Sedentary. One well-known limitation is that the chosen cutoffs have often been validated only in restricted settings, and thus may not generalize across populations, devices, or studies. In this paper, we develop a data-driven approach to overcome this bottleneck in the analysis of activity data, in which we holistically summarize a subject's activity profile using Occupation-Time curves (OTCs). As a functional predictor, the OTC describes the percentage of time spent at or above a continuum of activity count levels. We develop multi-step adaptive learning algorithms to perform supervised learning via a scale-functional regression model that contains the OTC as the functional predictor of interest as well as other covariates. Our learning algorithm first incorporates a hybrid approach of fused lasso for grouping and a Hidden Markov Model for change-point detection, and then executes a few refinement learning steps to yield activity windows of interest. We demonstrate the good performance of this learning algorithm using simulations as well as a real-world data analysis that assesses the influence of physical activity on biological aging.

Abstract: Principal component analysis (PCA) is one of the most popular methods for dimension reduction. In light of the rapidly increasing volume of large-scale data in federated ecosystems, the traditional PCA method is often not applicable due to privacy protection considerations and a large computational burden. Fast PCA algorithms have been proposed to lower the computational cost but cannot handle federated data. Distributed PCA algorithms have been developed to handle federated data but are not computationally efficient when the data at each site are very large. In this paper, we propose the FAst DIstributed (FADI) PCA method, which applies fast PCA to site-specific data using multiple random sketches and aggregates the results across sites. We perform a non-asymptotic ... We perform studies and show that ... We apply ...

Abstract: Sequential process monitoring has broad applications. In practice, the process characteristics to monitor often have a high dimensionality, partly due to the fast progress in data acquisition techniques. Thus, statistical process control (SPC) research for monitoring high-dimensional processes has developed rapidly in recent years. Most existing SPC charts for monitoring high-dimensional processes are designed for conventional cases in which the in-control (IC) process observations at different time points are assumed to be independent and identically distributed. In practice, however, serial correlation almost always exists in the observed sequential data, and the longitudinal pattern of the process to monitor could be dynamic in the sense that its IC distribution would vary over time (e.g., seasonality). In this paper, we develop a novel SPC chart for monitoring high-dimensional dynamic processes. The new method is based on nonparametric longitudinal modeling for describing the longitudinal pattern of the process under monitoring, principal component analysis for dimension reduction, and a sequential learning algorithm for developing an effective decision rule. It can accommodate a time-varying IC process distribution, serial data correlation, and a nonparametric data distribution. The proposed method has been shown to be effective for air pollution surveillance.

Abstract: Estimating treatment effects is of great importance for many biomedical applications with observational data. In particular, many biomedical researchers prefer treatment effects that are interpretable. In this paper, we first give a theoretical analysis and propose an upper bound for the bias of average treatment effect estimation under the strong ignorability assumption. The proposed upper bound consists of two parts: the training error for factual outcomes, and the distance between the treated and control distributions. We use the Weighted Energy Distance (WED) ...

Abstract: Epidemiology, biostatistics, and data science are broad disciplines that incorporate a variety of substantive areas. Common among them is a focus on quantitative approaches for solving intricate problems. When the substantive area is health and health care, the overlap is further cemented. Researchers in these disciplines are fluent in statistics, data management and analysis, and health and medicine, to name but a few competencies. Yet there are important and perhaps mutually exclusive attributes of these fields that warrant a tighter integration. For example, epidemiologists receive substantial training in the science of study design, measurement, and the art of causal inference. Biostatisticians are well versed in the theory and application of methodological techniques, as well as the design and conduct of public health research. Data scientists receive equivalently rigorous training in computational and visualization approaches for high-dimensional data. Compared to data scientists, epidemiologists and biostatisticians may have less expertise in computer science and informatics, while data scientists may benefit from a working knowledge of study design and causal inference. Collaboration and cross-training offer the opportunity to share and learn the constructs, frameworks, theories, and methods of these fields, with the goal of offering fresh and innovative perspectives for tackling challenging problems in health and health care. In this article, we first describe the evolution of these fields, focusing on their convergence in the era of electronic health data, notably electronic medical records (EMRs). Next, we present how a collaborative team may design, analyze, and implement an EMR-based study. Finally, we review the curricula at leading epidemiology, biostatistics, and data science training programs, identifying gaps and offering suggestions for the fields moving forward.
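The accelerometry abstract above defines the Occupation-Time curve as the share of time spent at or above each activity count level. A minimal sketch of that definition follows, assuming equally spaced epochs so that "time" reduces to a count of epochs; the function name `occupation_time_curve`, the sample counts, and the level grid are hypothetical illustrations and are not taken from the paper. The regression, fused lasso, and Hidden Markov Model steps are not shown.

```python
from bisect import bisect_left


def occupation_time_curve(counts, levels):
    """Return, for each level c, the fraction of epochs whose count is >= c.

    Assumes equally spaced epochs (an assumption of this sketch), so the
    fraction of epochs equals the fraction of time.
    """
    if not counts:
        raise ValueError("counts must be non-empty")
    ordered = sorted(counts)
    n = len(ordered)
    # bisect_left gives the number of epochs strictly below c,
    # so n minus that index is the number of epochs at or above c.
    return [(n - bisect_left(ordered, c)) / n for c in levels]


# Hypothetical per-minute activity counts for one subject (illustrative only).
counts = [0, 0, 12, 150, 900, 2500, 40, 0, 3100, 600]
# Hypothetical grid of activity count levels at which to evaluate the curve.
levels = [0, 100, 1000, 3000]

curve = occupation_time_curve(counts, levels)
for level, share in zip(levels, curve):
    print(f"count >= {level}: {share:.0%} of epochs")
```

The function returns fractions in [0, 1]; multiplying by 100 gives the percentages the abstract refers to. Because the curve is evaluated on a grid of levels rather than at a few fixed cutoffs, it is non-increasing in the level by construction.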
About the journal:
The Journal of Plastic Film and Sheeting improves communication concerning plastic film and sheeting, with major emphasis on the propagation of knowledge that will advance the science and technology of these products and thus better serve industry and the ultimate consumer. The journal reports on the wide variety of advances that are rapidly taking place in the technology of plastic film and sheeting. This journal is a member of the Committee on Publication Ethics (COPE).