From the Editor

IF 1.5 Zone 4 Materials Science Q3 MATERIALS SCIENCE, COATINGS & FILMS

Journal of Plastic Film & Sheeting Pub Date : 2022-10-01 DOI:10.1177/87560879221131869

J. Wagner

Volume: 26 1. Pages: 491 - 492. Publication type: Journal Article. Open access: no.

Citations: 0

Abstract

Accelerometry data enables scientists to extract personal digital features that can benefit precision health decision making. Existing methods in accelerometry data analysis typically begin by discretizing summary single-axis counts with fixed cutoffs into several activity categories, such as Vigorous, Moderate, Light, and Sedentary. One well-known limitation is that the chosen cutoffs have often been validated in restricted settings, and thus they may not generalize across populations, devices, or studies. In this paper, we develop a data-driven approach to overcome this bottleneck in the analysis of activity data, in which we holistically summarize a subject's activity profile using Occupation-Time curves (OTCs). Being a functional predictor, OTC describes the percentage of time spent at or above a continuum of activity count levels. We develop multi-step adaptive learning algorithms to perform supervised learning via a scale-functional regression model that contains OTC as the functional predictor of interest as well as other covariates. Our learning algorithm first incorporates a hybrid approach of fused lasso for grouping and Hidden Markov Model for change-point detection, and then executes a few refinement learning steps to yield activity windows of interest. We demonstrate good performance of this learning algorithm using simulations as well as real-world data analysis to assess the influence of physical activity on biological aging.

Abstract: Principal component analysis (PCA) is one of the most popular methods for dimension reduction. In light of the rapidly increasing large-scale data in federated ecosystems, the traditional PCA method is often not applicable due to privacy protection considerations and large computational burden. Fast PCA algorithms have been proposed to lower the computational cost but cannot handle federated data. Distributed PCA algorithms have been developed to handle federated data but are not computationally efficient when data at each site are very large. In this paper, we propose the FAst DIstributed (FADI) PCA method, which applies fast PCA to site-specific data using multiple random sketches and aggregates the results across sites. We perform a non-asymptotic We perform studies and show that We apply

Abstract: Sequential process monitoring has broad applications. In practice, the process characteristics to monitor often have high dimensionality, partly due to the fast progress in data acquisition techniques. Thus, statistical process control (SPC) research for monitoring high-dimensional processes has developed rapidly in recent years. Most existing SPC charts for monitoring high-dimensional processes are designed for conventional cases in which the in-control (IC) process observations at different time points are assumed to be independent and identically distributed. In practice, however, serial correlation almost always exists in the observed sequential data, and the longitudinal pattern of the process to monitor could be dynamic in the sense that its IC distribution would vary over time (e.g., seasonality). In this paper, we develop a novel SPC chart for monitoring high-dimensional dynamic processes. The new method is based on nonparametric longitudinal modeling for describing the longitudinal pattern of the process under monitoring, principal component analysis for dimension reduction, and a sequential learning algorithm for developing an effective decision rule. It can accommodate time-varying IC process distribution, serial data correlation, and nonparametric data distribution. The proposed method has been shown to be effective for air pollution surveillance.

Abstract: Estimating treatment effects is of great importance for many biomedical applications with observational data. In particular, interpretability of the treatment effects is preferred by many biomedical researchers. In this paper, we first give a theoretical analysis and propose an upper bound for the bias of average treatment effect estimation under the strong ignorability assumption. The proposed upper bound consists of two parts: the training error for factual outcomes, and the distance between the treated and control distributions. We use the Weighted Energy Distance (WED)

Abstract: Epidemiology, biostatistics, and data science are broad disciplines that incorporate a variety of substantive areas. Common among them is a focus on quantitative approaches for solving intricate problems. When the substantive area is health and health care, the overlap is further cemented. Researchers in these disciplines are fluent in statistics, data management and analysis, and health and medicine, to name but a few competencies. Yet there are important and perhaps mutually exclusive attributes of these fields that warrant a tighter integration. For example, epidemiologists receive substantial training in the science of study design, measurement, and the art of causal inference. Biostatisticians are well versed in the theory and application of methodological techniques, as well as the design and conduct of public health research. Data scientists receive equivalently rigorous training in computational and visualization approaches for high-dimensional data. Compared to data scientists, epidemiologists and biostatisticians may have less expertise in computer science and informatics, while data scientists may benefit from a working knowledge of study design and causal inference. Collaboration and cross-training offer the opportunity to share and learn the constructs, frameworks, theories, and methods of these fields, with the goal of offering fresh and innovative perspectives for tackling challenging problems in health and health care. In this article, we first describe the evolution of these fields, focusing on their convergence in the era of electronic health data, notably electronic medical records (EMRs). Next, we present how a collaborative team may design, analyze, and implement an EMR-based study. Finally, we review the curricula at leading epidemiology, biostatistics, and data science training programs, identifying gaps and offering suggestions for the fields moving forward.
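The accelerometry abstract above defines an Occupation-Time curve as the share of time spent at or above each level in a continuum of activity counts. A minimal sketch of that definition follows, assuming equally spaced epochs so that the share of time equals the share of epochs. The function name `occupation_time_curve`, the sample counts, and the level grid are hypothetical and are not taken from the paper; the sketch returns fractions rather than percentages and does not cover the regression or change-point steps.

```python
import numpy as np


def occupation_time_curve(counts, levels):
    """Fraction of epochs whose activity count is at or above each level.

    Hypothetical helper for illustration only; assumes equally spaced epochs.
    """
    counts = np.asarray(counts, dtype=float)
    levels = np.asarray(levels, dtype=float)
    # Compare every level (rows) against every epoch (columns),
    # then average over epochs to get one fraction per level.
    return (counts[None, :] >= levels[:, None]).mean(axis=1)


# Made-up per-epoch activity counts for a single subject.
counts = [0, 0, 50, 120, 300, 0, 800, 1500, 20, 0]
# Made-up grid of activity count levels.
levels = [0, 100, 500, 1000]

otc = occupation_time_curve(counts, levels)
print(otc)
```

Because the curve is evaluated on a grid of levels rather than a few fixed cutoffs, it is non-increasing in the level by construction, and a finer grid simply gives a finer view of the same subject.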




Source journal

Journal of Plastic Film & Sheeting (Engineering & Technology, Materials Science: Films)

CiteScore

6.00

Self-citation rate

16.10%

Articles published

Review time

>12 weeks

Journal description: The Journal of Plastic Film and Sheeting improves communication concerning plastic film and sheeting, with major emphasis on the propagation of knowledge that will serve to advance the science and technology of these products and thus better serve industry and the ultimate consumer. The journal reports on the wide variety of advances that are rapidly taking place in the technology of plastic film and sheeting. This journal is a member of the Committee on Publication Ethics (COPE).