Prediction Strength for Clustering Activity Patterns Using Accelerometer Data


Journal for the Measurement of Physical Behaviour, published 2023-01-01. DOI: 10.1123/jmpb.2022-0049

Jingzhi Yu, K. Kapphahn, Hyatt Moore, F. Haydel, Thomas Robinson, M. Desai



Abstract

Background: Clustering, a class of unsupervised machine learning methods, has been applied to physical activity data recorded by accelerometers to discover unique patterns of physical activity and health outcomes. The prediction strength metric provides a criterion for determining the optimal number of clusters. The aim of this study is to provide specific guidance for applying prediction strength to time-series accelerometer data.

Methods: For this purpose, we designed an extensive simulation study. We created a synthetic accelerometer data set using data from a childhood obesity management trial. We evaluated the role of a prespecified threshold of the prediction strength metric as a key input parameter, and we compared the recommended threshold (between 0.8 and 0.9) with an approach we developed (Local Maxima).

Results: The choice of threshold had a large impact on performance. When the noise level increased (greater overlap between true clusters), lower thresholds outperformed the recommended threshold, which tended to underestimate the true number of clusters. In addition, we found that sorting the data by magnitude of intensity within windows of the time series of interest prior to clustering alleviated sensitivity to threshold choice. Furthermore, for accelerometer data, we recommend that the Local Maxima approach be used together with a graphical evaluation of the prediction strength metric as a function of k. Finally, we strongly suggest sorting the data prior to clustering if sorting retains meaning for the research question at hand.

Conclusion: Our recommendations can help future researchers discover more robust patterns in accelerometer data.
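The abstract uses prediction strength without defining it. As introduced by Tibshirani and Walther (2005), ps(k) splits the data into a training half and a test half, clusters each half into k groups, and, for each test cluster, computes the fraction of pairs of its points that the training-half centroids also assign to the same cluster; ps(k) is the minimum of these fractions over test clusters, and the usual rule picks the largest k whose ps(k) reaches a threshold such as 0.8 to 0.9. The NumPy sketch below illustrates that computation together with the within-window sorting described in the abstract. It is not the authors' code: the window length, the single random 50/50 split, plain k-means with k-means++ seeding, and skipping test clusters with fewer than two points are all assumptions, and `local_maxima` is a hypothetical helper that only lists peaks of the ps(k) curve for graphical inspection, because the abstract does not specify the authors' Local Maxima rule.

```python
import numpy as np


def sort_within_windows(X, window):
    """Sort each subject's epochs by descending intensity inside fixed windows.

    X is (n_subjects, n_epochs); `window` (epochs per window) is a
    hypothetical parameter and must divide n_epochs.
    """
    n, t = X.shape
    if t % window:
        raise ValueError("n_epochs must be a multiple of window")
    blocks = X.reshape(n, t // window, window)
    return -np.sort(-blocks, axis=2).reshape(n, t)  # descending order per window


def sq_dists(X, C):
    """Squared Euclidean distances, shape (len(X), len(C))."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, rng, n_init=10, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns the lowest-SSE centroids."""
    best, best_sse = None, np.inf
    for _ in range(n_init):
        C = X[[rng.integers(len(X))]].astype(float)
        while len(C) < k:  # k-means++: next seed drawn with prob. proportional to D^2
            d2 = sq_dists(X, C).min(axis=1)
            p = d2 / d2.sum() if d2.sum() > 0 else None
            C = np.vstack([C, X[rng.choice(len(X), p=p)]])
        for _ in range(n_iter):
            labels = sq_dists(X, C).argmin(axis=1)
            new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                              for j in range(k)])
            if np.allclose(new_C, C):
                break
            C = new_C
        sse = sq_dists(X, C).min(axis=1).sum()
        if sse < best_sse:
            best, best_sse = C, sse
    return best


def prediction_strength(X_train, X_test, k, rng):
    """ps(k): worst-case co-membership agreement between test and training clusterings."""
    if k == 1:
        return 1.0
    test_labels = sq_dists(X_test, kmeans(X_test, k, rng)).argmin(axis=1)
    # Label the same test points with the centroids learned on the training half.
    train_labels = sq_dists(X_test, kmeans(X_train, k, rng)).argmin(axis=1)
    strengths = []
    for j in range(k):
        members = train_labels[test_labels == j]
        m = len(members)
        if m < 2:
            continue  # assumption: pairs are undefined for singleton clusters, so skip them
        co = (members[:, None] == members[None, :]).sum() - m  # ordered pairs i != i'
        strengths.append(co / (m * (m - 1)))
    return min(strengths) if strengths else 0.0


def ps_curve(X, k_max, seed=0):
    """ps(k) for k = 1..k_max from one random 50/50 split of the rows of X."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    half = len(X) // 2
    X_train, X_test = X[perm[:half]], X[perm[half:]]
    return {k: prediction_strength(X_train, X_test, k, rng) for k in range(1, k_max + 1)}


def choose_k_threshold(ps, threshold=0.8):
    """Threshold rule: the largest k whose prediction strength reaches `threshold`."""
    return max((k for k, v in ps.items() if v >= threshold), default=1)


def local_maxima(ps):
    """Hypothetical helper: interior k whose ps(k) exceeds both neighbours on the curve."""
    ks = sorted(ps)
    return [k for i, k in enumerate(ks[1:-1], start=1)
            if ps[k] > ps[ks[i - 1]] and ps[k] > ps[ks[i + 1]]]
```

For example, on three well-separated synthetic blobs `ps_curve(X, k_max=5)` should give ps(3) close to 1, and plotting the returned dictionary against k provides the kind of graphical check the abstract recommends. Because a single split is noisy, repeating `ps_curve` with several seeds and averaging is a reasonable refinement.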


