A Novel Curve Clustering Method for Functional Data: Applications to COVID-19 and Financial Data

Big data analytics. Pub Date: 2023-10-08. DOI: 10.3390/analytics2040041

Ting Wei, Bo Wang



Abstract

Functional data analysis has enriched existing data analysis methodology by providing a framework for understanding data structures and extracting insights from them. This paper addresses functional data clustering, a central challenge in functional data analysis, and contributes two clustering methods tailored to functional curves. First, we present a proximity measure algorithm for functional curve clustering. The approach allows the measurement points on continuous functions to be redefined, on either an equidistant or a nonuniform grid, as the proximity measure requires. Central to the method is the “proximity threshold”, a parameter that governs the number of clusters; its selection is explored in detail. Second, we propose a time-shift clustering algorithm for time-series data, which identifies historical data segments whose patterns resemble those observed in the present. To evaluate the methods, we apply them to simulated data and compare them with the classic K-means clustering method, with encouraging results. Beyond simulation, we apply the proximity measure algorithm to COVID-19 data, yielding notable clustering accuracy, and the time-shift clustering algorithm to NASDAQ Composite data, where it reveals underlying economic cycles.
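The abstract does not define the proximity measure, the assignment rule, or the time-shift matching step, so the following Python sketch is only an illustration of the two ideas under stated assumptions and is not the authors' algorithm. It assumes curves are resampled onto a shared grid by linear interpolation, that proximity is the mean absolute difference on that grid, and that curves are assigned greedily to the first cluster whose representative lies within the threshold. The function names (`resample`, `mean_abs_diff`, `threshold_cluster`, `similar_past_windows`) are hypothetical.

```python
from bisect import bisect_right
from math import sin, cos, pi


def resample(t, y, grid):
    """Linearly interpolate a curve observed at increasing points t onto grid."""
    out = []
    for g in grid:
        # Pick the segment [t[i-1], t[i]] containing g (end segments extrapolate).
        i = min(max(bisect_right(t, g), 1), len(t) - 1)
        w = (g - t[i - 1]) / (t[i] - t[i - 1])
        out.append((1 - w) * y[i - 1] + w * y[i])
    return out


def mean_abs_diff(a, b):
    """Assumed proximity measure: mean absolute difference on a shared grid."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def threshold_cluster(curves, threshold):
    """Greedy assignment: join the first cluster whose representative curve
    is within `threshold`, otherwise start a new cluster."""
    leaders, labels = [], []
    for i, curve in enumerate(curves):
        for k, j in enumerate(leaders):
            if mean_abs_diff(curve, curves[j]) <= threshold:
                labels.append(k)
                break
        else:
            leaders.append(i)
            labels.append(len(leaders) - 1)
    return labels


def similar_past_windows(series, window, threshold):
    """Start indices of past windows whose mean-centred shape is within
    `threshold` of the most recent window (no overlap with that window)."""
    def centred(seg):
        m = sum(seg) / len(seg)
        return [v - m for v in seg]

    recent = centred(series[-window:])
    starts = []
    for s in range(len(series) - 2 * window + 1):
        if mean_abs_diff(centred(series[s:s + window]), recent) <= threshold:
            starts.append(s)
    return starts


# Curves observed on an equidistant grid (t_a) and a nonuniform grid (t_b).
t_a = [i / 29 for i in range(30)]
t_b = [(i / 39) ** 2 for i in range(40)]
grid = [i / 49 for i in range(50)]
curves = [
    resample(t_a, [sin(2 * pi * t) for t in t_a], grid),
    resample(t_b, [sin(2 * pi * t) + 0.05 for t in t_b], grid),
    resample(t_a, [cos(2 * pi * t) for t in t_a], grid),
    resample(t_b, [cos(2 * pi * t) - 0.05 for t in t_b], grid),
]
print(threshold_cluster(curves, threshold=0.2))  # -> [0, 0, 1, 1]
print(threshold_cluster(curves, threshold=2.0))  # -> [0, 0, 0, 0]

# A periodic series: earlier periods match the most recent window.
series = [sin(2 * pi * n / 20) for n in range(100)]
print(similar_past_windows(series, window=20, threshold=0.05))  # -> [0, 20, 40, 60]
```

In this toy setting the threshold alone determines the number of clusters: a small value separates the sine-like and cosine-like curves, while a large value merges them into one group. The paper's own threshold selection procedure is not described in the abstract and is not reproduced here.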

