Multimodal correlations-based data clustering

Impact factor: 1.7 | JCR Q2 | Mathematics, Applied

Foundations of data science (Springfield, Mo.). Pub Date: 2022-01-01. DOI: 10.3934/fods.2022011

Jia Chen, I. Schizas


Citations: 0

Abstract

This work proposes a novel technique for clustering multimodal data according to their information content. Statistical correlations present in data that contain similar information are exploited to perform the clustering task. Specifically, multiset canonical correlation analysis is equipped with norm-one (ℓ1) regularization mechanisms to identify clusters within different types of data that share the same information content. A pertinent minimization formulation is put forth, and block coordinate descent is employed to derive a batch clustering algorithm that achieves better clustering performance than existing alternatives. Relying on subgradient descent, an online clustering approach is also derived; it substantially lowers computational complexity compared with the batch approach without significantly compromising clustering performance. It is established that, as the number of data samples increases, the novel regularized multiset framework is able to correctly cluster the multimodal data entries. Further, it is proved that the online clustering scheme converges with probability one to a stationary point of the ensemble regularized multiset correlations cost, which has the potential to recover the correct clusters. Extensive numerical tests demonstrate that the novel clustering scheme outperforms existing alternatives, while the online scheme achieves substantial computational savings.
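The abstract does not spell out the exact cost function, so the following is only a rough illustration of the general idea, not the authors' algorithm. It swaps in a simplified objective: a sum of pairwise cross-correlations across views (within-view covariances treated as identity) with an ℓ1 penalty on each view's weight vector, maximized by block coordinate ascent (equivalently, block coordinate descent on the negated cost). Each block update has a closed-form soft-thresholding solution, and the nonzero entries of each weight vector are read as the entries that share information across modalities. The names `soft_threshold` and `sparse_multiset_cca`, the penalty `lam`, the all-ones initialization, and the synthetic three-view data are hypothetical choices made for this sketch.

```python
import numpy as np


def soft_threshold(a, lam):
    """Proximal map of lam*||.||_1: shrink every entry toward zero by lam."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def sparse_multiset_cca(views, lam, n_iter=50):
    """Hypothetical sketch: block coordinate ascent on an l1-penalized
    multiset correlation objective (not the paper's exact formulation).

    views: list of (n, p_m) arrays sharing the same number of samples n.
    Maximizes  sum_{m<k} u_m^T C_mk u_k - lam * sum_m ||u_m||_1
    subject to ||u_m||_2 <= 1, where C_mk = X_m^T X_k / n.
    Within-view covariances are treated as identity (a simplification).
    """
    n = views[0].shape[0]
    X = [V - V.mean(axis=0) for V in views]          # center each view
    M = len(X)
    C = [[X[m].T @ X[k] / n for k in range(M)] for m in range(M)]
    u = [np.ones(V.shape[1]) / np.sqrt(V.shape[1]) for V in X]
    for _ in range(n_iter):
        for m in range(M):
            # Since C_km = C_mk^T, every term involving u_m equals u_m^T g
            g = sum(C[m][k] @ u[k] for k in range(M) if k != m)
            a = soft_threshold(g, lam)
            norm = np.linalg.norm(a)
            # Closed-form block maximizer; stays zero if lam removes all entries
            u[m] = a / norm if norm > 0 else a
    return u


# Synthetic example: in each of 3 views, entries 0 and 1 carry a shared source z
rng = np.random.default_rng(0)
n, p = 2000, 6
z = rng.standard_normal(n)
views = []
for _ in range(3):
    V = rng.standard_normal((n, p))                  # unrelated noise entries
    V[:, :2] = z[:, None] + 0.5 * rng.standard_normal((n, 2))
    views.append(V)

u = sparse_multiset_cca(views, lam=0.2)
clusters = [np.flatnonzero(um).tolist() for um in u]
print(clusters)  # [[0, 1], [0, 1], [0, 1]]
```

The penalty `lam` controls how aggressively uncorrelated entries are zeroed: here the cross-correlations among noise entries are on the order of 1/sqrt(n), well below 0.2, while the shared entries correlate at roughly 1. In practice `lam` would have to be tuned, and the paper's online variant would replace the batch correlation matrices with per-sample subgradient updates.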




Source journal

Foundations of data science (Springfield, Mo.)

CiteScore

3.30

Self-citation rate

0.00%

Articles published