Multiple Samples Clustering with Second-moment Information in Stock Clustering

Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition. Pub Date: 2020-06-26. DOI: 10.1145/3430199.3430223

Xiang Wang


Citations: 0

Abstract


Multiple Samples Clustering with Second-moment Information in Stock Clustering

Clustering algorithms that treat each data object as a single sample drawn from a certain distribution have been a hot topic for decades. Many clustering algorithms, such as k-means and spectral clustering, assume that each object to be clustered is a vector generated by a Gaussian distribution. In practice, however, each input object is usually a set of vectors drawn from some hidden distribution. Traditional clustering algorithms cannot handle this situation, which calls for multiple-sample clustering algorithms. In this paper, we propose two such algorithms, Wasserstein-distance-based spectral clustering and Bhattacharyya-distance-based spectral clustering, and compare them with traditional spectral clustering. Simulation results show that second-moment information can greatly improve clustering accuracy and stability. The algorithms are applied to a stock dataset to separate stocks into groups based on their historical prices. Investors can use the clustering information to make investment decisions: investing in stocks from the same cluster to pursue the highest earnings, or investing in stocks from different clusters to reduce risk.
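The abstract does not spell out how the two distances are computed, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes each object is a set of scalar samples (for example, one stock's daily returns) summarized by a fitted one-dimensional Gaussian, for which both distances have closed forms in the mean and standard deviation. The function names, the sample data, and the Gaussian-kernel affinity with bandwidth `sigma` are hypothetical choices made for illustration.

```python
import math
from statistics import fmean, pstdev


def moments(samples):
    """Return (mean, standard deviation) of one object's sample set."""
    return fmean(samples), pstdev(samples)


def wasserstein2_gauss(a, b):
    """2-Wasserstein distance between 1-D Gaussians fitted to a and b."""
    (m1, s1), (m2, s2) = moments(a), moments(b)
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


def bhattacharyya_gauss(a, b):
    """Bhattacharyya distance between 1-D Gaussians fitted to a and b.

    Assumes both sample sets have non-zero standard deviation.
    """
    (m1, s1), (m2, s2) = moments(a), moments(b)
    v1, v2 = s1 ** 2, s2 ** 2
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    spread_term = 0.5 * math.log((v1 + v2) / (2 * s1 * s2))
    return mean_term + spread_term


def affinity_matrix(objects, dist, sigma=1.0):
    """Gaussian-kernel affinities exp(-d^2 / (2 sigma^2)) between objects.

    A matrix like this is the usual input to spectral clustering; the
    kernel and bandwidth are assumptions, not taken from the paper.
    """
    return [
        [math.exp(-dist(a, b) ** 2 / (2 * sigma ** 2)) for b in objects]
        for a in objects
    ]


# Hypothetical return series: all three have mean 0, so only the
# second moment (volatility) can tell them apart.
calm_a = [0.01, -0.01, 0.01, -0.01]
calm_b = [0.02, -0.02, 0.02, -0.02]
wild = [0.10, -0.10, 0.10, -0.10]

affinity = affinity_matrix([calm_a, calm_b, wild], wasserstein2_gauss, sigma=0.05)
```

In this toy data the two calm series end up with a higher mutual affinity than either has with the volatile one, even though all three means coincide. That is the kind of separation a mean-only comparison cannot provide. The resulting matrix could then be passed to any standard spectral clustering routine.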
