Multiple Samples Clustering with Second-moment Information in Stock Clustering

Xiang Wang
Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition
Published: 2020-06-26
DOI: 10.1145/3430199.3430223 (https://doi.org/10.1145/3430199.3430223)
Citations: 0

Abstract

Clustering algorithms that treat each object as a single sample drawn from some distribution have been studied for decades. Many of them, such as k-means and spectral clustering, assume that each object to be clustered is a vector generated by a Gaussian distribution. In practice, however, each input object is usually a set of vectors drawn from a hidden distribution, and traditional clustering algorithms cannot handle this case. This calls for multiple samples clustering algorithms. In this paper, we propose two such algorithms, Wasserstein distance based spectral clustering and Bhattacharyya distance based spectral clustering, and compare them with traditional spectral clustering. Simulation results show that second-moment information can greatly improve clustering accuracy and stability. We apply these algorithms to a stock dataset to separate stocks into groups based on their historical prices. Investors can use the resulting clusters to make investment decisions: investing in stocks from the same cluster to seek the highest earnings, or in stocks from different clusters to reduce risk.
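The abstract does not say how the two distances are computed or how the affinity matrix is built, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes each object's sample set is summarized by its sample mean and covariance (a Gaussian fit), for which both the 2-Wasserstein and the Bhattacharyya distance have closed forms. The function names, the Gaussian kernel with a median-distance bandwidth, and the small k-means step are all hypothetical choices made for illustration.

```python
import numpy as np


def sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def wasserstein2_gaussian(m1, c1, m2, c2):
    """2-Wasserstein distance between N(m1, c1) and N(m2, c2)."""
    root = sqrtm_psd(c2)
    cross = sqrtm_psd(root @ c1 @ root)
    squared = np.sum((m1 - m2) ** 2) + np.trace(c1) + np.trace(c2) - 2.0 * np.trace(cross)
    return float(np.sqrt(max(squared, 0.0)))


def bhattacharyya_gaussian(m1, c1, m2, c2):
    """Bhattacharyya distance between N(m1, c1) and N(m2, c2)."""
    c = 0.5 * (c1 + c2)
    diff = m1 - m2
    mean_term = 0.125 * diff @ np.linalg.solve(c, diff)
    logdet = lambda m: np.linalg.slogdet(m)[1]
    return float(mean_term + 0.5 * (logdet(c) - 0.5 * (logdet(c1) + logdet(c2))))


def spectral_labels(affinity, n_clusters, n_iter=50):
    """Normalized spectral embedding followed by a small k-means."""
    deg = affinity.sum(axis=1)
    norm = affinity / np.sqrt(np.outer(deg, deg))
    _, vecs = np.linalg.eigh(norm)
    emb = vecs[:, -n_clusters:]  # eigenvectors of the largest eigenvalues
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centers = [emb[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.sum((emb - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = emb[labels == k].mean(axis=0)
    return labels


def cluster_sample_sets(sample_sets, n_clusters, distance):
    """Cluster objects, each given as an (n_samples, dim) array of vectors."""
    moments = [(s.mean(axis=0), np.atleast_2d(np.cov(s, rowvar=False)))
               for s in sample_sets]
    n = len(moments)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = distance(*moments[i], *moments[j])
    # Gaussian kernel; the median-distance bandwidth is an assumption.
    sigma = np.median(dist[np.triu_indices(n, k=1)])
    affinity = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    return spectral_labels(affinity, n_clusters)


# Toy data: two groups with the same mean but different spread, so the
# groups differ only in their second moment.
rng = np.random.default_rng(0)
sets = [rng.normal(0.0, 1.0, size=(200, 2)) for _ in range(5)]
sets += [rng.normal(0.0, 3.0, size=(200, 2)) for _ in range(5)]
labels_w = cluster_sample_sets(sets, 2, wasserstein2_gaussian)
labels_b = cluster_sample_sets(sets, 2, bhattacharyya_gaussian)
print(labels_w, labels_b)
```

In the toy data the two groups share the same mean, so a method that compares only per-object mean vectors has little to separate them, while both distances above also compare covariances. For stock data, one plausible reading is that each stock's sample set holds vectors derived from its historical prices, but the abstract does not specify the features used.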