Multiple Samples Clustering with Second-moment Information in Stock Clustering
Xiang Wang
Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition, 2020-06-26
DOI: https://doi.org/10.1145/3430199.3430223
Clustering algorithms that treat each data object as a single sample drawn from some distribution have been studied for decades. Many of them, such as k-means and spectral clustering, are built on the assumption that each object to be clustered is a single vector generated by a Gaussian distribution. In practice, however, each input object is usually a set of vectors drawn from a hidden distribution, a setting that traditional clustering algorithms cannot handle. This motivates multiple-samples clustering. In this paper, we propose two algorithms for multiple-samples clustering, Wasserstein-distance-based spectral clustering and Bhattacharyya-distance-based spectral clustering, and compare them with traditional spectral clustering. Simulation results show that second-moment information can greatly improve clustering accuracy and stability. We apply the algorithms to a stock dataset to group stocks by their historical prices. Investors can use the resulting clusters to guide their decisions, either investing in stocks from the same cluster to seek the highest return or investing in stocks from different clusters to reduce risk.
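
The idea of clustering sets of samples by a distributional distance can be sketched as follows. This is a minimal illustration and not the paper's implementation: it assumes each object is summarized by its sample mean and covariance (a Gaussian approximation), uses the closed-form Gaussian expressions for the 2-Wasserstein and Bhattacharyya distances, builds a Gaussian-kernel affinity with a median bandwidth, and splits the objects into two groups by the sign of the second eigenvector of the normalized Laplacian. All function names, the kernel, the bandwidth rule, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def fit_moments(samples):
    """Sample mean and covariance of an (n, d) array of vectors."""
    mean = samples.mean(axis=0)
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    return mean, cov

def psd_sqrt(mat):
    """Square root of a symmetric positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def wasserstein2_gaussian(m1, c1, m2, c2):
    """2-Wasserstein distance between N(m1, c1) and N(m2, c2)."""
    root = psd_sqrt(c2)
    cross = psd_sqrt(root @ c1 @ root)
    sq = np.sum((m1 - m2) ** 2) + np.trace(c1 + c2 - 2.0 * cross)
    return np.sqrt(max(sq, 0.0))

def bhattacharyya_gaussian(m1, c1, m2, c2):
    """Bhattacharyya distance between N(m1, c1) and N(m2, c2)."""
    c = 0.5 * (c1 + c2)
    diff = m1 - m2
    term_mean = 0.125 * diff @ np.linalg.solve(c, diff)
    _, logdet = np.linalg.slogdet(c)
    _, logdet1 = np.linalg.slogdet(c1)
    _, logdet2 = np.linalg.slogdet(c2)
    return term_mean + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

def pairwise_distances(moments, dist_fn):
    """Symmetric matrix of distances between all pairs of objects."""
    n = len(moments)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dist_fn(*moments[i], *moments[j])
    return dist

def spectral_bipartition(dist):
    """Two-way spectral split (assumed Gaussian kernel, median bandwidth)."""
    sigma = np.median(dist[dist > 0])
    affinity = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    lap = np.eye(len(dist)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)

# Synthetic objects: every object is a set of 200 two-dimensional vectors.
# Both groups share a zero mean and differ only in spread, so only
# second-moment information can tell them apart.
rng = np.random.default_rng(0)
objects = [rng.normal(0.0, 1.0, size=(200, 2)) for _ in range(5)]
objects += [rng.normal(0.0, 3.0, size=(200, 2)) for _ in range(5)]
moments = [fit_moments(obj) for obj in objects]

labels_w = spectral_bipartition(pairwise_distances(moments, wasserstein2_gaussian))
labels_b = spectral_bipartition(pairwise_distances(moments, bhattacharyya_gaussian))
print("Wasserstein labels:  ", labels_w)
print("Bhattacharyya labels:", labels_b)
```

Because the two synthetic groups have the same mean, a clustering that looks only at per-object averages has nothing to separate them by, while both distances above also compare covariances. For more than two clusters, the sign split would be replaced by k-means on the leading eigenvectors.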