Clustering High-Dimensional Stock Data using Data Mining Approach

2019 16th International Conference on Service Systems and Service Management (ICSSSM) Pub Date: 2019-07-01 DOI: 10.1109/ICSSSM.2019.8887724

Dhea Indriyanti, Arian Dhini


Citations: 1

Abstract


In recent years, the number of stock investors in Indonesia has increased rapidly, so stock analysis is needed to support investors in their investment planning. Clustering helps investors select appropriate stocks. However, stock prices vary over time, so selecting stocks for investment is not easy. In addition, stock price time series are high-dimensional data influenced by many factors. In this study, the high-dimensional data are obtained from the time frame of each factor. It is therefore important to use a suitable technique to cluster high-dimensional data. This paper presents High Dimensional Data Clustering (HDDC), a model-based clustering method built on the Gaussian Mixture Model and fitted with the Expectation-Maximization (EM) algorithm. HDDC via the EM algorithm gives a more robust result and makes it possible to impose additional assumptions. Moreover, this paper combines HDDC via the EM algorithm with Principal Component Analysis (PCA), the most popular feature extraction technique. The paper compares HDDC alone against the combination of HDDC and PCA to determine which method gives better results in clustering high-dimensional time series data. Using PCA, the 155 data features are reduced to 7 principal components. Although PCA improves the time efficiency of building the model, HDDC via the EM algorithm handles the high-dimensional data better than its combination with PCA.
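
As a rough illustration of the model-based clustering idea described in the abstract, the following sketch fits a Gaussian mixture with the EM algorithm in plain Python. It is not the authors' implementation and it is not HDDC: it uses an ordinary diagonal-covariance mixture, whereas HDDC relies on a subspace-based covariance parameterisation that is not reproduced here. The function names (`log_gauss_diag`, `fit_gmm_em`), the evenly spaced initialisation, and the synthetic two-cluster data are all assumptions made for the example; no stock data from the paper is used.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def fit_gmm_em(data, k, n_iter=30, min_var=1e-6):
    """Fit a k-component diagonal-covariance Gaussian mixture with EM.

    Illustrative stand-in for model-based clustering; NOT the HDDC model.
    """
    n, d = len(data), len(data[0])
    # Assumption: initialise the means from evenly spaced samples.
    means = [list(data[(j * n) // k]) for j in range(k)]
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    resp = [[1.0 / k] * k for _ in range(n)]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        for i, x in enumerate(data):
            logs = [
                math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                for j in range(k)
            ]
            top = max(logs)  # shift before exponentiating for stability
            probs = [math.exp(lg - top) for lg in logs]
            total = sum(probs)
            resp[i] = [p / total for p in probs]

        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against empty components
            weights[j] = nj / n
            for c in range(d):
                mu = sum(r[j] * x[c] for r, x in zip(resp, data)) / nj
                var = sum(r[j] * (x[c] - mu) ** 2 for r, x in zip(resp, data)) / nj
                means[j][c] = mu
                variances[j][c] = max(var, min_var)

    # Hard assignment: the component with the largest posterior probability.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


# Synthetic data (hypothetical): two well-separated groups in 5 dimensions.
rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(30)]
data += [[rng.gauss(5.0, 1.0) for _ in range(5)] for _ in range(30)]

weights, means, variances, labels = fit_gmm_em(data, k=2)
print("mixing weights:", [round(w, 2) for w in weights])
print("cluster sizes:", [labels.count(j) for j in range(2)])
```

In the setting the abstract describes, each row would instead hold the 155 features of one stock (or its 7 principal components after PCA), and the number of components would be chosen by a model selection criterion rather than fixed in advance.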
