An Approach for Clustering Text Data Streams Using K-means and Ternary Feature Vector Based Similarity Measure
M. PhridviRaj, C. V. Rao
Proceedings of the International Conference on Engineering & MIS 2015, published 24 September 2015. DOI: 10.1145/2832987.2833081
Clustering text data streams is an unsupervised learning process that requires handling continuously arriving data. In the current work, we compute the pairwise distance between customer transactions using the transaction similarity measure and obtain the corresponding pairwise distance matrix. This pairwise distance matrix is then used to cluster data streams such as customer transactions, which are generated continuously in supermarkets and stored in the database. To cluster the customer transactions, we use the k-means clustering algorithm. Unlike the conventional approach, which does not use a distance matrix, our k-means algorithm takes the distance matrix as its input. Finally, we define the proposed distance measure and validate it using a case study. We compare the results obtained using this approach with those obtained using conventional k-means.
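The two steps described above are building a pairwise distance matrix over transactions and then running k-means on that matrix. They can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not spell out its transaction similarity measure, so Jaccard distance on item sets stands in for it here. "Using the distance matrix as input" is interpreted as representing each transaction by its row of distances to every transaction and running ordinary Lloyd's k-means on those rows. The function names `jaccard_distance`, `distance_matrix` and `kmeans_on_rows`, and the sample transactions, are hypothetical.

```python
import random


def jaccard_distance(a, b):
    """Stand-in transaction distance (an assumption, not the paper's measure):
    1 - |A & B| / |A | B| for two item sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(transactions):
    """Symmetric pairwise matrix D with D[i][j] = distance(t_i, t_j) and a zero diagonal."""
    n = len(transactions)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = jaccard_distance(transactions[i], transactions[j])
    return d


def kmeans_on_rows(rows, k, iters=100, seed=0):
    """Lloyd's k-means where each transaction is represented by its row of the
    distance matrix (hypothetical reading of 'distance matrix as input')."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # k distinct rows as seeds
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centroids[c])))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of the rows assigned to it.
        for c in range(k):
            members = [rows[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical supermarket transactions, each a set of purchased items.
    transactions = [
        {"bread", "milk"},
        {"bread", "milk", "butter"},
        {"beer", "chips"},
        {"beer", "chips", "salsa"},
    ]
    D = distance_matrix(transactions)
    print(kmeans_on_rows(D, k=2))
```

Transactions with similar purchases have similar rows of distances, so they tend to fall into the same cluster. Note that the matrix grows as O(n²) in the number of transactions. This matters when transactions arrive continuously as a stream.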