Meaning of the Clusters on Dimensionality Reduction by Word Clustering
Toshinori Deguchi, Sin-Yeong Seo, Naohiro Ishii
2022 12th International Congress on Advanced Applied Informatics (IIAI-AAI), published 2022-07-01
DOI: 10.1109/IIAIAAI55812.2022.00072 (https://doi.org/10.1109/IIAIAAI55812.2022.00072)
In text mining, Latent Semantic Analysis (LSA) is a popular method for reducing the dimension of document vectors. Because LSA derives its topics from statistical information, the meaning of each topic is not clear. We proposed a method that reduces the dimension by clustering the words in the documents. This method produces a set of word clusters instead of topics. Using Word2vec to vectorize the words, we calculate the mean vector of each cluster, which indicates the meaning of that cluster. In this paper, we show the dimensionality reduction and the meaning of the generated clusters, visualized as word clouds, on a document classification problem using a subset of the BBC dataset.
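The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. Several things here are assumptions, since the abstract does not specify them: k-means is used as the clustering algorithm, each document is represented by how many of its words fall into each cluster, and small hand-made 2-D vectors stand in for real Word2vec embeddings. The names `kmeans` and `reduce_documents` are hypothetical helpers introduced only for this example.

```python
import numpy as np

# Toy 2-D vectors standing in for Word2vec embeddings (assumption:
# a real run would load vectors from a trained Word2vec model).
vocab = ["goal", "match", "team", "market", "shares", "profit"]
X = np.array([
    [1.0, 0.1], [0.9, 0.0], [0.8, 0.2],   # sport-like words
    [0.1, 1.0], [0.0, 0.9], [0.2, 0.8],   # business-like words
])

def kmeans(X, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [X[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest chosen centroid.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # The new centroid is the mean vector of the words in each cluster.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def reduce_documents(docs, vocab, labels, k):
    """Map each tokenised document to a k-dimensional vector of cluster counts."""
    word_to_cluster = {w: int(c) for w, c in zip(vocab, labels)}
    out = np.zeros((len(docs), k))
    for i, doc in enumerate(docs):
        for w in doc:
            if w in word_to_cluster:
                out[i, word_to_cluster[w]] += 1
    return out

k = 2
labels, centroids = kmeans(X, k)

docs = [
    ["goal", "team", "match", "goal"],
    ["market", "profit", "shares"],
    ["team", "market"],
]
reduced = reduce_documents(docs, vocab, labels, k)

# Words belonging to each cluster; with real embeddings these would be
# the words drawn in that cluster's word cloud.
members = {j: [w for w, c in zip(vocab, labels) if c == j] for j in range(k)}
print(reduced)
print(members)
```

Each document is reduced from a vocabulary-sized vector to `k` dimensions, and every dimension corresponds to a readable group of words rather than to a statistically derived topic. The cluster's mean vector (`centroids[j]`) lives in the same space as the word vectors, so the words nearest to it can be used to label the cluster.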