Comparative Study of Topic Modeling and Word Embedding Approaches for Web Service Clustering
Authors: N. Agarwal, Geeta Sikka, L. Awasthi
Venue: 2021 Thirteenth International Conference on Contemporary Computing (IC3-2021)
Published: 2021-08-05
DOI: https://doi.org/10.1145/3474124.3474169
Vector space representations of web services play a prominent role in improving web service-based processes such as clustering, recommendation, ranking, and discovery. Term Frequency-Inverse Document Frequency (TF-IDF) and topic modeling methods are widely used for service representation. In recent years, word embedding techniques have attracted considerable research attention because they can map services or documents based on semantic similarity. This paper provides a comparative analysis of two topic modeling techniques, Latent Dirichlet Allocation (LDA) and the Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture (GSDMM), and two word embedding techniques, word2vec and fastText. These techniques are applied to a dataset of web service documents to obtain vector space representations. K-Means clustering is used to analyze performance, and the results are evaluated using standard evaluation criteria. The results show that the word2vec model outperforms the other techniques and yields a satisfactory improvement in clustering.
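The pipeline described in the abstract (embed each service document, cluster with K-Means, score the clusters) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two-dimensional `TOY_VECTORS` table stands in for a trained word2vec or fastText model, the four service descriptions and their labels are invented, documents are embedded by averaging word vectors, and purity is used as the score because the abstract does not name its evaluation criteria.

```python
from collections import Counter

# Hypothetical 2-D word vectors, hand-made for illustration only.
# A real setup would load these from a trained word2vec or fastText model.
TOY_VECTORS = {
    "weather": (1.0, 0.1), "forecast": (0.9, 0.2), "temperature": (0.8, 0.0),
    "payment": (0.1, 1.0), "invoice": (0.0, 0.9), "currency": (0.2, 0.8),
}

def embed(doc):
    """Represent a document as the average of its known word vectors."""
    vecs = [TOY_VECTORS[w] for w in doc.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Lloyd's algorithm with deterministic farthest-first seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels

def purity(labels, truth):
    """Fraction of documents that belong to their cluster's majority class."""
    total = 0
    for j in set(labels):
        counts = Counter(t for l, t in zip(labels, truth) if l == j)
        total += counts.most_common(1)[0][1]
    return total / len(labels)

# Invented service descriptions and ground-truth categories.
docs = [
    "weather forecast service",
    "temperature forecast api",
    "payment invoice service",
    "currency payment api",
]
truth = ["weather", "weather", "finance", "finance"]

points = [embed(d) for d in docs]
labels = kmeans(points, k=2)
score = purity(labels, truth)
print(labels, score)
```

To compare representations in the spirit of the study, one would swap `embed` for another vectorizer (for example, document-topic proportions from LDA) while keeping the clustering and scoring steps fixed.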