Comparative Study of Topic Modeling and Word Embedding Approaches for Web Service Clustering
N. Agarwal, Geeta Sikka, L. Awasthi
2021 Thirteenth International Conference on Contemporary Computing (IC3-2021), published 2021-08-05
DOI: 10.1145/3474124.3474169 (https://doi.org/10.1145/3474124.3474169)
Citations: 0
Abstract
Vector space representations of web services play a prominent role in improving service-based processes such as clustering, recommendation, ranking, and discovery. Term Frequency-Inverse Document Frequency (TF-IDF) and topic modeling methods are widely used for service representation. In recent years, word embedding techniques have attracted considerable research attention because they can map services or documents based on semantic similarity. This paper provides a comparative analysis of two topic modeling techniques, Latent Dirichlet Allocation (LDA) and the Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture (GSDMM), and two word embedding techniques, word2vec and fastText. These techniques are applied to a dataset of web service documents to produce vector space representations. K-Means clustering is then run on those representations, and the results are evaluated using standard evaluation criteria. The results show that the word2vec model outperforms the other techniques and provides a satisfactory improvement in clustering.
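The pipeline described in the abstract (turn each service document into a vector, cluster the vectors with K-Means, score the clusters against known categories) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 2-D word vectors, the four toy documents, and the helper names `doc_vector`, `kmeans`, and `purity` are all hypothetical. Mean-pooling word vectors into a document vector is an assumed choice, and purity is used only because the abstract does not name its evaluation criteria.

```python
import math
from collections import Counter

# Hypothetical 2-D word vectors standing in for trained word2vec/fastText output.
TOY_VECTORS = {
    "map": (1.0, 0.1), "route": (0.9, 0.2), "geocode": (1.0, 0.0),
    "payment": (0.1, 1.0), "invoice": (0.0, 0.9), "billing": (0.2, 1.0),
}

# Hypothetical service descriptions with ground-truth categories.
DOCS = [
    ("map route geocode", "mapping"),
    ("payment invoice billing", "payments"),
    ("route geocode address", "mapping"),
    ("invoice billing tax", "payments"),
]


def doc_vector(tokens, vectors):
    """Mean-pool the vectors of known tokens; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    dim = len(next(iter(vectors.values())))
    if not known:
        return (0.0,) * dim
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dim))


def kmeans(points, k, max_iters=50):
    """Plain Lloyd's algorithm. Assumption: the first k points seed the
    centroids, a simplification chosen to keep the sketch deterministic."""
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels


def purity(labels, truth):
    """Fraction of documents that fall in their cluster's majority category."""
    correct = 0
    for c in set(labels):
        members = [t for lab, t in zip(labels, truth) if lab == c]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(labels)


points = [doc_vector(text.split(), TOY_VECTORS) for text, _ in DOCS]
truth = [category for _, category in DOCS]
labels = kmeans(points, k=2)
print(labels, purity(labels, truth))  # [0, 1, 0, 1] 1.0
```

To compare representations in the spirit of the paper, one would swap `doc_vector` for each method under study (for example, LDA topic proportions or averaged fastText vectors) while keeping the clustering and scoring steps fixed.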