Enhancing Clustering Performance Using Topic Modeling-Based Dimensionality Reduction
Authors: T. Ramathulasi, M. Babu
Journal: International Journal of Open Source Software and Processes, volume 23 1, pages 1-16
Published: 2022-01-01 (Journal Article)
DOI: 10.4018/ijossp.300755 (https://doi.org/10.4018/ijossp.300755)
JCR: Q4, Computer Science
Citations: 1
Abstract
Service descriptions and their working procedures are now written mainly in natural-language text. We group services by similarity to reduce the search space and search time in service innovation. Major topic models such as LSA, LDA, and CTM have not performed effectively on these descriptions because the text is short and limited, with word occurrences that are sparse or absent. To address the problems caused by short text, a Dirichlet Multinomial Mixture (DMM) model with feature representation based on Gibbs sampling was developed to reduce dimensionality for clustering and improve performance. The experimental results show that DMM-Gibbs gives better results than all the other methods when combined with agglomerative or K-means clustering. Clustering performance was evaluated using both internal and external criteria. Using this model, the dimensionality can be reduced to 93.13% and better clustering performance can be achieved.
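As an illustration of the idea in the abstract, the following is a minimal sketch of collapsed Gibbs sampling for a Dirichlet Multinomial Mixture, in which each short document is assigned to exactly one topic. It is not the authors' implementation. The function name `dmm_gibbs`, the hyperparameters `alpha` and `beta`, the toy documents, and the choice of per-document topic probabilities as the reduced feature vector are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def dmm_gibbs(docs, K=3, alpha=0.1, beta=0.1, iters=30, seed=0):
    """Hypothetical helper: collapsed Gibbs sampling for a DMM.

    docs is a list of token lists. Returns (z, features), where z[i] is the
    sampled topic of document i and features[i] is a K-dimensional vector of
    topic probabilities (the assumed low-dimensional representation).
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})   # vocabulary size
    m = [0] * K                             # documents per topic
    n = [0] * K                             # total tokens per topic
    nw = [Counter() for _ in range(K)]      # per-topic word counts
    z = []

    def add(k, d):
        m[k] += 1
        n[k] += len(d)
        nw[k].update(d)

    def remove(k, d):
        m[k] -= 1
        n[k] -= len(d)
        nw[k].subtract(d)

    def topic_probs(d):
        # Conditional p(z_d = k | other assignments), computed in log space.
        # The constant denominator (D - 1 + K * alpha) is dropped because it
        # cancels when the scores are normalized.
        counts = Counter(d)
        scores = []
        for k in range(K):
            s = math.log(m[k] + alpha)
            for w, c in counts.items():
                for j in range(c):
                    s += math.log(nw[k][w] + beta + j)
            for i in range(len(d)):
                s -= math.log(n[k] + V * beta + i)
            scores.append(s)
        top = max(scores)
        probs = [math.exp(s - top) for s in scores]
        total = sum(probs)
        return [p / total for p in probs]

    # Random initial assignment of each document to one topic.
    for d in docs:
        k = rng.randrange(K)
        z.append(k)
        add(k, d)

    # Gibbs sweeps: remove a document, resample its topic, add it back.
    for _ in range(iters):
        for i, d in enumerate(docs):
            remove(z[i], d)
            z[i] = rng.choices(range(K), weights=topic_probs(d))[0]
            add(z[i], d)

    # Reduced representation: topic probabilities with the document held out.
    features = []
    for i, d in enumerate(docs):
        remove(z[i], d)
        features.append(topic_probs(d))
        add(z[i], d)
    return z, features


# Toy service descriptions (invented for illustration only).
docs = [
    "weather forecast api".split(),
    "weather temperature forecast".split(),
    "payment card checkout api".split(),
    "payment invoice checkout".split(),
]
z, features = dmm_gibbs(docs, K=3)
```

Here each document is described by 3 topic probabilities instead of a bag-of-words vector over the 8-word vocabulary. Under the same assumptions, the `features` vectors could then be passed to any K-means or agglomerative clustering routine.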
About the journal:
The International Journal of Open Source Software and Processes (IJOSSP) publishes high-quality, peer-reviewed, original research articles in the broad field of open source software and processes. This wide area entails many intriguing questions and facets, including the special development process performed by a large number of geographically dispersed programmers, community issues like coordination and communication, motivations of the participants, and also economic and legal issues. Beyond this topic, open source software is an example of a highly distributed innovation process led by the users. Therefore, many aspects have relevance beyond the realm of software and its development. In this tradition, IJOSSP also publishes papers on these topics. IJOSSP is a multi-disciplinary outlet and welcomes submissions from all relevant fields of research that apply a multitude of research approaches.