Enhancing Clustering Performance Using Topic Modeling-Based Dimensionality Reduction

Q4 Computer Science
T. Ramathulasi, M. Babu
Journal: International Journal of Open Source Software and Processes
Volume: 23 1
Pages: 1-16
Publication date: 2022-01-01
Publication type: Journal Article
DOI: https://doi.org/10.4018/ijossp.300755
Citations: 1

Abstract

Services and their working procedures are now described mainly in natural-language text. We group services by similarity to reduce the search space and search time in service innovation. Major topic models such as LSA, LDA, and CTM have not performed effectively on this task because service descriptions are short and limited, so the words in them co-occur rarely or not at all. To address the problems caused by short text, the Dirichlet Multinomial Mixture (DMM) model, with a feature representation obtained through the Gibbs sampling algorithm, has been developed to reduce dimensionality for clustering and improve performance. The experimental results show that DMM-Gibbs gives better results than all the other methods when combined with agglomerative or K-means clustering. Clustering performance was evaluated with both internal and external criteria. Using this model, the dimensionality can be reduced to 93.13% and better clustering performance can be achieved.
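As an illustration of the idea in the abstract, the following is a minimal sketch of a collapsed Gibbs sampler for a Dirichlet multinomial mixture, in which each short document is assigned to exactly one topic. It is not the authors' implementation: the function name `dmm_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy service descriptions are all assumptions made for this example. The sketch shows how each document can be represented by a vector of `n_topics` topic probabilities instead of a vocabulary-sized bag of words.

```python
import math
import random
from collections import Counter


def dmm_gibbs(docs, n_topics, alpha=0.1, beta=0.1, n_iters=30, seed=0):
    """Collapsed Gibbs sampling for a Dirichlet multinomial mixture (illustrative).

    docs is a list of token lists. Returns (labels, features), where
    features[d] is a length-n_topics probability vector for document d.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_counts = [Counter(doc) for doc in docs]
    m = [0] * n_topics                         # documents per topic
    n = [0] * n_topics                         # tokens per topic
    nw = [Counter() for _ in range(n_topics)]  # word counts per topic

    def move(d, z, sign):
        # Add (sign=+1) or remove (sign=-1) document d from topic z.
        m[z] += sign
        n[z] += sign * len(docs[d])
        for w, c in doc_counts[d].items():
            nw[z][w] += sign * c

    def topic_probs(d):
        # Normalised p(z_d = z | other assignments); d must be removed first.
        logs = []
        for z in range(n_topics):
            lp = math.log(m[z] + alpha)
            for w, c in doc_counts[d].items():
                for j in range(c):
                    lp += math.log(nw[z][w] + beta + j)
            for i in range(len(docs[d])):
                lp -= math.log(n[z] + vocab_size * beta + i)
            logs.append(lp)
        top = max(logs)                        # subtract max to avoid underflow
        exps = [math.exp(v - top) for v in logs]
        total = sum(exps)
        return [e / total for e in exps]

    labels = [rng.randrange(n_topics) for _ in docs]
    for d, z in enumerate(labels):
        move(d, z, +1)

    for _ in range(n_iters):
        for d in range(len(docs)):
            move(d, labels[d], -1)
            probs = topic_probs(d)
            labels[d] = rng.choices(range(n_topics), weights=probs)[0]
            move(d, labels[d], +1)

    features = []
    for d in range(len(docs)):
        move(d, labels[d], -1)
        features.append(topic_probs(d))
        move(d, labels[d], +1)
    return labels, features


# Hypothetical short service descriptions, already tokenised.
docs = [
    "weather forecast api".split(),
    "daily weather forecast service".split(),
    "payment gateway api".split(),
    "card payment processing service".split(),
    "map routing api".split(),
    "route planning map service".split(),
]
labels, features = dmm_gibbs(docs, n_topics=3)
```

Each row of `features` has `n_topics` entries regardless of vocabulary size, so it could be passed to a K-means or agglomerative clustering routine in place of the original word-count vectors. Subtracting the maximum log weight before exponentiating keeps the normalisation stable when documents contain many tokens.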
Source journal
CiteScore: 1.90
Self-citation rate: 0.00%
Articles published: 16
About the journal: The International Journal of Open Source Software and Processes (IJOSSP) publishes high-quality, peer-reviewed, original research articles on the broad field of open source software and processes. This wide area entails many intriguing questions and facets, including the special development process performed by a large number of geographically dispersed programmers, community issues like coordination and communication, motivations of the participants, and economic and legal issues. Beyond this topic, open source software is an example of a highly distributed innovation process led by the users. Therefore, many aspects have relevance beyond the realm of software and its development, and IJOSSP also publishes papers on these topics. IJOSSP is a multi-disciplinary outlet and welcomes submissions from all relevant fields of research that apply a multitude of research approaches.