Machine learning with word embedding for detecting web-services anti-patterns

IF 1.8 | Zone 3, Computer Science | Q3, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Computer Languages, Vol. 75, Article 101207. Published 2023-06-01. DOI: 10.1016/j.cola.2023.101207

Lov Kumar, Sahithi Tummalapalli, Sonika Chandrakant Rathi, Lalita Bhanu Murthy, Aneesh Krishna, Sanjay Misra


Citations: 0

Abstract

A software design anti-pattern is a common response to a recurring problem that is ineffective and carries a high risk of failure. Early prediction of anti-patterns helps reduce the effort, resources, and cost of the design process. Earlier research used static code metrics or Web Service Description Language (WSDL) metrics to build anti-pattern prediction models. These metrics are computed at either the file level or the system level, so their values often depend on assumptions that are neither defined nor standardized and may vary with the tool used. This study develops a machine learning-based anti-pattern prediction model that uses natural language processing techniques to represent the WSDL file as input. Four word-embedding methods are used to process the WSDL files, and the resulting representations are fed to models trained with thirty-three classifier techniques. The study also applies eight feature selection techniques to remove ineffective features and five data sampling techniques to handle the class imbalance in the datasets. The results indicate that models built on text metrics perform better than models built on static code or WSDL metrics. Additionally, the results suggest that selecting features with feature selection techniques and balancing the data with sampling techniques help improve the models' performance.
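The abstract does not name the four word-embedding methods, the classifiers, or the five sampling techniques, so the following is only a minimal sketch, under stated assumptions, of how a WSDL file might be turned into a fixed-length text feature vector and how class imbalance might be handled. It assumes identifiers are taken from `name="..."` attributes, uses a tiny hypothetical embedding table (`TOY_EMBEDDINGS`) in place of a trained embedding model, averages the word vectors of each document, and uses random oversampling as a stand-in sampling technique. None of these choices is confirmed as the study's method.

```python
import random
import re
from collections import Counter

# HYPOTHETICAL toy embedding table (assumption): a real pipeline would load
# vectors from a trained word-embedding model; these 3-d values are made up.
TOY_EMBEDDINGS = {
    "get": [0.9, 0.1, 0.0],
    "customer": [0.2, 0.8, 0.1],
    "order": [0.1, 0.7, 0.3],
    "delete": [0.8, 0.0, 0.2],
    "request": [0.3, 0.3, 0.3],
    "response": [0.3, 0.2, 0.4],
}
DIM = 3


def wsdl_tokens(wsdl_text):
    """Collect name="..." attribute values (assumed to hold service, message,
    and operation identifiers) and split camelCase, acronyms, and underscores
    into lower-case word tokens."""
    tokens = []
    for name in re.findall(r'name="([^"]+)"', wsdl_text):
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name):
            tokens.append(part.lower())
    return tokens


def embed_document(tokens, table=TOY_EMBEDDINGS, dim=DIM):
    """Mean-pool the vectors of tokens found in the table; unknown tokens are
    skipped, and a zero vector is returned if no token is known."""
    known = [table[t] for t in tokens if t in table]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def random_oversample(X, y, seed=0):
    """Balance classes by duplicating randomly chosen rows of each smaller
    class until every class matches the size of the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out


# Usage on a tiny, made-up WSDL fragment.
wsdl = """<definitions name="CustomerService">
  <message name="getCustomerRequest"/>
  <message name="getCustomerResponse"/>
  <operation name="deleteOrder"/>
</definitions>"""

tokens = wsdl_tokens(wsdl)
vector = embed_document(tokens)
print(tokens)
print(vector)
```

Mean pooling keeps the vector length fixed no matter how many operations a service exposes, which is what a downstream classifier needs. In a full pipeline these vectors would then pass through feature selection and a classifier, both of which this sketch omits.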


Source journal

Journal of Computer Languages (Computer Science: Computer Networks and Communications)

CiteScore: 5.00

Self-citation rate: 13.60%

Articles published: