Improving Multi-Class Code Readability Classification with An Enhanced Data Augmentation Approach (130)

IF 0.6 · Zone 4, Computer Science · JCR Q4, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Software Engineering and Knowledge Engineering Pub Date : 2022-11-18 DOI:10.1142/s0218194022500656

Qing Mi, Luo Wang, Lisha Hu, Liwei Ou, Yang Yu

Volume/issue (as listed): 20 1; pages: 1709-1731

Citations: 0

Abstract

Code readability is a critical factor in the maintainability and reusability of software, and it is increasingly important in modern software development, where a metric for classifying code readability levels is both applicable and desired. However, most prior research has treated code readability classification as a binary task because labeled data is scarce. To support the training of multi-class code readability classification models, we propose an enhanced data augmentation approach that generates enough readability data to train a multi-class model adequately. The approach combines domain-specific data transformation with GAN-based data augmentation. We conduct a series of experiments to verify the augmentation approach and achieve state-of-the-art multi-class code readability classification performance: 69.5% Micro-F1, 54.0% Macro-F1 and 67.7% Macro-AUC. Compared with the results obtained without augmented data, Micro-F1, Macro-F1 and Macro-AUC improve significantly, by 6.9%, 11.3% and 11.2%, respectively. As the first work to propose multi-class code readability classification together with an enhanced readability data augmentation approach, our method proves to be effective.
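The abstract reports a Micro-F1 well above the Macro-F1. The sketch below illustrates how the two averages are commonly computed for single-label multi-class predictions and why they can diverge on imbalanced classes. It is not the authors' evaluation code: the function name `f1_scores`, the three class names and the toy labels are all assumptions made for illustration.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical, imbalanced toy labels (not data from the paper).
y_true = ["readable"] * 6 + ["neutral"] * 2 + ["unreadable"] * 2
y_pred = ["readable"] * 6 + ["readable", "neutral"] + ["readable", "unreadable"]

micro, macro = f1_scores(y_true, y_pred)
print(f"{micro:.3f} {macro:.3f}")  # prints "0.800 0.730"
```

In this toy case the majority class is predicted well, which lifts the pooled (micro) score, while the two minority classes pull the unweighted (macro) mean down. A gap in the same direction appears in the reported figures, although the abstract does not state the class distribution.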


Source journal

International Journal of Software Engineering and Knowledge Engineering (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore

1.90

Self-citation rate

11.10%

Articles published

Review time

16 months

About the journal: The International Journal of Software Engineering and Knowledge Engineering is intended to serve as a forum for researchers, practitioners, and developers to exchange ideas and results for the advancement of software engineering and knowledge engineering. Three types of papers will be published: research papers reporting original research results; technology trend surveys reviewing an area of research in software engineering and knowledge engineering; and survey articles surveying a broad area in software engineering and knowledge engineering. In addition, tool reviews (no more than three manuscript pages) and book reviews (no more than two manuscript pages) are also welcome.

A central theme of this journal is the interplay between software engineering and knowledge engineering: how knowledge engineering methods can be applied to software engineering, and vice versa. The journal publishes papers in the areas of software engineering methods and practices, object-oriented systems, rapid prototyping, software reuse, cleanroom software engineering, stepwise refinement/enhancement, formal methods of specification, ambiguity in software development, impact of CASE on software development life cycle, knowledge engineering methods and practices, logic programming, expert systems, knowledge-based systems, distributed knowledge-based systems, deductive database systems, knowledge representations, knowledge-based systems in language translation & processing, software and knowledge-ware maintenance, reverse engineering in software design, and applications in various domains of interest.