Improving Multi-Class Code Readability Classification with An Enhanced Data Augmentation Approach (130)

IF 0.6 · Zone 4 (Computer Science) · JCR Q4, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Qing Mi, Luo Wang, Lisha Hu, Liwei Ou, Yang Yu
International Journal of Software Engineering and Knowledge Engineering, pp. 1709-1731. Journal article, published 2022-11-18. DOI: 10.1142/s0218194022500656
Citations: 0

Abstract

Code readability is a critical factor in the maintainability and reusability of software, and it is becoming increasingly important in modern software development, where a metric that classifies code into readability levels is both practical and desirable. However, owing to the lack of labeled data, most prior research has treated code readability classification as a binary task. To support the training of multi-class code readability classification models, we propose an enhanced data augmentation approach that generates sufficient readability data to train a multi-class code readability model effectively. The approach combines domain-specific data transformation with GAN-based data augmentation. We conduct a series of experiments to evaluate the approach and achieve state-of-the-art multi-class code readability classification performance: 69.5% Micro-F1, 54.0% Macro-F1 and 67.7% Macro-AUC. Compared with the results obtained without augmented data, the improvements in Micro-F1, Macro-F1 and Macro-AUC are significant: 6.9%, 11.3% and 11.2%, respectively. As an innovative work that proposes both multi-class code readability classification and an enhanced data augmentation approach for code readability, our method is shown to be effective.
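The abstract reports both Micro-F1 and Macro-F1, and the gap between them (69.5% vs. 54.0%) is easier to interpret once the two averages are spelled out. The Python sketch below is illustrative only: the three readability levels (`readable`, `neutral`, `unreadable`), the toy labels and the helper names `f1_scores`, `binary_auc` and `macro_auc` are hypothetical, and Macro-AUC is computed here as the unweighted mean of one-vs-rest AUCs, a common convention that this abstract does not confirm as the paper's own definition.

```python
from collections import Counter

# Hypothetical readability levels; the paper's actual label set is not given here.
LEVELS = ["readable", "neutral", "unreadable"]


def f1_scores(y_true, y_pred, labels):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed the true class t
    # Micro-F1: pool counts across all classes, then compute a single F1.
    total_tp = sum(tp[c] for c in labels)
    total_fp = sum(fp[c] for c in labels)
    total_fn = sum(fn[c] for c in labels)
    micro_denom = 2 * total_tp + total_fp + total_fn
    micro = 2 * total_tp / micro_denom if micro_denom else 0.0
    # Macro-F1: compute F1 per class, then take the unweighted mean.
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_class) / len(labels)
    return micro, macro


def binary_auc(scores, is_positive):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, prob_rows, labels):
    """Unweighted mean of one-vs-rest AUCs; prob_rows[i][k] scores sample i for labels[k]."""
    aucs = []
    for k, c in enumerate(labels):
        scores = [row[k] for row in prob_rows]
        aucs.append(binary_auc(scores, [t == c for t in y_true]))
    return sum(aucs) / len(aucs)


# Toy, imbalanced example (invented data, not results from the paper).
y_true = ["readable"] * 6 + ["neutral"] * 3 + ["unreadable"]
y_pred = ["readable"] * 6 + ["readable", "neutral", "neutral"] + ["readable"]
micro, macro = f1_scores(y_true, y_pred, LEVELS)
print(round(micro, 3), round(macro, 3))  # 0.8 0.552
```

Micro-F1 pools true positives, false positives and false negatives over all classes, so frequent classes dominate it (for single-label prediction it equals accuracy). Macro-F1 gives every class equal weight, so a poorly recognized minority class pulls it down. In the toy data the `unreadable` sample is never classified correctly, which keeps Macro-F1 near 0.55 while Micro-F1 stays at 0.8; class imbalance is one plausible reason for a similar gap in multi-class readability results.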
Source journal
CiteScore: 1.90
Self-citation rate: 11.10%
Articles published: 71
Review time: 16 months
Journal description: The International Journal of Software Engineering and Knowledge Engineering is intended to serve as a forum for researchers, practitioners, and developers to exchange ideas and results for the advancement of software engineering and knowledge engineering. Three types of papers will be published: research papers reporting original research results; technology trend surveys reviewing an area of research in software engineering and knowledge engineering; and survey articles surveying a broad area in software engineering and knowledge engineering. In addition, tool reviews (no more than three manuscript pages) and book reviews (no more than two manuscript pages) are also welcome. A central theme of this journal is the interplay between software engineering and knowledge engineering: how knowledge engineering methods can be applied to software engineering, and vice versa. The journal publishes papers in the areas of software engineering methods and practices, object-oriented systems, rapid prototyping, software reuse, cleanroom software engineering, stepwise refinement/enhancement, formal methods of specification, ambiguity in software development, impact of CASE on software development life cycle, knowledge engineering methods and practices, logic programming, expert systems, knowledge-based systems, distributed knowledge-based systems, deductive database systems, knowledge representations, knowledge-based systems in language translation & processing, software and knowledge-ware maintenance, reverse engineering in software design, and applications in various domains of interest.