2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)最新文献

筛选
英文 中文
GitHub Issue Classification Using BERT-Style Models GitHub发布使用bert风格模型的分类
Shikhar Bharadwaj, Tushar Kadam
{"title":"GitHub Issue Classification Using BERT-Style Models","authors":"Shikhar Bharadwaj, Tushar Kadam","doi":"10.1145/3528588.3528663","DOIUrl":"https://doi.org/10.1145/3528588.3528663","url":null,"abstract":"Recent innovations in natural language processing techniques have led to the development of various tools for assisting software developers. This paper provides a report of our proposed solution to the issue report classification task from the NL-Based Software Engineering workshop. We approach the task of classifying issues on GitHub repositories using BERT-style models [1, 2, 6, 8] We propose a neural architecture for the problem that utilizes contextual embeddings for the text content in the GitHub issues. Besides, we design additional features for the classification task. We perform a thorough ablation analysis of the designed features and benchmark various BERT-style models for generating textual embeddings. Our proposed solution performs better than the competition organizer’s method and achieves an F1 score of 0.8653. Our code and trained models are available at https://github.com/Kadam-Tushar/Issue-Classifier.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115180641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
On the Evaluation of NLP-based Models for Software Engineering 基于nlp的软件工程模型评价研究
M. Izadi, Matin Nili Ahmadabadi
{"title":"On the Evaluation of NLP-based Models for Software Engineering","authors":"M. Izadi, Matin Nili Ahmadabadi","doi":"10.1145/3528588.3528665","DOIUrl":"https://doi.org/10.1145/3528588.3528665","url":null,"abstract":"NLP-based models have been increasingly incorporated to address SE problems. These models are either employed in the SE domain with little to no change, or they are greatly tailored to source code and its unique characteristics. Many of these approaches are considered to be outperforming or complementing existing solutions. However, an important question arises here: Are these models evaluated fairly and consistently in the SE community?. To answer this question, we reviewed how NLP-based models for SE problems are being evaluated by researchers. The findings indicate that currently there is no consistent and widely-accepted protocol for the evaluation of these models. While different aspects of the same task are being assessed in different studies, metrics are defined based on custom choices, rather than a system, and finally, answers are collected and interpreted case by case. Consequently, there is a dire need to provide a methodological way of evaluating NLP-based models to have a consistent assessment and preserve the possibility of fair and efficient comparison.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124443121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers catatis:一个使用变压器对问题报告进行分类的智能工具
M. Izadi
{"title":"CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers","authors":"M. Izadi","doi":"10.1145/3528588.3528662","DOIUrl":"https://doi.org/10.1145/3528588.3528662","url":null,"abstract":"Users use Issue Tracking Systems to keep track and manage issue reports in their repositories. An issue is a rich source of software information that contains different reports including a problem, a request for new features, or merely a question about the software product. As the number of these issues increases, it becomes harder to manage them manually. Thus, automatic approaches are proposed to help facilitate the management of issue reports. This paper describes CatIss, an automatic Categorizer of Issue reports which is built upon the Transformer-based pre-trained RoBERTa model. CatIss classifies issue reports into three main categories of Bug report, Enhancement/feature request, and Question. First, the datasets provided for the NLBSE tool competition are cleaned and preprocessed. Then, the pre-trained RoBERTa model is fine-tuned on the preprocessed dataset. Evaluating CatIss on about 80 thousand issue reports from GitHub, indicates that it performs very well surpassing the competition baseline, TicketTagger, and achieving 87.2% F1-score (micro average). Additionally, as CatIss is trained on a wide set of repositories, it is a generic prediction model, hence applicable for any unseen software project or projects with little historical data. Scripts for cleaning the datasets, training CatIss and evaluating the model are publicly available. 1","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131764525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation NMT能理解我吗?基于微扰的代码生成NMT模型评价
Pietro Liguori, Cristina Improta, S. D. Vivo, R. Natella, B. Cukic, Domenico Cotroneo
{"title":"Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation","authors":"Pietro Liguori, Cristina Improta, S. D. Vivo, R. Natella, B. Cukic, Domenico Cotroneo","doi":"10.1145/3528588.3528653","DOIUrl":"https://doi.org/10.1145/3528588.3528653","url":null,"abstract":"Neural Machine Translation (NMT) has reached a level of maturity to be recognized as the premier method for the translation between different languages and aroused interest in different research areas, including software engineering. A key step to validate the robustness of the NMT models consists in evaluating the performance of the models on adversarial inputs, i.e., inputs obtained from the original ones by adding small amounts of perturbation. However, when dealing with the specific task of the code generation (i.e., the generation of code starting from a description in natural language), it has not yet been defined an approach to validate the robustness of the NMT models. In this work, we address the problem by identifying a set of perturbations and metrics tailored for the robustness assessment of such models. We present a preliminary experimental evaluation, showing what type of perturbations affect the model the most and deriving useful insights for future directions.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131902651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Understanding Digits in Identifier Names: An Exploratory Study 理解标识符名称中的数字:一项探索性研究
Anthony Peruma, Christian D. Newman
{"title":"Understanding Digits in Identifier Names: An Exploratory Study","authors":"Anthony Peruma, Christian D. Newman","doi":"10.1145/3528588.3528657","DOIUrl":"https://doi.org/10.1145/3528588.3528657","url":null,"abstract":"Before any software maintenance can occur, developers must read the identifier names found in the code to be maintained. Thus, high-quality identifier names are essential for productive program comprehension and maintenance activities. With developers free to construct identifier names to their liking, it can be difficult to automatically reason about the quality and semantics behind an identifier name. Studying the structure of identifier names can help alleviate this problem. Existing research focuses on studying words within identifiers, but there are other symbols that appear in identifier names–such as digits. This paper explores the presence and purpose of digits in identifier names through an empirical study of 800 open-source Java systems. We study how digits contribute to the semantics of identifier names and how identifier names that contain digits evolve over time through renaming. We envision our findings improving the efficiency of name appraisal and recommendation tools and techniques.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"308 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114415608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信