2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)最新文献

Story Point Level Classification by Text Level Graph Neural Network 基于文本级图神经网络的故事点级别分类

2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE) Pub Date : 2022-05-01 DOI: 10.1145/3528588.3528654

H. Phan, A. Jannesari

{"title":"Story Point Level Classification by Text Level Graph Neural Network","authors":"H. Phan, A. Jannesari","doi":"10.1145/3528588.3528654","DOIUrl":"https://doi.org/10.1145/3528588.3528654","url":null,"abstract":"Estimating the software projects’ efforts developed by agile methods is important for project managers or technical leads. It provides a summary as a first view of how many hours and developers are required to complete the tasks. There are research works on automatic predicting the software efforts, including Term Frequency - Inverse Document Frequency (TFIDF) as the traditional approach for this problem. Graph Neural Network is a new approach that has been applied in Natural Language Processing for text classification. The advantages of Graph Neural Network are based on the ability to learn information via graph data structure, which has more representations such as the relationships between words compared to approaches of vectorizing sequence of words. In this paper, we show the potential and possible challenges of Graph Neural Network text classification in story point level estimation. By the experiments, we show that the GNN Text Level Classification can achieve as high accuracy as about 80% for story points level classification, which is comparable to the traditional approach. We also analyze the GNN approach and point out several current disadvantages that the GNN approach can improve for this problem or other problems in software engineering.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"329 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115874168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

BERT-Based GitHub Issue Report Classification 基于bert的GitHub问题报告分类

2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE) Pub Date : 2022-05-01 DOI: 10.1145/3528588.3528660

Mohammed Latif Siddiq, Joanna C. S. Santos

引用次数: 13

Automatic Identification of Informative Code in Stack Overflow Posts 堆栈溢出岗位信息码的自动识别

2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE) Pub Date : 2022-05-01 DOI: 10.1145/3528588.3528656

Preetha Chatterjee

引用次数: 2

NLBSE’22 Tool Competition NLBSE ' 22工具竞赛

2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE) Pub Date : 2022-05-01 DOI: 10.1145/3528588.3528664

Rafael Kallis, Oscar Chaparro, Andrea Di Sorbo, Sebastiano Panichella

引用次数: 13

From Zero to Hero: Generating Training Data for Question-To-Cypher Models 从零到英雄:生成问题到密码模型的训练数据

2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE) Pub Date : 2022-05-01 DOI: 10.1145/3528588.3528655

Dominik Opitz, N. Hochgeschwender

引用次数: 21

Predicting Issue Types with seBERT 用seBERT预测问题类型

2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE) Pub Date : 2022-05-01 DOI: 10.1145/3528588.3528661

Alexander Trautsch, S. Herbold

引用次数: 8

Unsupervised Extreme Multi Label Classification of Stack Overflow Posts 堆栈溢出岗位的无监督极端多标签分类

2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE) Pub Date : 2022-05-01 DOI: 10.1145/3528588.3528652

Peter Devine, Kelly Blincoe

{"title":"Unsupervised Extreme Multi Label Classification of Stack Overflow Posts","authors":"Peter Devine, Kelly Blincoe","doi":"10.1145/3528588.3528652","DOIUrl":"https://doi.org/10.1145/3528588.3528652","url":null,"abstract":"Knowing the topics of a software forum post, such as those on StackOverflow, allows for greater analysis and understanding of the large amounts of data that come from these communities. One approach to this problem is using extreme multi label classification (XMLC) to predict the topic (or “tag”) of a post from a potentially very large candidate label set. While previous work has trained these models on data which has explicit text-to-tag information, we assess the classification ability of embedding models which have not been trained using such structured data (and are thus “unsupervised”) to assess the potential applicability to other forums or domains in which tag data is not available.We evaluate 14 unsupervised pre-trained models on 0.1% of all StackOverflow posts against all 61,662 possible StackOverflow tags. We find that an MPNet model trained partially on unlabelled StackExchange data (i.e. without tag data) achieves the highest score overall for this task, with a recall score of 0.161 R@1. These results inform which models are most appropriate for use in XMLC of StackOverflow posts when supervised training is not feasible. This offers insight into these models’ applicability in similar but not identical domains, such as software product forums. These results suggest that training embedding models using in-domain title-body or question-answer pairs can create an effective zero-shot topic classifier for situations where no topic data is available.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124474795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 10

Issue Report Classification Using Pre-trained Language Models 使用预训练的语言模型发布报告分类

2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE) Pub Date : 2022-05-01 DOI: 10.1145/3528588.3528659

Giuseppe Colavito, F. Lanubile, Nicole Novielli

引用次数: 7

Supporting Systematic Literature Reviews Using Deep-Learning-Based Language Models 使用基于深度学习的语言模型支持系统文献综述

2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE) Pub Date : 2022-05-01 DOI: 10.1145/3528588.3528658

Rand Alchokr, M. Borkar, Sharanya Thotadarya, G. Saake, Thomas Leich

{"title":"Supporting Systematic Literature Reviews Using Deep-Learning-Based Language Models","authors":"Rand Alchokr, M. Borkar, Sharanya Thotadarya, G. Saake, Thomas Leich","doi":"10.1145/3528588.3528658","DOIUrl":"https://doi.org/10.1145/3528588.3528658","url":null,"abstract":"Background: Systematic Literature Reviews are an important research method for gathering and evaluating the available evidence regarding a specific research topic. However, the process of conducting a Systematic Literature Review manually can be difficult and time-consuming. For this reason, researchers aim to semi-automate this process or some of its phases.Aim: We aimed at using a deep-learning based contextualized embeddings clustering technique involving transformer-based language models and a weighted scheme to accelerate the conduction phase of Systematic Literature Reviews for efficiently scanning the initial set of retrieved publications.Method: We performed an experiment using two manually conducted SLRs to evaluate the performance of two deep-learning-based clustering models. These models build on transformer-based deep language models (i.e., BERT and S-BERT) to extract contextualized embeddings on different text levels along with a weighted scheme to cluster similar publications.Results: Our primary results show that clustering based on embedding at paragraph-level using S-BERT-paragraph represents the best performing model setting in terms of optimizing the required parameters such as correctly identifying primary studies, number of additional documents identified as part of the relevant cluster and the execution time of the experiments.Conclusions: The findings indicate that using natural-language-based deep-learning architectures for semi-automating the selection of primary studies can accelerate the scanning and identification process. While our results represent first insights only, such a technique seems to enhance SLR process, promising to help researchers identify the most relevant publications more quickly and efficiently.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127673951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Identification of Intra-Domain Ambiguity using Transformer-based Machine Learning 基于变换的机器学习识别域内歧义

2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE) Pub Date : 2022-05-01 DOI: 10.1145/3528588.3528651

A. Moharil, Arpit Sharma

{"title":"Identification of Intra-Domain Ambiguity using Transformer-based Machine Learning","authors":"A. Moharil, Arpit Sharma","doi":"10.1145/3528588.3528651","DOIUrl":"https://doi.org/10.1145/3528588.3528651","url":null,"abstract":"Recently, the application of neural word embeddings for detecting cross-domain ambiguities in software requirements has gained a significant attention from the requirements engineering (RE) community. Several approaches have been proposed in the literature for estimating the variation of meaning of commonly used terms in different domains. A major limitation of these techniques is that they are unable to identify and detect the terms that have been used in different contexts within the same application domain, i.e. intra-domain ambiguities or in a requirements document of an interdisciplinary project. We propose an approach based on the idea of bidirectional encoder representations from Transformers (BERT) and clustering for identifying such ambiguities. For every context in which a term has been used in the document, our approach returns a list of its most similar words and also provides some example sentences from the corpus highlighting its context-specific interpretation. We apply our approach to a computer science (CS) specific corpora and a multi-domain corpora which consists of textual data from eight different application domains. Our experimental results show that this approach is very effective in identifying and detecting intra-domain ambiguities.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"19 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120888076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3