A Pipeline for Automating Labeling to Prediction in Classification of NFRs

2021 IEEE 29th International Requirements Engineering Conference (RE). Pub Date: 2021-09-01. DOI: 10.1109/RE51729.2021.00036

Ranit Chatterjee, Abdul Ahmed, Preethu Rose Anish, B. Suman, Prashant Lawhatre, S. Ghaisas


Citations: 2

Abstract

Non-Functional Requirements (NFRs) describe the operational constraints of a software system. Detecting NFRs early allows them to be incorporated into the architectural design at an initial stage, which is preferable to expensive refactoring later. Automated identification and classification of NFRs has therefore been attempted with rule-based, machine learning, and deep learning approaches. A major challenge for such automation is the manual effort required to label training data. This is a particular concern for large software vendors, who typically work on a variety of applications in diverse domains. We address this challenge by designing a pipeline that facilitates classification of NFRs using only a limited amount of labeled data for training (~20% of an available new dataset). We (1) employed Snorkel to automatically label a dataset comprising NFRs from various Software Requirement Specification documents, (2) trained several classifiers on it, and (3) reused these pre-trained classifiers through a Transfer Learning approach to classify NFRs in industry-specific datasets. Among the language model classifiers, the best results were obtained with a BERT-based classifier fine-tuned to learn the linguistic intricacies of three different domain-specific datasets from real-life projects.
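The automatic labeling step (1) relies on weak supervision: several noisy heuristics vote on each requirement, and their votes are combined into a training label. The sketch below illustrates that idea in plain Python with a simple majority vote. It is not the authors' implementation and does not use Snorkel's API or its learned label model; the class names, keyword lists, and function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter

ABSTAIN = -1
# Hypothetical NFR classes; the abstract does not list the actual label set.
CLASSES = ["performance", "security", "usability"]


def lf_performance(text):
    # Hypothetical keyword heuristic for performance requirements.
    return 0 if any(k in text for k in ("second", "latency", "throughput")) else ABSTAIN


def lf_security(text):
    # Hypothetical keyword heuristic for security requirements.
    return 1 if any(k in text for k in ("encrypt", "authenticat", "password")) else ABSTAIN


def lf_usability(text):
    # Hypothetical keyword heuristic for usability requirements.
    return 2 if any(k in text for k in ("easy to", "intuitive", "user-friendly")) else ABSTAIN


LABELING_FUNCTIONS = [lf_performance, lf_security, lf_usability]


def weak_label(requirement):
    """Combine labeling-function votes by majority; abstain on no vote or a tie."""
    text = requirement.lower()
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, top_count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN  # conflicting heuristics with equal support
    return top


if __name__ == "__main__":
    for req in (
        "The system shall respond within 2 seconds.",
        "All passwords must be encrypted at rest.",
        "The report shall be printed monthly.",
    ):
        label = weak_label(req)
        print(req, "->", CLASSES[label] if label != ABSTAIN else "abstain")
```

Requirements on which every heuristic abstains, or on which the heuristics disagree evenly, are left unlabeled in this sketch. Weak-supervision frameworks generally weight the heuristics instead of counting them equally, so a plain majority vote should be read as a simplification.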


