Ranit Chatterjee, Abdul Ahmed, Preethu Rose Anish, B. Suman, Prashant Lawhatre, S. Ghaisas
{"title":"A Pipeline for Automating Labeling to Prediction in Classification of NFRs","authors":"Ranit Chatterjee, Abdul Ahmed, Preethu Rose Anish, B. Suman, Prashant Lawhatre, S. Ghaisas","doi":"10.1109/RE51729.2021.00036","DOIUrl":null,"url":null,"abstract":"Non-Functional Requirements (NFRs) focus on the operational constraints of the software system. Early detection of NFRs enables their incorporation into the architectural design at an initial stage, a practice obviously preferable to expensive refactoring at a later stage. Automated identification and classification of NFRs has therefore seen numerous efforts using rule-based, machine learning and deep learning-based approaches. One of the major challenges for such an automation is the manual effort that needs to be invested into labeling of training data. This is a concern for large software vendors who typically work on a variety of applications in diverse domains. We address this challenge by designing a pipeline that facilitates classification of NFRs using only a limited amount (~ 20% of an available new dataset) of labeled data for training. We (1) employed Snorkel to automatically label a dataset comprising NFRs from various Software Requirement Specification documents, (2) trained several classifiers using it, and (3) reused these pre-trained classifiers using a Transfer Learning approach to classify NFRs in industry-specific datasets. From among the various language model classifiers, the best results have been obtained for a BERT based classifier fine-tuned to learn the linguistic intricacies of three different domain-specific datasets from real-life projects.","PeriodicalId":440285,"journal":{"name":"2021 IEEE 29th International Requirements Engineering Conference (RE)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 29th International Requirements Engineering Conference (RE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RE51729.2021.00036","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Non-Functional Requirements (NFRs) focus on the operational constraints of the software system. Early detection of NFRs enables their incorporation into the architectural design at an initial stage, a practice obviously preferable to expensive refactoring at a later stage. Automated identification and classification of NFRs has therefore seen numerous efforts using rule-based, machine learning and deep learning-based approaches. One of the major challenges for such an automation is the manual effort that needs to be invested into labeling of training data. This is a concern for large software vendors who typically work on a variety of applications in diverse domains. We address this challenge by designing a pipeline that facilitates classification of NFRs using only a limited amount (~ 20% of an available new dataset) of labeled data for training. We (1) employed Snorkel to automatically label a dataset comprising NFRs from various Software Requirement Specification documents, (2) trained several classifiers using it, and (3) reused these pre-trained classifiers using a Transfer Learning approach to classify NFRs in industry-specific datasets. From among the various language model classifiers, the best results have been obtained for a BERT based classifier fine-tuned to learn the linguistic intricacies of three different domain-specific datasets from real-life projects.