Using Domain-Specific Corpora for Improved Handling of Ambiguity in Requirements

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), published 2021-05-01. DOI: 10.1109/ICSE43902.2021.00133

Saad Ezzini, Sallam Abualhaija, Chetan Arora, M. Sabetzadeh, L. Briand

Citations: 28

Abstract

Ambiguity in natural-language requirements is a pervasive issue that has been studied by the requirements engineering community for more than two decades. A fully manual approach for addressing ambiguity in requirements is tedious and time-consuming, and may also overlook unacknowledged ambiguity: the situation where different stakeholders perceive a requirement as unambiguous but, in reality, interpret it differently. In this paper, we propose an automated approach that uses natural language processing for handling ambiguity in requirements. Our approach is based on the automatic generation of a domain-specific corpus from Wikipedia. As our evaluation shows, integrating domain knowledge significantly improves the accuracy of ambiguity detection and interpretation. We scope our work to coordination ambiguity (CA) and prepositional-phrase attachment ambiguity (PAA) because of the prevalence of these types of ambiguity in natural-language requirements [1]. We evaluate our approach on 20 industrial requirements documents, which collectively contain more than 5,000 requirements from seven distinct application domains. Over this dataset, our approach detects CA and PAA with an average precision of 80% and an average recall of 89% (90% for cases of unacknowledged ambiguity). The automatic interpretations that our approach yields have an average accuracy of 85%. Compared with baselines that use generic corpora, our approach, which uses domain-specific corpora, has 33% better accuracy in ambiguity detection and 16% better accuracy in interpretation.
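The abstract does not spell out the detection heuristics, so the following is only a minimal sketch of why the choice of corpus matters for PAA, not the authors' method. It assumes a simple sentence-level co-occurrence model in the spirit of classic lexical-association approaches to prepositional-phrase attachment: a PP's object noun is attached to whichever candidate head (verb or noun) it co-occurs with more strongly, and the case is flagged as ambiguous when the evidence is weak. The function names `cooccurrence_counts` and `attach_pp`, the add-one style smoothing, the `threshold=0.5` cut-off, the example requirement, and both toy corpora are hypothetical illustrations; the sketch also ignores the preposition itself and uses naive whitespace tokenization.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Iterable, Tuple


def cooccurrence_counts(corpus: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (word_counts, pair_counts) at sentence level.

    word_counts[w] is the number of sentences containing w;
    pair_counts[frozenset({a, b})] is the number of sentences containing both.
    """
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for sentence in corpus:
        # Naive whitespace tokenization (hypothetical; a real pipeline would lemmatize).
        words = set(sentence.lower().split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_counts, pair_counts


def attach_pp(verb: str, noun: str, pp_object: str,
              word_counts: Counter, pair_counts: Counter,
              threshold: float = 0.5) -> str:
    """Guess whether a PP whose object noun is pp_object attaches to verb or noun.

    Returns "noun", "verb", or "ambiguous" when the corpus evidence is too weak.
    """
    def score(head: str) -> float:
        # Smoothed estimate of P(pp_object in sentence | head in sentence).
        joint = pair_counts[frozenset((head, pp_object))]
        return (joint + 1) / (word_counts[head] + 2)

    log_ratio = math.log(score(noun) / score(verb))
    if abs(log_ratio) < threshold:
        return "ambiguous"
    return "noun" if log_ratio > 0 else "verb"


# Hypothetical requirement: "The system shall send alerts to operators with
# administrator privileges." Does "with administrator privileges" attach to
# the verb "send" or to the noun "operators"?
domain_corpus = [
    "operators with administrator privileges can reset the controller",
    "privileges are granted to operators by the security officer",
    "operators with elevated privileges manage alarms",
    "the station shall send telemetry every second",
]
generic_corpus = [
    "the weather was pleasant in the afternoon",
    "she decided to send a letter to her friend",
    "the operators of the ferry went on strike",
]

if __name__ == "__main__":
    for name, corpus in (("domain", domain_corpus), ("generic", generic_corpus)):
        words, pairs = cooccurrence_counts(corpus)
        print(name, attach_pp("send", "operators", "privileges", words, pairs))
```

With the toy domain corpus the sketch prints `domain noun`, because "privileges" co-occurs with "operators" but never with "send"; with the generic corpus it prints `generic ambiguous`, because neither head co-occurs with "privileges". This only qualitatively mirrors the abstract's point that domain knowledge helps; the reported precision, recall, and accuracy figures come from the authors' own pipeline, not from this sketch.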
