DRIP: Segmenting individual requirements from software requirement documents

Ziyan Zhao, Li Zhang, Xiaoli Lian, Heyang Lv
Software: Practice and Experience. Journal article, published 2023-12-19. DOI: 10.1002/spe.3303

Abstract

Numerous academic research projects and industrial tasks in software engineering require individual requirements as input. Unfortunately, we observe that specification documents may pack several requirements into one paragraph without explicit boundaries. To gauge how prevalent this problem is, we performed a preliminary study of the open requirement documents widely used in the academic community over the last 10 years and found that 26% of them exhibit this phenomenon. Several text segmentation approaches have been reported; however, they tend to identify topically coherent units, which may contain more than one requirement. Moreover, they do not consider the semantic units that make up a requirement. Here we report DRIP, a two-phase learning-based approach that segments individual requirements from paragraphs. Specifically, we first propose a Requirement Segmentation Siamese framework, which models the similarity of sentences and their conjunction relations and then detects the initial boundaries between individual requirements. We then optimize these boundaries heuristically, based on validating the semantic completeness of the resulting segments. Experiments on 1132 paragraphs and 6826 sentences show that DRIP outperforms popular unsupervised and supervised text segmentation algorithms both when processing different documents (accuracy gains of 57.65%–187.53%) and when processing paragraphs of different complexity (average accuracy gains of 54.46%–158.68%). We also show the importance of each component of DRIP to the segmentation.
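The abstract describes DRIP only at a high level. The Python sketch below illustrates the general two-phase shape it names (first place candidate boundaries between adjacent sentences, then repair them with a completeness check), using deliberately simple, hypothetical stand-ins: bag-of-words cosine similarity replaces the learned Siamese similarity model, a fixed list of opening cue words replaces the modelled conjunction relations, and "the segment contains a modal verb such as *shall*" replaces DRIP's semantic completeness validation. All function names, word lists, and the threshold are illustrative assumptions, not the paper's implementation.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL cue words: a sentence opening with one of these is treated as
# continuing the previous requirement (crude stand-in for learned conjunction relations).
CONJUNCTION_CUES = {"also", "additionally", "furthermore", "moreover", "however", "otherwise", "then"}
# HYPOTHETICAL completeness test: a segment is "complete" if it contains a modal verb.
MODALS = {"shall", "must", "should", "will"}


def tokens(sentence: str) -> list[str]:
    """Lower-cased alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a: list[str], b: list[str]) -> float:
    """Bag-of-words cosine similarity (stand-in for a Siamese similarity score)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def initial_boundaries(sentences: list[str], threshold: float = 0.3) -> list[int]:
    """Phase 1 (sketch): put a boundary before sentence i when it is dissimilar
    to sentence i-1 and does not open with a conjunction cue."""
    bounds = []
    for i in range(1, len(sentences)):
        prev, cur = tokens(sentences[i - 1]), tokens(sentences[i])
        continues = bool(cur) and cur[0] in CONJUNCTION_CUES
        if not continues and cosine(prev, cur) < threshold:
            bounds.append(i)
    return bounds


def split(sentences: list[str], bounds: list[int]) -> list[list[str]]:
    """Cut the sentence list at the given boundary indices."""
    cuts = [0, *bounds, len(sentences)]
    return [sentences[a:b] for a, b in zip(cuts, cuts[1:])]


def is_complete(segment: list[str]) -> bool:
    return any(w in MODALS for s in segment for w in tokens(s))


def refine(segments: list[list[str]]) -> list[list[str]]:
    """Phase 2 (sketch): merge each incomplete segment into its predecessor;
    an incomplete leading segment is folded into the segment after it."""
    merged: list[list[str]] = []
    for seg in segments:
        if merged and not is_complete(seg):
            merged[-1].extend(seg)
        else:
            merged.append(list(seg))
    if len(merged) > 1 and not is_complete(merged[0]):
        merged[1] = merged[0] + merged[1]
        merged.pop(0)
    return merged


def segment(sentences: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Run both phases on one paragraph that is already split into sentences."""
    return refine(split(sentences, initial_boundaries(sentences, threshold)))
```

For example, given the four sentences "The system shall allow users to log in with a password.", "Also, it shall lock the account after three failed attempts.", "The report module shall export data as CSV.", and "Exported files are stored in the archive folder.", phase 1 keeps the first two together (the second opens with "also") and proposes cuts before the third and fourth sentences; phase 2 then folds the modal-free fourth sentence back into the third, yielding two requirement segments. A learned model would replace both heuristics, but the boundary-then-repair structure stays the same.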