DeNom: a tool to find problematic nominalizations using NLP

2015 IEEE Second International Workshop on Artificial Intelligence for Requirements Engineering (AIRE). Pub Date: 2015-08-24. DOI: 10.1109/AIRE.2015.7337623

Mathias Landhäußer, Sven J. Körner, W. Tichy, Jan Keim, J. Krisch


Citations: 5

Abstract

Nominalizations in natural language requirements specifications can lead to imprecision. For example, in the phrase "transportation of pallets" it is unclear who transports the pallets, from where to where, and how. Guidelines for requirements specifications therefore recommend avoiding nominalizations. However, not all nominalizations are problematic. We present an industrial-strength text analysis tool called DeNom, which detects problematic nominalizations and reports them to the user for reformulation. DeNom uses Stanford's parser and the Cyc ontology. It classifies nominalizations as problematic or acceptable by first detecting all nominalizations in the specification and then subtracting those that are sufficiently specified within the sentence through word references, attributes, nominal phrase constructions, etc. All remaining nominalizations are incompletely specified and are therefore prone to conceal complex processes; these are deemed problematic. A thorough evaluation used 10 real-world requirements specifications from Daimler AG, totaling 60,000 words. DeNom identified over 1,100 nominalizations and classified 129 of them as problematic. Only 45 of these were false positives, resulting in a precision of 66%. Recall was 88%. In contrast, a naive nominalization detector would overload the user with 1,100 warnings, a thousand of which would be false positives.
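The detect-then-subtract idea described in the abstract can be illustrated with a minimal sketch. This is not DeNom's implementation: DeNom relies on Stanford's parser and the Cyc ontology, whereas the sketch below assumes a crude suffix list for detection and a toy rule for "sufficiently specified" (the text after the nominalization names both an object with "of ..." and an agent with "by ..."). The names `NOMINAL_SUFFIXES`, `find_candidates`, `is_specified`, and `classify` are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: a suffix heuristic stands in for real nominalization detection.
NOMINAL_SUFFIXES = ("tion", "sion", "ment", "ance", "ence")


@dataclass
class Finding:
    word: str
    problematic: bool


def find_candidates(sentence: str) -> List[str]:
    """Step 1: collect every word that looks like a nominalization."""
    words = re.findall(r"[A-Za-z]+", sentence)
    return [w for w in words
            if len(w) > 6 and w.lower().endswith(NOMINAL_SUFFIXES)]


def is_specified(word: str, sentence: str) -> bool:
    """Step 2 (toy rule, an assumption): the word counts as sufficiently
    specified when the text following it names an object ("of ...")
    and an agent ("by ...")."""
    tail = sentence[sentence.index(word) + len(word):]
    has_object = re.search(r"\bof\s+\w+", tail) is not None
    has_agent = re.search(r"\bby\s+\w+", tail) is not None
    return has_object and has_agent


def classify(sentence: str) -> List[Finding]:
    """Detect all candidates, then mark as problematic only those
    that the specification check does not clear."""
    return [Finding(w, not is_specified(w, sentence))
            for w in find_candidates(sentence)]
```

Under these assumptions, `classify("The transportation of pallets must be logged.")` flags "transportation" as problematic, while `classify("The transportation of pallets by the forklift takes two minutes.")` clears it because the sentence names both the object and the agent. A real detector would need syntactic and semantic analysis to avoid the many false matches a suffix list produces.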
