Automatic extraction of glossary terms from natural language requirements

2013 21st IEEE International Requirements Engineering Conference (RE) Pub Date: 2013-07-15 DOI: 10.1109/RE.2013.6636736

Anurag Dwarakanath, Roshni Ramnani, Shubhashis Sengupta

Citations: 34

Abstract

We present a method for the automatic extraction of glossary terms from unconstrained natural language requirements. Glossary terms are identified in two steps: (a) compute units, which are candidates for glossary terms, and (b) disambiguate between mutually exclusive units to identify terms. We introduce novel linguistic techniques to identify process nouns, abstract nouns and auxiliary verbs. The identification of units also handles coordinating conjunctions and adjectival modifiers, which requires resolving coordination ambiguity and adjectival modifier ambiguity. The identification of terms among the units adapts an in-document statistical metric. We evaluate our method on a real-life set of software requirements documents and compare our results with those of a base algorithm. The intricate linguistic classification and the handling of ambiguity result in superior performance of our approach over the base algorithm.
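The two-step shape described in the abstract (compute candidate units, then choose among mutually exclusive units with an in-document statistic) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' method: the stopword-based chunking stands in for the paper's linguistic techniques, the frequency-times-length score stands in for the unspecified statistical metric, and the names `STOPWORDS`, `candidate_units`, `sub_units`, `extract_terms` and `min_count` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; the paper uses linguistic analysis instead.
STOPWORDS = {"the", "a", "an", "shall", "can", "must", "will", "be", "is",
             "are", "to", "of", "and", "or", "for", "in", "on", "with", "by"}

def candidate_units(sentence):
    """Step (a), simplified: maximal runs of non-stopwords become units."""
    units, run = [], []
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word in STOPWORDS:
            if run:
                units.append(tuple(run))
                run = []
        else:
            run.append(word)
    if run:
        units.append(tuple(run))
    return units

def sub_units(run):
    """All contiguous sub-sequences of a run; these compete with each other."""
    return [run[i:j] for i in range(len(run))
            for j in range(i + 1, len(run) + 1)]

def extract_terms(requirements, min_count=2):
    """Step (b), simplified: per run, keep the competing unit with the best
    in-document score (occurrence count times length, an assumed metric)."""
    runs = [run for req in requirements for run in candidate_units(req)]
    counts = Counter(u for run in runs for u in sub_units(run))
    terms = set()
    for run in runs:
        best = max(sub_units(run), key=lambda u: (counts[u] * len(u), len(u)))
        if counts[best] >= min_count:
            terms.add(" ".join(best))
    return terms

requirements = [
    "The system shall display the user account.",
    "The admin can lock a user account.",
    "The system shall archive inactive user account records.",
]
print(sorted(extract_terms(requirements)))  # ['system', 'user account']
```

In the third requirement the long run "archive inactive user account records" competes with its own sub-units, and "user account" wins because it recurs across the document. A real system would need part-of-speech information to drop verbs and to handle conjunctions and adjectival modifiers, which this sketch does not attempt.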



