Automatic extraction of glossary terms from natural language requirements

Anurag Dwarakanath, Roshni Ramnani, Shubhashis Sengupta
Published in: 2013 21st IEEE International Requirements Engineering Conference (RE), pp. 314-319. Publication date: 2013-07-15. DOI: 10.1109/RE.2013.6636736
Citations: 34

Abstract

We present a method for the automatic extraction of glossary terms from unconstrained natural language requirements. Glossary terms are identified in two steps: (a) computing units, which are candidates for glossary terms, and (b) disambiguating between mutually exclusive units to identify terms. We introduce novel linguistic techniques to identify process nouns, abstract nouns and auxiliary verbs. The identification of units also handles coordinating conjunctions and adjectival modifiers, which requires resolving coordination ambiguity and adjectival modifier ambiguity. To identify terms among the units, we adapt an in-document statistical metric. We evaluate our method on a real-life set of software requirements documents and compare our results with those of a base algorithm. The fine-grained linguistic classification and the handling of ambiguity allow our approach to outperform the base algorithm.
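The abstract does not give the exact unit rules or the statistical metric, so the following is only a hypothetical, heavily simplified sketch of a two-step pipeline of this general shape, not the authors' method. It assumes sentences arrive already POS-tagged with coarse UPOS-style tags (the tagger itself is out of scope), treats maximal adjective/noun runs ending in a noun as candidate units, treats a unit and its head-final sub-phrases as mutually exclusive readings, and substitutes a plain in-document frequency threshold (`min_freq`, an illustrative parameter) for the paper's metric. It does not attempt process nouns, abstract nouns, auxiliary verbs or coordination.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical input format: each sentence is a list of (word, coarse POS tag)
# pairs, with tags assumed to come from some upstream tagger (UPOS-style).
Token = Tuple[str, str]
Unit = Tuple[str, ...]

NOMINAL_TAGS = {"NOUN", "ADJ"}


def candidate_units(sentence: List[Token]) -> List[Unit]:
    """Step (a), simplified: maximal ADJ/NOUN runs that end in a NOUN."""
    units: List[Unit] = []
    run: List[Token] = []
    # A sentinel token flushes the last run at the end of the sentence.
    for word, tag in sentence + [("", "END")]:
        if tag in NOMINAL_TAGS:
            run.append((word.lower(), tag))
            continue
        # Drop trailing adjectives so every unit ends in its head noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            units.append(tuple(w for w, _ in run))
        run = []
    return units


def readings(unit: Unit) -> List[Unit]:
    """Mutually exclusive readings of a unit, longest first.

    ("new", "savings", "account") ->
    [("new", "savings", "account"), ("savings", "account"), ("account",)]
    """
    return [unit[i:] for i in range(len(unit))]


def extract_terms(doc: List[List[Token]], min_freq: int = 2) -> Set[str]:
    """Step (b), simplified: keep the longest reading that recurs in the document."""
    units = [u for sentence in doc for u in candidate_units(sentence)]
    # In-document frequency of every reading, counted over all units.
    freq = Counter(r for u in units for r in readings(u))
    terms: Set[str] = set()
    for u in units:
        # Fall back to the bare head noun if no reading is frequent enough.
        chosen = next((r for r in readings(u) if freq[r] >= min_freq), u[-1:])
        terms.add(" ".join(chosen))
    return terms


# Illustrative, hand-tagged requirements sentences (made up for this sketch).
EXAMPLE_DOC: List[List[Token]] = [
    [("The", "DET"), ("user", "NOUN"), ("shall", "AUX"), ("open", "VERB"),
     ("a", "DET"), ("new", "ADJ"), ("savings", "NOUN"), ("account", "NOUN")],
    [("The", "DET"), ("savings", "NOUN"), ("account", "NOUN"), ("earns", "VERB"),
     ("monthly", "ADJ"), ("interest", "NOUN")],
    [("Each", "DET"), ("user", "NOUN"), ("can", "AUX"), ("view", "VERB"),
     ("the", "DET"), ("interest", "NOUN")],
]

if __name__ == "__main__":
    print(sorted(extract_terms(EXAMPLE_DOC)))  # ['interest', 'savings account', 'user']
```

Counting readings across all units lets a modifier-bearing unit such as "new savings account", which occurs once, back off to "savings account", which recurs. This is one crude way to resolve adjectival modifier ambiguity, chosen here only for brevity.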