Identifying named entities on a University intranet

M. Althobaiti, Udo Kruschwitz, Massimo Poesio
Published in: 2012 4th Computer Science and Electronic Engineering Conference (CEEC)
Publication date: 2012-12-06
DOI: 10.1109/CEEC.2012.6375385 (https://doi.org/10.1109/CEEC.2012.6375385)
Citations: 4

Abstract

Named entities (NEs) are textual references made via proper names, such as people's names, company names, and places. The importance of NEs has been observed in intranet search engines, including those of university web sites. In this paper, we build a mechanism dedicated to recognizing three types of named entity that are constantly referenced in the University of Essex domain: person names, course codes, and room numbers. While a person name is considered a common named entity, course codes and room numbers are specific to the University domain. We developed a technique to train three different classifiers on electronic corpora consisting of 16,629 examples in total, which were collected and annotated manually from the University domain. The resulting models were then incorporated into an NER system built to use the pre-trained classifiers to detect these NEs, mark them, and cross-reference them to the related documents. The proposed method performed well on a test corpus, with average precision reaching nearly 0.97. Recall varied but was lower overall than precision, averaging 0.82. Moreover, for name recognition in the University domain, our system outperformed two other systems: the OpenNLP name finder and the ANNIE system.
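The precision and recall figures quoted above are the standard measures for NER evaluation. As a minimal sketch of how such scores are typically computed, assuming exact-match scoring over labelled spans (the abstract does not state the matching criterion the authors used), the following may help. The function name, the `NAME`/`COURSE`/`ROOM` labels, and the toy spans are all hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# An entity is (start_offset, end_offset, label).
# The labels used below are illustrative, not the paper's tag set.
Entity = Tuple[int, int, str]


def precision_recall(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float]:
    """Exact-match precision and recall over labelled entity spans."""
    true_positives = len(predicted & gold)
    # Precision: fraction of predicted entities that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold entities that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data invented for illustration only.
    gold = {(0, 10, "NAME"), (15, 20, "COURSE"), (30, 35, "ROOM"), (40, 50, "NAME")}
    predicted = {(0, 10, "NAME"), (15, 20, "COURSE"), (30, 35, "ROOM")}
    p, r = precision_recall(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=1.00 recall=0.75
```

In the toy run every predicted entity is correct but one gold entity is missed, which mirrors the pattern reported in the abstract: few false positives (high precision) alongside some undetected entities (lower recall).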