Hybrid Distance-based, CNN and Bi-LSTM System for Dictionary Expansion

IF 0.9 Q4 TELECOMMUNICATIONS
Béla Benedek Szakács, T. Mészáros
Infocommunications Journal, published 2020-01-01. DOI: 10.36244/ICJ.2020.4.2 (https://doi.org/10.36244/ICJ.2020.4.2)

Abstract

Dictionaries such as WordNet can support a variety of Natural Language Processing applications by providing additional morphological data. They can be used in Digital Humanities research, in building knowledge graphs, and in other applications. Creating dictionaries from large corpora of natural-language text has not been a primary focus of research, as other tasks (such as chatbots) have dominated the field, but such dictionaries can be very useful tools for analysing texts. Even for contemporary texts, categorizing words according to their dictionary entry is a complex task, and for less conventional texts (in old or less-researched languages) it is even harder to solve automatically. Our task was to create software that helps expand a dictionary containing word forms and tag unprocessed text. We used a manually created corpus for training and testing the model. We created a combination of Bidirectional Long Short-Term Memory networks, convolutional networks and a distance-based solution that outperformed other existing solutions. While manual post-processing of the tagged text is still needed, the system significantly reduces the amount required.
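The abstract does not specify how the distance-based component works. As a rough illustration only, the sketch below assumes it can be approximated by matching an unseen word form to the closest known form by Levenshtein edit distance. The function names, the `max_distance` threshold and the toy English dictionary are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def nearest_entry(
    word: str,
    known_forms: Dict[str, str],
    max_distance: int = 2,  # hypothetical cut-off, not from the paper
) -> Optional[Tuple[str, int]]:
    """Return (headword, distance) of the closest known form, or None."""
    best: Optional[Tuple[str, int]] = None
    for form, headword in known_forms.items():
        distance = edit_distance(word, form)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (headword, distance)
    return best


# Toy dictionary mapping word forms to headwords (illustrative data only).
known_forms = {"walked": "walk", "walking": "walk", "houses": "house"}
print(nearest_entry("walkd", known_forms))
print(nearest_entry("xyzzy", known_forms))
```

In a hybrid system of the kind the abstract describes, a lookup like this could plausibly handle word forms that are close to known entries, leaving harder cases to the neural models; how the authors actually combine the components is not stated here.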