Bilingual Auto-Categorization Comparison of Two LSTM Text Classifiers

Johannes Lindén, Xutao Wang, Stefan Forsström, Tingting Zhang
Published in: 2019 8th International Congress on Advanced Applied Informatics (IIAI-AAI), 2019-07-01
DOI: 10.1109/IIAI-AAI.2019.00127 (https://doi.org/10.1109/IIAI-AAI.2019.00127)

Abstract

Multilingual problems such as auto-categorization are not easy tasks. One approach is to train a separate model for each language; another is to build the model in one base language and automatically translate text from other languages into that base language. Each language is biased toward its own grammar and syntax, which makes its content difficult to express in other languages. Translating from a natural language into a non-verbal language could potentially have a positive impact on the categorization results. A non-verbal language could, for example, be pure information in the form of knowledge graph relations extracted from the text. This article compares Chinese and Swedish. Two categorization models are developed and validated on each dataset. The aim is an auto-categorization model that works for any language. One model is built on LSTM and optimized for Swedish; the other is an improved Bidirectional-LSTM Convolution model optimized for Chinese. The improved algorithm is trained on both languages and compared with the LSTM algorithm. The Bidirectional-LSTM algorithm performs approximately 20 percentage points better than the LSTM algorithm, which is significant.
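
A gain stated in percentage units (percentage points) is an absolute difference between two scores, not a relative improvement. The sketch below illustrates the distinction. The two score values and both function names are assumptions made for illustration; they are not taken from the paper, which does not report the underlying scores in this abstract.

```python
# Hypothetical scores for illustration only; NOT results from the paper.
lstm_score = 0.60
bilstm_conv_score = 0.80


def percentage_point_gain(baseline: float, improved: float) -> float:
    """Absolute difference between two scores, in percentage points."""
    return (improved - baseline) * 100


def relative_gain_percent(baseline: float, improved: float) -> float:
    """Improvement relative to the baseline score, in percent."""
    return (improved - baseline) / baseline * 100


points = percentage_point_gain(lstm_score, bilstm_conv_score)
relative = relative_gain_percent(lstm_score, bilstm_conv_score)
print(f"{points:.1f} percentage points, {relative:.1f}% relative")
```

Under these assumed values the same improvement reads as 20 percentage points or roughly 33% relative, which is why the unit matters when two classifiers are compared.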