Attention-Based Sub-Word Network for Multilingual Short Text Classification

Yaru Sun, Ying Yang, Yongjian Wang
Proceedings of the 3rd International Conference on Advanced Information Science and System, published 2021-11-26
DOI: https://doi.org/10.1145/3503047.3503060

Abstract

Feature computation for multilingual text is an important semantic processing task in natural language processing (NLP). In real production environments, state-of-the-art models cannot correctly analyze short texts that mix multiple languages. To tackle this problem, we propose a sub-word embedding network with multilingual features for short text understanding, which captures the most important semantic information in a multilingual short sentence. Our method uses a model based on coupling coefficient calculation to generate the sub-words of the input sentence. By sharing sub-word features, it constructs a feature space for multilingual mixed words. The method can extract the most significant information in a sentence without ignoring other relevant information. The model is structurally simple, can easily be trained end-to-end, and scales to large amounts of training data. Experimental results on the Multilingual Short Text (MST), THUCNews, and AGNews datasets show that our method outperforms most existing methods.
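The abstract does not say how the coupling coefficients are computed or how sub-words are formed, so the following is only a rough sketch under stated assumptions, not the authors' model. It assumes that sub-word units are overlapping character bigrams shared across languages, that embeddings come from a toy hash-based lookup (standing in for a learned table), and that coupling coefficients are a softmax over agreement scores, in the style of routing-by-agreement. The names `char_ngrams`, `embed`, and `route` are hypothetical.

```python
import hashlib
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def char_ngrams(text, n=2):
    """Assumed sub-word units: overlapping character n-grams with boundary marks."""
    padded = f"<{text}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def embed(unit, dim=8):
    """Toy deterministic embedding; a stand-in for a learned shared sub-word table."""
    digest = hashlib.sha256(unit.encode("utf-8")).digest()
    return [byte / 255.0 - 0.5 for byte in digest[:dim]]


def route(vectors, iterations=3):
    """Assumed routing: coupling coefficients are a softmax over agreement logits.

    Returns the coupling coefficients and the coupling-weighted sentence vector.
    """
    dim = len(vectors[0])
    logits = [0.0] * len(vectors)
    for step in range(iterations):
        coupling = softmax(logits)
        pooled = [sum(c * v[d] for c, v in zip(coupling, vectors)) for d in range(dim)]
        if step < iterations - 1:
            # Raise the logit of each unit by its agreement with the pooled vector.
            logits = [
                logit + sum(p * x for p, x in zip(pooled, vec))
                for logit, vec in zip(logits, vectors)
            ]
    return coupling, pooled


# Mixed English/Chinese short text shares one sub-word feature space.
units = char_ngrams("good 手机")
coupling, sentence_vec = route([embed(u) for u in units])
```

Because every unit receives a nonzero coefficient, units that agree most with the pooled vector dominate the sentence representation while the rest still contribute, which loosely mirrors the claim of extracting the most significant information without ignoring other relevant information.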