Attention-Based Sub-Word Network for Multilingual Short Text Classification
Yaru Sun, Ying Yang, Yongjian Wang
Proceedings of the 3rd International Conference on Advanced Information Science and System, 2021-11-26
DOI: 10.1145/3503047.3503060 (https://doi.org/10.1145/3503047.3503060)
Citation count: 0
Abstract
Computing features for multilingual text is an important semantic processing task in natural language processing (NLP). In real production environments, state-of-the-art models fail to analyze short texts that mix multiple languages correctly. To tackle this problem, we propose a sub-word embedding network with multilingual features for short text understanding, which captures the most important semantic information in a multilingual short sentence. Our method uses a model based on coupling coefficient calculation to generate the sub-words of the input sentence. By sharing sub-word features, it constructs a feature space for mixed-language words. The method can extract the most significant information in a sentence without ignoring other relevant information. The model is structurally simple, can easily be trained end-to-end, and scales to large amounts of training data. Experimental results on the Multilingual Short Text (MST), THUCNews, and AGNews datasets show that our method outperforms most existing methods.
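
The abstract does not give the model's equations, so the following is only a rough illustration of the general idea of shared sub-word features combined with attention-style weighting. It is not the authors' coupling-coefficient model. As assumptions made purely for this sketch, sub-words are approximated by character trigrams, the shared feature space is a hashed embedding table with random values, and the weights come from a plain softmax over dot products with a fixed query vector. All function names and parameters (`char_ngrams`, `bucket`, `attention_pool`, `encode`, `num_buckets`, `query`) are hypothetical.

```python
import math
import random

def char_ngrams(token, n=3):
    """Overlapping character n-grams of a token (assumed stand-in for sub-words)."""
    padded = f"<{token}>"  # boundary markers, so short tokens still yield a unit
    if len(padded) <= n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def bucket(ngram, num_buckets):
    """Deterministic hash into the shared table (built-in hash() is salted per process)."""
    h = 0
    for ch in ngram:
        h = (h * 31 + ord(ch)) % num_buckets
    return h

def make_embedding_table(num_buckets, dim, seed=0):
    """Random embeddings; in a real model these would be learned."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(num_buckets)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, query):
    """Weight each sub-word vector by softmax(dot(vector, query)) and sum them."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(query))]
    return pooled, weights

def encode(sentence, table, query, n=3):
    """Map a whitespace-tokenized sentence to one pooled vector plus its weights."""
    grams = [g for token in sentence.split() for g in char_ngrams(token, n)]
    if not grams:
        raise ValueError("sentence contains no tokens")
    # All languages index the same table, so sub-word features are shared.
    vectors = [table[bucket(g, len(table))] for g in grams]
    return attention_pool(vectors, query)

table = make_embedding_table(num_buckets=1000, dim=8)
query = [0.1] * 8  # assumed fixed query; a trained model would learn this
pooled, weights = encode("这个 movie 很 good", table, query)
```

Because the softmax assigns every sub-word a strictly positive weight, the pooled vector emphasizes the highest-scoring units while still including the rest, which loosely mirrors the stated goal of extracting the most significant information without ignoring other relevant information.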