Fernando Tadao Ito, Helena de Medeiros Caseli, J. Moreira
The Effects of Underlying Mono and Multilingual Representations for Text Classification
Published in: 2018 7th Brazilian Conference on Intelligent Systems (BRACIS), 2018-10-01
DOI: 10.1109/BRACIS.2018.00054 (https://doi.org/10.1109/BRACIS.2018.00054)
Abstract
With the exponential growth of multimedia datasets comes the need to combine multiple data representations into "conceptual" vector spaces that draw on all available sources of information. Following previous experiments [1], this paper explores how two different languages can be combined to better represent information. Methods for creating textual representations, such as Word2Vec and GloVe, are well established in academia, and Machine Learning tasks usually rely on a single representation method. We investigate how different combinations of textual representations affect classification on a multilingual dataset of international news in Portuguese and English. We aim to analyze the differences between these combinations and how the representations perform on a small dataset with multiple data inputs.
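To make the idea of combining representations from two languages concrete, the following is a minimal Python sketch, not the authors' method. It assumes one simple combination strategy, concatenating the averaged word vectors of the English and Portuguese versions of a document, and a toy nearest-centroid classifier. The abstract does not state which combinations or classifiers were evaluated, and the tiny 2-dimensional embedding tables `EN` and `PT`, along with all function names, are hypothetical stand-ins for trained Word2Vec or GloVe vectors.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical 2-dimensional embedding tables standing in for trained
# Word2Vec/GloVe vectors (real vectors would have hundreds of dimensions).
EN: Dict[str, Vector] = {
    "election": [1.0, 0.0], "vote": [0.8, 0.2],
    "match": [0.0, 1.0], "goal": [0.2, 0.8],
}
PT: Dict[str, Vector] = {
    "eleição": [1.0, 0.0], "voto": [0.9, 0.1],
    "jogo": [0.0, 1.0], "gol": [0.1, 0.9],
}


def average_embedding(tokens: List[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Mean of the vectors of known tokens; a zero vector if none are known."""
    known = [table[t] for t in tokens if t in table]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def represent(en_text: str, pt_text: str) -> Vector:
    """Assumed combination: concatenate the per-language document vectors."""
    en_vec = average_embedding(en_text.lower().split(), EN, 2)
    pt_vec = average_embedding(pt_text.lower().split(), PT, 2)
    return en_vec + pt_vec  # list concatenation, giving a 4-dimensional vector


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vec: Vector, centroids: Dict[str, Vector]) -> str:
    """Return the label whose centroid is closest to vec."""
    return min(centroids, key=lambda name: squared_distance(vec, centroids[name]))


# One toy "training" document per class, given in both languages.
centroids = {
    "politics": represent("election vote", "eleição voto"),
    "sports": represent("match goal", "jogo gol"),
}

doc = represent("vote", "eleição")
label = nearest_centroid(doc, centroids)
print(label)
```

Concatenation keeps each language's features in separate dimensions, so a downstream classifier can weight them independently. Averaging or summing aligned vectors would be alternative combinations, each with different trade-offs on a small dataset.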