Improving the Efficiency and Effectiveness of Multilingual Classification Methods for Sentiment Analysis
Pantea Ferdosian, Sean Grace, Vasudha Manikandan, Lucas Moles, Debajyoti Datta, Donald E. Brown
2021 Systems and Information Engineering Design Symposium (SIEDS), published 2021-04-30
DOI: 10.1109/SIEDS52267.2021.9483767
Abstract
The growing field of customer experience management relies heavily on natural language processing (NLP). An important current application of NLP in this industry is efficiently building sentiment models for new languages, which gives access to a broader range of clients. In this work, we examine the practical effectiveness and training-data requirements of two transfer learning methods, mBERT and XLM-RoBERTa, for developing sentiment analysis models in German. As a baseline that does not use transfer learning, we also train an LSTM classification model. We evaluate the models by measuring their performance gains as the amount of German (target-language) training data increases. The results support efficient development of NLP models by making it possible to predict how much training data is needed to reach a desired accuracy.
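The abstract does not state how data requirements are predicted from the measured performance gains. One common approach, assumed here and not confirmed as the authors' method, is to fit a power-law learning curve, error(n) = a · n^(-b), to (training-set size, accuracy) pairs and invert it for a target accuracy. The function names `fit_power_law` and `examples_needed` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], accuracies: Sequence[float]) -> Tuple[float, float]:
    """Fit error(n) = a * n**(-b) by least squares in log-log space.

    Assumption: error is 1 - accuracy, so every accuracy must be below 1.
    Returns the fitted (a, b).
    """
    if len(sizes) != len(accuracies) or len(sizes) < 2:
        raise ValueError("need at least two (size, accuracy) pairs")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(1.0 - acc) for acc in accuracies]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("training-set sizes must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # slope of log(error) vs log(n), i.e. -b
    intercept = y_mean - slope * x_mean   # intercept, i.e. log(a)
    return math.exp(intercept), -slope


def examples_needed(a: float, b: float, target_accuracy: float) -> float:
    """Invert the fitted curve: training-set size at which predicted accuracy hits the target."""
    if b <= 0:
        raise ValueError("fitted curve does not improve with more data")
    target_error = 1.0 - target_accuracy
    return (a / target_error) ** (1.0 / b)
```

For example, with synthetic points generated from a = 2 and b = 0.5 (illustrative only, not results from the paper), `fit_power_law` recovers those parameters and `examples_needed(a, b, 0.9)` returns about 400 examples. In practice the fit is only as reliable as the range of sizes it was measured on, so extrapolating far beyond that range is risky.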