Authors: Haowei Liu, Enhao Tan
DOI: 10.1145/3529836.3529941
Venue: 2022 14th International Conference on Machine Learning and Computing (ICMLC)
Publication date: 2022-02-18
Tweet Sentiment Extraction Using Byte Level Pretrained Language Model
Research on sentiment analysis has developed rapidly in recent years, and Twitter sentiment analysis is one of the most popular topics. Beyond classifying sentiment, it is also important to identify the decisive phrases or words in a text that determine its sentiment category. In this paper, we propose and develop byte-level pretrained RoBERTa models designed to extract such phrases from tweet data with sentiment labels. We compare the RoBERTa model and its variants, including RoBERTa-base, RoBERTa-large, XLM-RoBERTa-base, and RoBERTa-large-mnli. We build the model by combining RoBERTa with a CNN, then train it on given tweet text and sentiment labels so that the phrases that decide the sentiment can be predicted. Our results show that RoBERTa-base achieves a Jaccard score of 0.712 with a total training time of 240 minutes, which is the best performance among all the models.
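The Jaccard score reported above measures the word-level overlap between a predicted phrase and the ground-truth phrase. The abstract does not spell out the exact formula used by the authors; a common word-set formulation (as popularized by the Kaggle Tweet Sentiment Extraction competition, including its convention of scoring two empty strings as 0.5) can be sketched as:

```python
def jaccard(pred: str, truth: str) -> float:
    """Word-level Jaccard similarity between a predicted and a true phrase.

    Splits each string into a set of lowercase whitespace-delimited tokens
    and returns |intersection| / |union|. The 0.5 return for two empty
    strings follows the Kaggle competition convention and is an assumption
    here, not something stated in the paper's abstract.
    """
    a = set(pred.lower().split())
    b = set(truth.lower().split())
    if not a and not b:
        return 0.5
    return len(a & b) / len(a | b)


# Example: one shared word out of two distinct words -> 1/2.
print(jaccard("so happy", "happy"))       # 0.5
# Identical phrases score 1.0 regardless of casing.
print(jaccard("Great Day", "great day"))  # 1.0
```

A model's overall score is then typically the mean of this value over all test tweets, so partial overlaps contribute proportionally rather than counting as outright misses.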