SentiLexBR: An Automatic Methodology of Building Sentiment Lexicons for the Portuguese Language
Author: Tiago de Melo
Journal: Journal of Information and Data Management
Publication date: 2022-09-21
DOI: https://doi.org/10.5753/jidm.2022.2504
Citation count: 0
Abstract
User reviews are readily available on the Web and widely used for sentiment analysis tasks. Sentiment lexicons play an important role in sentiment analysis: each sentiment word is given a sentiment label (positive or negative) or a score (1 or -1). However, the entries of a sentiment lexicon may express different polarities depending on the domain. In addition, only a few studies on Portuguese sentiment analysis have been reported, owing to the lack of resources such as domain-specific sentiment lexical corpora. In this paper, we present an effective methodology, called SentiLexBR, that uses probabilities derived from Bayes' theorem to build a set of sentiment lexicons. An unsupervised algorithm is proposed to automatically identify sentiment terms and their polarities for the Portuguese language. Experimental results on user review datasets in 12 different domains indicate the effectiveness of our methodology in domain-specific sentiment lexicon generation for Portuguese. In addition, the sentiment lexicon produced by SentiLexBR significantly outperforms several alternative approaches to building domain-specific sentiment lexicons.
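The abstract does not say how the Bayes probabilities are computed, so the following is only a minimal sketch of the general idea, not the SentiLexBR algorithm itself. It assumes each review already carries a coarse polarity signal (for example, one derived from a star rating, which is an assumption here), and it assigns a word a score of 1 or -1 when the posterior P(positive | word) is far enough from 0.5. The function name `build_lexicon`, the parameters `min_count` and `margin`, and the toy reviews are all hypothetical.

```python
from collections import Counter


def build_lexicon(reviews, min_count=2, margin=0.2):
    """Score words with Bayes' theorem (illustrative sketch only).

    reviews: iterable of (tokens, label) pairs, label in {"pos", "neg"}.
    Returns a dict mapping word -> 1 (positive) or -1 (negative).
    """
    classes = ("pos", "neg")
    word_docs = {c: Counter() for c in classes}  # documents containing the word
    doc_counts = Counter()                       # documents per class

    for tokens, label in reviews:
        doc_counts[label] += 1
        word_docs[label].update(set(tokens))     # count each word once per review

    total_docs = sum(doc_counts.values())
    prior = {c: doc_counts[c] / total_docs for c in classes}

    lexicon = {}
    for word in set(word_docs["pos"]) | set(word_docs["neg"]):
        if word_docs["pos"][word] + word_docs["neg"][word] < min_count:
            continue  # too rare to score reliably
        # Laplace-smoothed likelihood P(word | class)
        like = {c: (word_docs[c][word] + 1) / (doc_counts[c] + 2) for c in classes}
        evidence = sum(like[c] * prior[c] for c in classes)
        # Bayes' theorem: P(pos | word) = P(word | pos) * P(pos) / P(word)
        p_pos = like["pos"] * prior["pos"] / evidence
        if p_pos >= 0.5 + margin:
            lexicon[word] = 1
        elif p_pos <= 0.5 - margin:
            lexicon[word] = -1
    return lexicon


# Hypothetical toy reviews from a single domain.
reviews = [
    (["produto", "ótimo", "recomendo"], "pos"),
    (["ótimo", "atendimento"], "pos"),
    (["produto", "péssimo"], "neg"),
    (["péssimo", "atendimento", "demorado"], "neg"),
]
lexicon = build_lexicon(reviews)
```

Running the same function separately on the reviews of each domain would give one lexicon per domain, which is one simple way a word could end up with different polarities in different domains.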