Title: Building a corpus from supermarket reviews in portuguese for document-level Sentiment Analysis
Authors: Vinícius Takeo Friedrich Kuwaki, Mateus Nepomuceno Ladeira, Matias Giuliano Gutierrez Benitez, Rui Jorge Tramontin Junior
Venue: Anais do XIII Computer on the Beach - COTB'22
Published: 2022-07-13
DOI: 10.14210/cotb.v13.p119-125 (https://doi.org/10.14210/cotb.v13.p119-125)
Citation count: 0
Abstract
Sentiment Analysis (SA) is a field of research within Natural Language Processing that has grown over the last decades with the popularization of social media and smartphones. Many SA applications make use of a corpus: a collection of data in textual form used to train and/or test SA resources. This work describes the construction of a corpus intended for document-level SA. The corpus contains reviews of supermarkets throughout Brazil, extracted from Google Places. The data were collected taking into account the Brazilian geographic distribution and linguistic variations, and were carefully reviewed. The corpus was then evaluated using a k-fold cross-validation method applied to both machine learning and deep learning techniques, in which precision, accuracy, recall and F1-score were collected and compared among the techniques. It was also tested with a lexical approach using a domain-specific lexicon.
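
The evaluation described in the abstract (k-fold cross-validation with accuracy, precision, recall and F1-score) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the function names (`k_fold_indices`, `binary_metrics`, `train_word_scores`, `predict`), the choice of k = 3, the binary `pos`/`neg` labels, and the six invented toy reviews. The word-count classifier is only a stand-in and is not one of the machine learning, deep learning or lexicon-based techniques the authors evaluated.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


def binary_metrics(y_true, y_pred, positive="pos"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def train_word_scores(texts, labels):
    """Toy stand-in model: +1 per word seen in a 'pos' review, -1 for 'neg'."""
    scores = Counter()
    for text, label in zip(texts, labels):
        for word in text.lower().split():
            scores[word] += 1 if label == "pos" else -1
    return scores


def predict(scores, text):
    """Label a review by the sign of its summed word scores (ties go to 'pos')."""
    total = sum(scores[word] for word in text.lower().split())
    return "pos" if total >= 0 else "neg"


# Invented toy reviews (hypothetical, NOT taken from the corpus).
reviews = [
    ("otimo atendimento e precos bons", "pos"),
    ("fila enorme e atendimento ruim", "neg"),
    ("produtos frescos e precos bons", "pos"),
    ("loja suja e fila enorme", "neg"),
    ("otimo lugar produtos frescos", "pos"),
    ("atendimento ruim e loja suja", "neg"),
]
texts = [text for text, _ in reviews]
labels = [label for _, label in reviews]

# Train on k-1 folds, test on the held-out fold, and keep per-fold metrics.
fold_results = []
for train_idx, test_idx in k_fold_indices(len(reviews), k=3):
    model = train_word_scores([texts[i] for i in train_idx],
                              [labels[i] for i in train_idx])
    predictions = [predict(model, texts[i]) for i in test_idx]
    fold_results.append(binary_metrics([labels[i] for i in test_idx], predictions))

# Average each metric over the folds, as is usual when comparing techniques.
mean_metrics = {name: sum(result[name] for result in fold_results) / len(fold_results)
                for name in fold_results[0]}
print(mean_metrics)
```

Averaging each metric over the folds gives a single figure per technique, which is what makes a side-by-side comparison possible. On a real corpus one would normally shuffle (and often stratify) the data before splitting, which this sketch omits for brevity.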