Building a Corpus from Supermarket Reviews in Portuguese for Document-Level Sentiment Analysis

Authors: Vinícius Takeo Friedrich Kuwaki, Mateus Nepomuceno Ladeira, Matias Giuliano Gutierrez Benitez, Rui Jorge Tramontin Junior
Published in: Anais do XIII Computer on the Beach - COTB'22
Publication date: 2022-07-13
DOI: 10.14210/cotb.v13.p119-125 (https://doi.org/10.14210/cotb.v13.p119-125)
ABSTRACT

Sentiment Analysis (SA) is a field of research within Natural Language Processing that has grown over the last decades due to the popularization of social media and smartphones. Many SA applications make use of a corpus: a collection of textual data used to train and/or test SA resources. This work describes the construction of a corpus intended for document-level SA. The corpus contains reviews of supermarkets throughout Brazil, extracted from Google Places. The data were collected taking into account Brazil's geographic distribution and linguistic variation, and were carefully reviewed. The corpus was then evaluated using k-fold cross-validation with both machine learning and deep learning techniques; precision, accuracy, recall, and F1-score were collected and compared across the techniques. It was also tested with a lexical approach using a domain-specific lexicon.
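
The evaluation procedure mentioned in the abstract (k-fold cross-validation reporting precision, accuracy, recall, and F1-score) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy word-vote classifier, the `pos`/`neg` labels, and the six example reviews are invented and do not come from the paper or its corpus. The folds are contiguous and unshuffled, whereas a real experiment would typically shuffle or stratify them.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over n items."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def binary_metrics(y_true, y_pred, positive="pos"):
    """Compute precision, recall, F1 and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": correct / len(pairs)}


def train_word_votes(texts, labels):
    """Toy model (hypothetical): count, per word, the labels it occurs under."""
    votes = {}
    for text, label in zip(texts, labels):
        for word in set(text.lower().split()):
            votes.setdefault(word, Counter())[label] += 1
    return votes


def predict(votes, text, default="pos"):
    """Label a text by summing the label counts of its known words."""
    tally = Counter()
    for word in set(text.lower().split()):
        tally.update(votes.get(word, {}))
    return tally.most_common(1)[0][0] if tally else default


def cross_validate(texts, labels, k=5):
    """Train and test on each fold, then average the metrics over the folds."""
    per_fold = []
    for train, test in k_fold_indices(len(texts), k):
        votes = train_word_votes([texts[i] for i in train],
                                 [labels[i] for i in train])
        preds = [predict(votes, texts[i]) for i in test]
        per_fold.append(binary_metrics([labels[i] for i in test], preds))
    return {m: sum(f[m] for f in per_fold) / k for m in per_fold[0]}


# Invented example reviews, NOT taken from the corpus described above.
reviews = [
    "atendimento excelente e preços bons",
    "fila enorme e atendimento péssimo",
    "produtos frescos e ótimo atendimento",
    "preços altos e produtos vencidos",
    "mercado limpo e organizado",
    "lugar sujo e desorganizado",
]
labels = ["pos", "neg", "pos", "neg", "pos", "neg"]

scores = cross_validate(reviews, labels, k=3)
print(scores)
```

Averaging the per-fold metrics is one common way to summarise a cross-validation run. Comparing several techniques, as the abstract describes, would amount to swapping the toy classifier for each model while keeping the same folds.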