医疗保健领域意大利语评论情感分析的机器学习方法

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI:10.4000/books.aaccademia.8225

Luca Bacco, Andrea Cimino, L. Paulon, M. Merone, F. Dell’Orletta

{"title":"医疗保健领域意大利语评论情感分析的机器学习方法","authors":"Luca Bacco, Andrea Cimino, L. Paulon, M. Merone, F. Dell’Orletta","doi":"10.4000/books.aaccademia.8225","DOIUrl":null,"url":null,"abstract":"In this paper, we present our approach to the task of binary sentiment classification for Italian reviews in healthcare domain. We first collected a new dataset for such domain. Then, we compared the results obtained by two different systems, one including a Support Vector Machine and one with BERT. For the first one, we linguistic pre–processed the dataset to extract hand-crafted features exploited by the classifier. For the second one, we oversampled the dataset to achieve better results. Our results show that the SVMbased system, without the worry of having to oversample, has better performance than the BERT-based one, achieving an F1-score of 91.21%.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"121 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"A Machine Learning approach for Sentiment Analysis for Italian Reviews in Healthcare\",\"authors\":\"Luca Bacco, Andrea Cimino, L. Paulon, M. Merone, F. Dell’Orletta\",\"doi\":\"10.4000/books.aaccademia.8225\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we present our approach to the task of binary sentiment classification for Italian reviews in healthcare domain. We first collected a new dataset for such domain. Then, we compared the results obtained by two different systems, one including a Support Vector Machine and one with BERT. For the first one, we linguistic pre–processed the dataset to extract hand-crafted features exploited by the classifier. For the second one, we oversampled the dataset to achieve better results. Our results show that the SVMbased system, without the worry of having to oversample, has better performance than the BERT-based one, achieving an F1-score of 91.21%.\",\"PeriodicalId\":300279,\"journal\":{\"name\":\"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020\",\"volume\":\"121 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4000/books.aaccademia.8225\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4000/books.aaccademia.8225","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

摘要

在本文中，我们提出了我们的方法来二元情感分类任务的意大利评论在医疗保健领域。我们首先为该领域收集了一个新的数据集。然后，我们比较了两种不同系统的结果，一种是包含支持向量机的系统，一种是包含BERT的系统。对于第一个，我们对数据集进行语言预处理，以提取分类器利用的手工特征。对于第二个，我们对数据集进行过采样以获得更好的结果。我们的研究结果表明，基于svm的系统在不需要过采样的情况下，比基于bert的系统有更好的性能，达到了91.21%的f1分数。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Machine Learning approach for Sentiment Analysis for Italian Reviews in Healthcare

In this paper, we present our approach to the task of binary sentiment classification for Italian reviews in healthcare domain. We first collected a new dataset for such domain. Then, we compared the results obtained by two different systems, one including a Support Vector Machine and one with BERT. For the first one, we linguistic pre–processed the dataset to extract hand-crafted features exploited by the classifier. For the second one, we oversampled the dataset to achieve better results. Our results show that the SVMbased system, without the worry of having to oversample, has better performance than the BERT-based one, achieving an F1-score of 91.21%.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

自引率

0.00%

发文量