Comparison of Deep Learning Sentiment Analysis Methods, Including LSTM and Machine Learning

Q2 Social Sciences
Jean Max T. Habib, A. A. Poguda
DOI: 10.21686/1818-4243-2023-4-60-71 (https://doi.org/10.21686/1818-4243-2023-4-60-71)
Published: 2023-08-31
Publication type: Journal Article
Citations: 0

Abstract

Purpose of research. The purpose of the study is to evaluate selected machine learning models for data processing in terms of speed and efficiency when analyzing sentiment, or consumer opinions, in business intelligence. To highlight existing developments, an overview of modern sentiment analysis methods and models is given, showing their advantages and disadvantages.

Materials and methods. To improve the sentiment analysis process built on existing methods and models, it must be adjusted to keep pace with today's growing changes in information flows. It is therefore crucial for researchers to explore ways of updating certain tools, either by combining them or by developing them further to suit modern tasks, so that the results of their processing can be understood more clearly. We present a comparison of several deep learning models, including convolutional neural networks (CNN), recurrent neural networks (RNN), and bidirectional long short-term memory (BiLSTM), evaluated using different word embedding approaches, including Bidirectional Encoder Representations from Transformers (BERT) and its variants, FastText, and Word2Vec. Data augmentation was carried out using a simple data augmentation approach. The project uses natural language processing (NLP), deep learning, and models such as LSTM, CNN, SVM with TF-IDF, AdaBoost, and Naive Bayes, followed by combinations of models.

Results. The study allowed us to obtain model results, verify them against user reviews, and compare accuracy across the individual models and the combination of CNN with LSTM; SVM with TF-IDF vectorization was most effective for this unbalanced data set. The constructed model achieved the following scores: ROC AUC 0.82, precision 0.92, F1 0.82, precision 0.82, and recall 0.82 (precision is listed twice in the source; the first value may refer to accuracy). Further research and model implementation could identify a better model.

Conclusion. Natural language text analysis has advanced considerably in recent years, and such problems may be completely solved in the near future. Several ML models and a combination of CNN with LSTM were compared, but SVM with the TF-IDF vectorizer proved most effective for this unbalanced data set. In general, both deep learning and feature-based selection methods can be used to solve some of the most pressing problems. Deep learning is useful when the most relevant features are not known in advance, while feature-based [...] classification algorithm. A combination of both approaches can also be used to further improve the efficiency of the algorithm.
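
The abstract reports precision, recall, and F1 for an unbalanced data set. As an illustration of why those scores are informative alongside accuracy when classes are skewed, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the function name `binary_metrics` and the toy labels are hypothetical, chosen only to show how the four scores are derived from the confusion counts.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    # Count (is_actually_positive, is_predicted_positive) pairs.
    counts = Counter(
        (t == positive, p == positive) for t, p in zip(y_true, y_pred)
    )
    tp = counts[(True, True)]    # true positives
    fp = counts[(False, True)]   # false positives
    fn = counts[(True, False)]   # false negatives
    tn = counts[(False, False)]  # true negatives
    total = tp + fp + fn + tn

    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy labels: 8 negative reviews (0) and 2 positive reviews (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A degenerate classifier that always predicts the majority class.
y_majority = [0] * 10
# A classifier that finds one positive and raises one false alarm.
y_model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

majority = binary_metrics(y_true, y_majority)
model = binary_metrics(y_true, y_model)

print("majority:", majority)
print("model:   ", model)
```

In this toy case both classifiers reach the same accuracy, yet only the second has non-zero precision, recall, and F1, which is why accuracy on its own can be misleading on unbalanced data.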
Source journal
Open Education Studies
Subject area: Social Sciences (miscellaneous)
CiteScore: 1.80
Self-citation rate: 0.00%
Articles published: 19
Review time: 27 weeks