Sentiment Analysis of Twitter Data Using Big Data Analytics and Deep Learning Model

2023 International Conference on Artificial Intelligence and Knowledge Discovery in Concurrent Engineering (ICECONF). Pub Date: 2023-01-05. DOI: 10.1109/ICECONF57129.2023.10084281

Harika Vanam, Jeberson Retna Raj R

Citations: 1

Abstract

People all over the world express their thoughts and opinions on online social media platforms, which they use daily to communicate with one another and stay informed about current events. Twitter is one of the best known and most widely used of these platforms, and a large number of tweets covering a wide range of subjects are posted to it every day. Machine learning algorithms can extract features and locate trends in this data, but extracting useful information from Twitter's never-ending data stream requires tools and strategies designed specifically for large volumes of data. In this paper, we focus mainly on hashtag identification and on identifying the industry with the highest share of voice. We collect live data from Twitter using Apache Spark and then classify each tweet with the machine learning techniques provided by the Apache Spark machine learning library. A convolutional neural network (CNN) and logistic regression (LR) are used to test the model. The CNN outperformed logistic regression, with an average accuracy of approximately 95% and an F1 score of 0.60. Accuracy and F1 score are both reported as 0.59 for what appears to be the logistic regression baseline. The results show that real-time tweets can be evaluated considerably faster with Apache Spark than in a conventional execution environment.
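To make the "share of voice" idea in the abstract concrete, the following is a minimal sketch in plain Python rather than Apache Spark. It is not the authors' implementation: the hashtag-to-industry lookup `INDUSTRY_BY_HASHTAG`, the sample tweets, and the function names are all hypothetical, and share of voice is assumed here to mean each industry's fraction of the industry-tagged hashtag mentions.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical hashtag-to-industry lookup; the abstract does not describe the mapping used.
INDUSTRY_BY_HASHTAG = {
    "#iphone": "technology",
    "#ai": "technology",
    "#nba": "sports",
    "#bitcoin": "finance",
}

# A hashtag is taken to be '#' followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#\w+")


def extract_hashtags(tweet: str) -> List[str]:
    """Return the lowercased hashtags found in a tweet, in order of appearance."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(tweet)]


def share_of_voice(tweets: List[str]) -> Dict[str, float]:
    """Return each industry's fraction of all industry-tagged hashtag mentions."""
    counts: Counter = Counter()
    for tweet in tweets:
        for tag in extract_hashtags(tweet):
            industry = INDUSTRY_BY_HASHTAG.get(tag)
            if industry is not None:  # ignore hashtags with no known industry
                counts[industry] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {industry: n / total for industry, n in counts.items()}


# Made-up sample tweets for illustration only.
sample_tweets = [
    "Loving the new camera #iPhone #AI",
    "What a game last night #NBA",
    "#Bitcoin is volatile again",
    "Chatbots everywhere #AI",
]
shares = share_of_voice(sample_tweets)
top_industry = max(shares, key=shares.get)
print(top_industry, shares[top_industry])
```

On a live stream, the same counting would run over batches of incoming tweets instead of a fixed list. The per-tweet extraction step is independent for each tweet, which is what makes this kind of workload a natural fit for a distributed engine.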

Source journal

2023 International Conference on Artificial Intelligence and Knowledge Discovery in Concurrent Engineering (ICECONF)

Self-citation rate

0.00%

Number of publications