Gideon Popoola, Khadijat-Kuburat Abdullah, Gerard Shu Fuhnwi, Janet O. Agbaje
Published in: 2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC), pp. 1-6
Publication date: 2024-02-07
DOI: 10.1109/ICAIC60265.2024.10433843 (https://doi.org/10.1109/ICAIC60265.2024.10433843)
Citations: 0
Sentiment Analysis of Financial News Data using TF-IDF and Machine Learning Algorithms
Blogs, online forums, comment sections, and social networking sites such as Facebook, Twitter (now known as X), and Instagram can all be called social media. The growing use of social media has made unstructured data available, which can be valuable once it is cleaned, structured, and analyzed. Twitter is a popular microblogging platform where people share and express their opinions on any topic. Analyzing these opinions is called sentiment analysis, and it can be helpful to individuals, businesses, government agencies, and others. In this study, tweets related to financial news were extracted, labeled, and analyzed to capture the opinions of people around the world. This paper proposes a novel machine learning-based approach to sentiment analysis of social media data. The approach has three stages. The first stage is preprocessing, in which the tweets are refined and filtered. The second stage is feature extraction using term frequency-inverse document frequency (TF-IDF). The third stage uses the extracted features to make predictions with machine learning algorithms. Three models were used: a random forest classifier (RF), Naïve Bayes (NB), and k-nearest neighbors (KNN). The evaluation results show that both NB and RF outperform KNN in accuracy, precision, recall, and F1-score. The results also show an overwhelmingly positive opinion regarding financial news.
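The three stages can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the exact preprocessing rules, the TF-IDF variant, or any hyperparameters, so the cleaning regexes, the smoothed IDF formula, `k=3`, and every function name below are assumptions, and the tweets are made-up toy data. Only KNN is shown, since it is the simplest of the three models to write without a library.

```python
import math
import re
from collections import Counter


def preprocess(tweet):
    """Stage 1 (assumed rules): lowercase, drop URLs and @mentions, keep letters."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z]+", tweet)


def fit_idf(tokenized_docs):
    """Stage 2a: smoothed IDF, ln((1 + N) / (1 + df)) + 1, per training term."""
    n_docs = len(tokenized_docs)
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}


def tfidf_vector(tokens, idf):
    """Stage 2b: L2-normalised sparse TF-IDF vector; unseen terms are skipped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(query, train_vecs, train_labels, k=3):
    """Stage 3: majority vote among the k most cosine-similar training tweets."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy, hand-labelled tweets (illustrative only, not the paper's dataset).
train = [
    ("Stocks rally as earnings beat expectations https://example.com", "positive"),
    ("Great quarter, profits surge and shares climb", "positive"),
    ("@trader Markets gain on strong jobs report", "positive"),
    ("Shares plunge after weak earnings and profit warning", "negative"),
    ("Markets fall as recession fears grow", "negative"),
    ("Company reports heavy losses, stock drops", "negative"),
]
test = [
    ("Profits surge and shares rally", "positive"),
    ("Stock drops on recession fears", "negative"),
]

train_tokens = [preprocess(text) for text, _ in train]
train_labels = [label for _, label in train]
idf = fit_idf(train_tokens)
train_vecs = [tfidf_vector(tokens, idf) for tokens in train_tokens]

y_true = [label for _, label in test]
predictions = [knn_predict(tfidf_vector(preprocess(text), idf),
                           train_vecs, train_labels)
               for text, _ in test]
accuracy = sum(t == p for t, p in zip(y_true, predictions)) / len(y_true)
print(predictions, accuracy, precision_recall_f1(y_true, predictions, "positive"))
```

The IDF is fitted on the training tweets only and then reused for the test tweets, so no information from the evaluation set leaks into the features. On a real corpus, a library vectoriser and classifiers would normally replace these hand-written functions; swapping in RF or NB would change only the prediction step.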