基于集成方法的混合顶级特征提取模型用于检测X个谣言事件

IF 0.7 4区计算机科学 Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Web Engineering Pub Date : 2025-01-01 DOI:10.13052/jwe1540-9589.2414

Taukir Alam;Wei Chung Shia;Fang Rong Hsu;Taimoor Hassan;Pei-Chun Lin;Eric Odle;Junzo Watada

{"title":"基于集成方法的混合顶级特征提取模型用于检测X个谣言事件","authors":"Taukir Alam;Wei Chung Shia;Fang Rong Hsu;Taimoor Hassan;Pei-Chun Lin;Eric Odle;Junzo Watada","doi":"10.13052/jwe1540-9589.2414","DOIUrl":null,"url":null,"abstract":"The paper describes a novel a hybrid ensemble algorithm (HEA) that combines ensemble learning, class imbalance handling, and feature extraction. To address class imbalance in the dataset, the suggested approach integrates SMOTE oversampling and random under sampling (RU) feature extraction. To begin, Pearson correlation analysis is used to detect highly associated features in a dataset. This analysis aids in the selection of the most relevant features, which are either substantially related to the target variable or have a strong association with other features. The method seeks to improve classification performance by focusing on these correlated features. Following that, the SMOTE oversampling and RU algorithms are used to balance the majority and minority categorization characteristics. The SMOTE (synthetic minority oversampling technique) develops synthetic cases for the minority class by interpolating between existing instances, enhancing minority class representation. RU, on the other hand, removes instances from the majority class at random to obtain a balanced distribution. Furthermore, the random forest classifier (RFC) model's key features are input into an ensemble of decision tree (DT), k-nearest neighbor (KNN), adaptive boosting (AdaBoost), and convolutional neural network (CNN) approaches. This ensemble approach combines multiple models' predictions, exploiting their particular strengths and catching varied patterns in the data. Popular machine learning algorithms include DT, KNN, AdaBoost, and CNN, which are notable for their capacity to handle many types of data and capture complicated relationships. The evaluation findings show that the suggested HEA approach is effective, with a maximum precision, recall, F-score, and accuracy of 90%. The proposed methodology produces encouraging results, proving its applicability to a variety of categorization problems.","PeriodicalId":49952,"journal":{"name":"Journal of Web Engineering","volume":"24 1","pages":"79-106"},"PeriodicalIF":0.7000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10924702","citationCount":"0","resultStr":"{\"title\":\"Hybrid Top Features Extraction Model for Detecting X Rumor Events Using an Ensemble Method\",\"authors\":\"Taukir Alam;Wei Chung Shia;Fang Rong Hsu;Taimoor Hassan;Pei-Chun Lin;Eric Odle;Junzo Watada\",\"doi\":\"10.13052/jwe1540-9589.2414\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The paper describes a novel a hybrid ensemble algorithm (HEA) that combines ensemble learning, class imbalance handling, and feature extraction. To address class imbalance in the dataset, the suggested approach integrates SMOTE oversampling and random under sampling (RU) feature extraction. To begin, Pearson correlation analysis is used to detect highly associated features in a dataset. This analysis aids in the selection of the most relevant features, which are either substantially related to the target variable or have a strong association with other features. The method seeks to improve classification performance by focusing on these correlated features. Following that, the SMOTE oversampling and RU algorithms are used to balance the majority and minority categorization characteristics. The SMOTE (synthetic minority oversampling technique) develops synthetic cases for the minority class by interpolating between existing instances, enhancing minority class representation. RU, on the other hand, removes instances from the majority class at random to obtain a balanced distribution. Furthermore, the random forest classifier (RFC) model's key features are input into an ensemble of decision tree (DT), k-nearest neighbor (KNN), adaptive boosting (AdaBoost), and convolutional neural network (CNN) approaches. This ensemble approach combines multiple models' predictions, exploiting their particular strengths and catching varied patterns in the data. Popular machine learning algorithms include DT, KNN, AdaBoost, and CNN, which are notable for their capacity to handle many types of data and capture complicated relationships. The evaluation findings show that the suggested HEA approach is effective, with a maximum precision, recall, F-score, and accuracy of 90%. The proposed methodology produces encouraging results, proving its applicability to a variety of categorization problems.\",\"PeriodicalId\":49952,\"journal\":{\"name\":\"Journal of Web Engineering\",\"volume\":\"24 1\",\"pages\":\"79-106\"},\"PeriodicalIF\":0.7000,\"publicationDate\":\"2025-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10924702\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Web Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10924702/\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Web Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10924702/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

摘要

提出了一种结合集成学习、类不平衡处理和特征提取的混合集成算法（HEA）。为了解决数据集中的类不平衡问题，该方法将SMOTE过采样和随机欠采样（RU）特征提取相结合。首先，Pearson相关分析用于检测数据集中高度相关的特征。这种分析有助于选择最相关的特征，这些特征要么与目标变量基本相关，要么与其他特征有很强的关联。该方法旨在通过关注这些相关特征来提高分类性能。然后，使用SMOTE过采样和RU算法来平衡多数和少数分类特征。SMOTE（合成少数派过采样技术）通过在现有实例之间进行插值来开发少数派类的合成案例，从而增强少数派类的代表性。另一方面，RU从大多数类中随机移除实例以获得平衡分布。此外，随机森林分类器（RFC）模型的关键特征被输入到决策树（DT）、k近邻（KNN）、自适应增强（AdaBoost）和卷积神经网络（CNN）方法的集合中。这种集成方法结合了多个模型的预测，利用它们的特殊优势，捕捉数据中的不同模式。流行的机器学习算法包括DT、KNN、AdaBoost和CNN，它们以处理多种类型的数据和捕获复杂关系的能力而闻名。评价结果表明，建议的HEA方法是有效的，最高精度、召回率、f分和准确率为90%。所提出的方法产生了令人鼓舞的结果，证明其适用于各种分类问题。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Hybrid Top Features Extraction Model for Detecting X Rumor Events Using an Ensemble Method

The paper describes a novel a hybrid ensemble algorithm (HEA) that combines ensemble learning, class imbalance handling, and feature extraction. To address class imbalance in the dataset, the suggested approach integrates SMOTE oversampling and random under sampling (RU) feature extraction. To begin, Pearson correlation analysis is used to detect highly associated features in a dataset. This analysis aids in the selection of the most relevant features, which are either substantially related to the target variable or have a strong association with other features. The method seeks to improve classification performance by focusing on these correlated features. Following that, the SMOTE oversampling and RU algorithms are used to balance the majority and minority categorization characteristics. The SMOTE (synthetic minority oversampling technique) develops synthetic cases for the minority class by interpolating between existing instances, enhancing minority class representation. RU, on the other hand, removes instances from the majority class at random to obtain a balanced distribution. Furthermore, the random forest classifier (RFC) model's key features are input into an ensemble of decision tree (DT), k-nearest neighbor (KNN), adaptive boosting (AdaBoost), and convolutional neural network (CNN) approaches. This ensemble approach combines multiple models' predictions, exploiting their particular strengths and catching varied patterns in the data. Popular machine learning algorithms include DT, KNN, AdaBoost, and CNN, which are notable for their capacity to handle many types of data and capture complicated relationships. The evaluation findings show that the suggested HEA approach is effective, with a maximum precision, recall, F-score, and accuracy of 90%. The proposed methodology produces encouraging results, proving its applicability to a variety of categorization problems.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Web Engineering 工程技术-计算机：理论方法

CiteScore

1.80

自引率

12.50%

发文量

审稿时长

9 months

期刊介绍： The World Wide Web and its associated technologies have become a major implementation and delivery platform for a large variety of applications, ranging from simple institutional information Web sites to sophisticated supply-chain management systems, financial applications, e-government, distance learning, and entertainment, among others. Such applications, in addition to their intrinsic functionality, also exhibit the more complex behavior of distributed applications.