Ala’ M. Al-Zoubi, Antonio M. Mora, Hossam Faris, Raneem Qaddoura
Expert Systems with Applications, Volume 287, Article 128160. Published 2025-05-15. DOI: 10.1016/j.eswa.2025.128160. Source: https://www.sciencedirect.com/science/article/pii/S0957417425017804
A hybrid TwinSVM-HHO model for multilingual spam review detection using sentiment features and pre-trained embeddings
The detection of spam reviews in multilingual environments remains a challenging task due to linguistic diversity, data imbalance, and semantic complexity. This paper proposes a novel hybrid model that integrates Twin Support Vector Machine (TwinSVM) with Harris Hawks Optimization (HHO) for simultaneous parameter optimization and feature selection. To enhance semantic understanding, sentiment-based features are incorporated alongside pre-trained word embedding models—BERT, FastText, and MUSE—across English, Arabic, and Spanish datasets. Our approach generates 24 high-quality datasets using embeddings with 100 and 400 dimensions, including a combined multilingual set. Experimental results demonstrate that our proposed HHO-TwinSVM model consistently outperforms conventional classifiers and metaheuristic-enhanced SVMs, achieving accuracy improvements of up to 9.44 % and enhanced robustness in low-resource languages. This integrated framework represents a scalable and adaptable solution for multilingual spam detection. Four detailed experiments were conducted in this study, each designed to address and demonstrate a specific aspect of the proposed approach. Across all experiments, the method outperformed existing algorithms, achieving impressive accuracy rates of 92.9741 %, 89.0314 %, 80.3580 %, and 85.0859 % on Arabic, English, Spanish, and multilingual datasets, respectively. Subsequently, sentiment analysis features were incorporated to further enhance detection performance, resulting in improvements of 1.0994 %, 2.6674 %, 9.4430 %, and 8.7448 %, respectively. A comprehensive analysis of the experimental results, including the influence of reviews and sentiment features, is also presented.
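The abstract states that HHO performs parameter optimization and feature selection simultaneously, but it does not say how a candidate solution is encoded. The sketch below shows one common wrapper-style encoding, offered only as an illustration and not as the authors' implementation: the first two entries of a candidate vector are mapped to two classifier hyperparameters, and the remaining entries are thresholded into a feature mask. The names `decode`, `fitness`, `evaluate`, the log2 search ranges, the 0.5 threshold, and the weight `alpha` are all assumptions introduced here.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical log2 search ranges; the abstract does not give the real ones.
LOG2_C_RANGE = (-5.0, 15.0)
LOG2_GAMMA_RANGE = (-15.0, 3.0)


def scale(u: float, low: float, high: float) -> float:
    """Map u in [0, 1] linearly onto [low, high]."""
    return low + u * (high - low)


def decode(position: Sequence[float]) -> Tuple[float, float, List[int]]:
    """Split one candidate vector into (c, gamma, selected feature indices).

    position[0] and position[1] encode two hyperparameters (assumed here to
    be a penalty term and a kernel width); every remaining entry is
    thresholded at 0.5 to decide whether that feature is kept.
    """
    c = 2.0 ** scale(position[0], *LOG2_C_RANGE)
    gamma = 2.0 ** scale(position[1], *LOG2_GAMMA_RANGE)
    selected = [i for i, u in enumerate(position[2:]) if u >= 0.5]
    return c, gamma, selected


def fitness(
    position: Sequence[float],
    evaluate: Callable[[float, float, List[int]], float],
    alpha: float = 0.99,
) -> float:
    """Score a candidate; lower is better.

    `evaluate` is a caller-supplied function returning validation accuracy
    in [0, 1] for the decoded settings (for example, a TwinSVM trained on
    the selected columns). The score mixes error rate with the fraction of
    features kept, an assumed trade-off that favours smaller subsets.
    """
    c, gamma, selected = decode(position)
    n_features = len(position) - 2
    if not selected:
        return 1.0  # worst possible score: nothing to train on
    error = 1.0 - evaluate(c, gamma, selected)
    return alpha * error + (1.0 - alpha) * len(selected) / n_features


if __name__ == "__main__":
    rng = random.Random(0)
    candidate = [rng.random() for _ in range(2 + 10)]  # 10 toy features
    # Stand-in evaluator that pretends every configuration reaches 90 %.
    print(decode(candidate))
    print(fitness(candidate, lambda c, gamma, selected: 0.9))
```

With this layout, any population-based optimizer, HHO included, only needs to move real-valued vectors inside the unit hypercube; the hyperparameters and the feature subset are both recovered in `decode`, which is what makes the two searches simultaneous.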
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.