Sentiment analysis on review texts using category of words information and string kernels
Jose M. Cuevas-Muñoz, Nicolás E. García-Pedrajas, Aida De Haro-García
Applied Intelligence, volume 56, issue 7. Published 2026-05-01. DOI: 10.1007/s10489-026-07256-4
https://link.springer.com/article/10.1007/s10489-026-07256-4
Abstract
With millions of opinions written every day across the internet, analyzing review sentiment has proven to be an interesting and relevant problem. Support vector machines offer an excellent alternative when the amount of available data makes other models, such as deep learning, infeasible. A common way to detect hidden sentiment in textual data is to exploit the mutual information in a corpus with a support vector machine or another sophisticated classification algorithm. Approaches that can extract information from sequences of words, such as string kernels, have the potential for better performance. However, finding similarities between texts can be difficult because opinions are expressed in lengthy texts with a wide variety of vocabulary. To solve this problem, we propose using clustering methods to automatically group words into categories based on their word vectors, replacing the words in the dataset with their corresponding categories, and then using these categories to find mutual information in the text with support vector machines that use string kernels. This approach significantly reduces the token space and improves the efficiency of the kernel methods. The proposed method performed better than state-of-the-art approaches for this task on a set of real-world problems. Different models were tested against our proposal. The results indicate that the proposed method can extract useful information from opinions in long texts and remains an interesting option for review sentiment analysis in general, even outperforming other state-of-the-art methods on certain datasets. It also opens the possibility of applying the same philosophy to deep learning and similar models.
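The pipeline described in the abstract (cluster words by their vectors, replace each word with its category, then compare texts with a string kernel) can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not name the clustering algorithm or the string kernel, so plain k-means and a word-level n-gram spectrum kernel are used here as stand-ins. The 2-D word vectors, the seed words, and all function names are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical 2-D word vectors; real ones would come from a trained embedding.
WORD_VECTORS = {
    "great": (0.9, 0.8), "excellent": (1.0, 0.9), "good": (0.8, 0.9),
    "awful": (-0.9, -0.8), "terrible": (-1.0, -0.9), "bad": (-0.8, -0.9),
    "movie": (0.1, -0.1), "film": (0.0, 0.1),
}


def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm; returns the cluster index of each point."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def build_categories(word_vectors, seed_words):
    """Map each word to a category name such as 'C0' (assumed naming)."""
    words = sorted(word_vectors)
    labels = kmeans(
        [word_vectors[w] for w in words],
        [word_vectors[s] for s in seed_words],
    )
    return {w: f"C{lab}" for w, lab in zip(words, labels)}


def to_categories(text, word_to_cat):
    """Replace each token with its category; unknown words share 'UNK'."""
    return [word_to_cat.get(tok, "UNK") for tok in text.lower().split()]


def spectrum_kernel(a, b, n=2):
    """Dot product of the n-gram count vectors of two token sequences."""
    def ngrams(seq):
        return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    counts_a, counts_b = ngrams(a), ngrams(b)
    return sum(count * counts_b[gram] for gram, count in counts_a.items())


# Seed words fix the initial centroids so the toy run is deterministic.
word_to_cat = build_categories(WORD_VECTORS, seed_words=["good", "bad", "movie"])

reviews = ["great movie", "excellent film", "terrible movie"]
sequences = [to_categories(r, word_to_cat) for r in reviews]

# Gram matrix of pairwise kernel values over the category sequences.
gram = [[spectrum_kernel(x, y) for y in sequences] for x in sequences]
print(gram)
```

In this toy run, "great movie" and "excellent film" share no word bigram, so the same kernel applied to the raw words gives 0, while their category sequences are identical and the kernel gives 1. A Gram matrix like `gram` could then be handed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.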