A novel graph kernel algorithm for improving the effect of text classification
Fan Yang, Tan Zhu, Jing Huang, Zhilin Huang, Guoqi Xie
Computer Speech and Language, Volume 95, Article 101818 (published 2025-05-16). DOI: 10.1016/j.csl.2025.101818
Citations: 0
Abstract
Text classification is an important topic in natural language processing. In recent years, both graph kernel methods and deep learning methods have been widely employed in text classification tasks. However, previous graph kernel algorithms have focused too heavily on the graph structure itself, such as shortest-path subgraphs, while paying limited attention to the information in the text itself. Previous deep learning methods have also tended to consume substantial computational resources. Therefore, we propose a new graph kernel algorithm to address these disadvantages. First, we extract the textual information of the document using a term weighting scheme. Second, we collect the structural information of the document graph. Third, a graph kernel is used to measure similarity for text classification.
We compared our algorithm against eight baseline methods, including traditional deep learning methods and graph-based classification methods, on three experimental datasets, and evaluated it on multiple indicators. The experimental results demonstrate that our algorithm outperforms the other baseline methods in terms of accuracy. Furthermore, it achieves at least a 69% reduction in memory consumption and at least a 23% decrease in runtime. Moreover, as we decrease the percentage of training data, our algorithm continues to achieve superior results compared to other deep learning methods. These results show that our algorithm can improve the efficiency of text classification tasks and reduce the use of computing resources while maintaining high accuracy.
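The abstract outlines a three-step pipeline: term weighting to capture textual information, a document graph to capture structural information, and a graph kernel to measure similarity for classification. The paper's specific weighting scheme and kernel are not given here, so the sketch below is an assumption-laden illustration only: it uses TF-IDF weights, a word co-occurrence graph, a simple weighted node/edge-overlap kernel, and 1-NN classification to show how such a pipeline fits together.

```python
# Illustrative sketch only: the paper's exact term-weighting scheme and graph
# kernel are not specified in the abstract, so TF-IDF weights, a co-occurrence
# graph, and a node/edge-overlap kernel are assumptions for demonstration.
import math
from collections import Counter

def tfidf_weights(docs):
    """Step 1 (assumed): TF-IDF term weighting per tokenized document."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    n = len(docs)
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (tf[t] / len(doc)) * math.log((1 + n) / (1 + df[t]))
                        for t in tf})
    return weights

def doc_graph(doc, window=2):
    """Step 2 (assumed): undirected co-occurrence graph; edges link words within a window."""
    edges = set()
    for i, w in enumerate(doc):
        for j in range(i + 1, min(i + window + 1, len(doc))):
            if w != doc[j]:
                edges.add(frozenset((w, doc[j])))
    return edges

def graph_kernel(w1, e1, w2, e2):
    """Step 3 (assumed): similarity = TF-IDF-weighted shared-node overlap + shared-edge count."""
    node_sim = sum(w1[t] * w2[t] for t in w1.keys() & w2.keys())
    edge_sim = len(e1 & e2)
    return node_sim + edge_sim

def classify(train_docs, train_labels, test_doc):
    """Assign the label of the most kernel-similar training document (1-NN)."""
    all_docs = train_docs + [test_doc]
    weights = tfidf_weights(all_docs)
    graphs = [doc_graph(d) for d in all_docs]
    w_test, e_test = weights[-1], graphs[-1]
    sims = [graph_kernel(weights[i], graphs[i], w_test, e_test)
            for i in range(len(train_docs))]
    return train_labels[max(range(len(sims)), key=sims.__getitem__)]

if __name__ == "__main__":
    train = [["graph", "kernel", "text", "classification"],
             ["deep", "learning", "neural", "network"]]
    labels = ["graph", "deep"]
    print(classify(train, labels, ["graph", "kernel", "similarity"]))  # -> "graph"
```

In a real implementation, one would likely substitute an established graph kernel (e.g., shortest-path or Weisfeiler-Lehman) and evaluate with proper train/test splits; the sketch is meant only to make the pipeline described in the abstract concrete.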
About the journal:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing have become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.