A novel graph kernel algorithm for improving the effect of text classification

IF 3.4 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Speech and Language | Pub Date: 2025-05-16 | DOI: 10.1016/j.csl.2025.101818

Fan Yang, Tan Zhu, Jing Huang, Zhilin Huang, Guoqi Xie


Citations: 0


A novel graph kernel algorithm for improving the effect of text classification

Text classification is an important topic in natural language processing. In recent years, both graph kernel methods and deep learning methods have been widely employed in text classification tasks. However, previous graph kernel algorithms focused too much on the graph structure itself, such as the shortest-path subgraph, while paying limited attention to the information in the text itself, and previous deep learning methods often consume substantial computational resources. We therefore propose a new graph kernel algorithm to address these disadvantages. First, we extract the textual information of the document using a term weighting scheme. Second, we collect the structural information of the document graph. Third, a graph kernel is used to measure similarity for text classification.
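The abstract does not define the kernel itself, so the following is only an illustrative sketch of how the three steps could fit together, not the authors' algorithm. It assumes smoothed TF-IDF as the term weighting scheme, a sliding-window co-occurrence graph as the document graph, and a simple edge-matching kernel in which each shared edge contributes the product of its endpoint weights. All function names and the `window` parameter are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document (smoothed TF-IDF)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return weights


def cooccurrence_edges(doc, window=2):
    """Undirected edges between distinct terms at most `window` tokens apart."""
    edges = set()
    for i, term in enumerate(doc):
        for other in doc[i + 1 : i + 1 + window]:
            if other != term:
                edges.add(frozenset((term, other)))
    return edges


def document_features(docs, window=2):
    """Map each document to {edge: product of its endpoint term weights}."""
    features = []
    for doc, weight in zip(docs, tfidf_weights(docs)):
        feats = {}
        for edge in cooccurrence_edges(doc, window):
            a, b = tuple(edge)
            feats[edge] = weight[a] * weight[b]
        features.append(feats)
    return features


def kernel(f1, f2):
    """Dot product over the edges that both document graphs contain."""
    if len(f2) < len(f1):
        f1, f2 = f2, f1  # iterate over the smaller feature map
    return sum(value * f2[edge] for edge, value in f1.items() if edge in f2)


def normalized_kernel(f1, f2):
    """Cosine-normalized kernel in [0, 1]; 0.0 if either graph has no edges."""
    denom = math.sqrt(kernel(f1, f1) * kernel(f2, f2))
    return kernel(f1, f2) / denom if denom > 0 else 0.0
```

Because this kernel is a dot product over an explicit feature map, the resulting similarity matrix is positive semi-definite and could, for example, be passed to a kernel classifier such as an SVM with a precomputed kernel. The abstract does not say which classifier the authors use.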

We compared our algorithm with eight baseline methods on three experimental datasets, including traditional deep learning methods and graph-based classification methods, and evaluated it on multiple metrics. The experimental results demonstrate that our algorithm outperforms the baseline methods in terms of accuracy. It also reduces memory consumption by at least 69% and runtime by at least 23%. Furthermore, as we decrease the percentage of training data, our algorithm continues to achieve better results than the deep learning methods. These results show that our algorithm can improve the efficiency of text classification and reduce the use of computing resources while maintaining high accuracy.


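Source journal

Computer Speech and Language (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore

11.30

Self-citation rate

4.70%

Articles published

Review time

22.9 weeks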

About the journal: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.