Deeply integrating unsupervised semantics and syntax into heterogeneous graphs for inductive text classification

IF 5; Zone 2 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Complex & Intelligent Systems; Pub Date: 2023-09-28; DOI: 10.1007/s40747-023-01228-8

Yue Gao, Xiangling Fu, Xien Liu, Ji Wu


Citations: 0

Abstract


Graph-based neural networks and unsupervised pre-trained models are both cutting-edge text representation methods, owing to their outstanding ability to capture global information and contextualized information, respectively. However, both methods face obstacles to further performance improvement. On the one hand, graph-based neural networks lack knowledge orientation to guide textual interpretation during global information interaction. On the other hand, unsupervised pre-trained models contain rich implicit semantic and syntactic knowledge that is not sufficiently induced and expressed. How to effectively integrate graph-based global information with unsupervised contextualized semantic and syntactic information to achieve better text representation therefore remains an important open problem. In this paper, we propose a representation method that deeply integrates Unsupervised Semantics and Syntax into heterogeneous Graphs (USS-Graph) for inductive text classification. By constructing a heterogeneous graph whose edges and nodes are generated entirely from the knowledge of unsupervised pre-trained models, USS-Graph harmonizes the two perspectives of information under a bidirectionally weighted graph structure, thereby realizing the intra-fusion of graph-based global information and unsupervised contextualized semantic and syntactic information. Based on USS-Graph, we also propose a series of optimization measures to further improve knowledge integration and representation performance. Extensive experiments on benchmark datasets show that USS-Graph consistently achieves state-of-the-art performance on inductive text classification tasks. Additionally, extended experiments analyze in depth the characteristics of USS-Graph and the effectiveness of the proposed optimization measures for further knowledge integration and information complementation.
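The abstract does not spell out how the graph is built, so the following is only a toy sketch of what a "bidirectionally weighted" heterogeneous graph could look like. It is not the authors' construction. Everything in it is an assumption made for illustration: the two node types (documents and words), the hand-written 2-d vectors standing in for pre-trained embeddings, the cosine-plus-softmax weighting, and the helper names `incoming_weights` and `aggregate`. The point it shows is that the edge word → document and the edge document → word can carry different weights, because each is normalised over a different neighbourhood.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def incoming_weights(dst_vecs, src_vecs, incoming):
    """Weight each directed edge (src, dst); weights into one dst sum to 1."""
    weights = {}
    for dst, sources in incoming.items():
        scores = softmax([cosine(dst_vecs[dst], src_vecs[s]) for s in sources])
        for s, a in zip(sources, scores):
            weights[(s, dst)] = a
    return weights

def aggregate(src_vecs, incoming, weights):
    """One message-passing step: each dst becomes a weighted sum of its sources."""
    out = {}
    for dst, sources in incoming.items():
        dim = len(src_vecs[sources[0]])
        out[dst] = [
            sum(weights[(s, dst)] * src_vecs[s][i] for s in sources)
            for i in range(dim)
        ]
    return out

# Hypothetical stand-ins for embeddings taken from a pre-trained model.
word_vecs = {"graph": [1.0, 0.0], "text": [0.0, 1.0], "node": [0.8, 0.6]}
doc_vecs = {"d1": [0.9, 0.1], "d2": [0.2, 0.9]}
doc_words = {"d1": ["graph", "node"], "d2": ["text", "node"]}

# Invert the document -> words map to get word -> documents.
word_docs = {}
for d, ws in doc_words.items():
    for w in ws:
        word_docs.setdefault(w, []).append(d)

w2d = incoming_weights(doc_vecs, word_vecs, doc_words)  # edges word -> doc
d2w = incoming_weights(word_vecs, doc_vecs, word_docs)  # edges doc -> word

new_doc_vecs = aggregate(word_vecs, doc_words, w2d)
new_word_vecs = aggregate(doc_vecs, word_docs, d2w)

print(w2d[("node", "d1")], d2w[("d1", "node")])  # the two directions differ
```

In this sketch an unseen document can be attached at inference time by embedding it and computing its incoming weights, which is one simple way to read "inductive" here; the paper's actual mechanism may differ.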


Source journal

Complex & Intelligent Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 9.60

Self-citation rate: 10.30%

Articles published: 297

About the journal: Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.