Topic-Weighted Kernels: Text Kernels Integrating Topic Weights and Deep Word Embeddings for Semantic Text Analytics
Authors: Nikhil V. Chandran; V. S. Anoop; S. Asharaf
Journal: IEEE Access, vol. 13, pp. 77918-77930
Published: 2025-04-30
DOI: 10.1109/ACCESS.2025.3565816
Full text: https://ieeexplore.ieee.org/document/10980292/
Impact factor: 3.4 (JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS)
Citations: 0
Abstract
Traditional text classification models, such as text kernels, primarily consider the syntactic aspects of text data. This paper introduces Topic-Weighted Kernels, a new text analytics framework that combines global topical themes with word-level semantics in a text kernel architecture. Three new text kernels are proposed: (a) the Topic-Weighted Base Kernel, (b) the Topic-Weighted Word2Vec Kernel, and (c) the Topic-Weighted BERT (Bidirectional Encoder Representations from Transformers) Kernel. These kernels leverage topic modeling and deep word embeddings to capture thematic and semantic information within textual data. By considering both global and local semantics, the kernels improve model performance on text analysis tasks. Experiments on diverse datasets demonstrate that Topic-Weighted Kernels outperform existing methods for text analysis tasks. The Topic-Weighted BERT Kernel achieves top-tier performance, with F1 scores reaching 99% on less complex datasets and significant gains on more complex ones. For multi-label text classification on the Reuters-90 dataset and sentiment analysis on the IMDB dataset, the model achieves F1 scores of 90.76% and 96.66%, respectively, demonstrating state-of-the-art performance. Beyond improving performance, the Topic-Weighted Kernel approach provides a better contextual representation for text analysis tasks such as single-label and multi-label classification and sentiment analysis. The proposed framework integrates semantics from word embeddings and topic models into text kernels, capturing intricate patterns in textual data that support more contextual text analytics.
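The abstract does not define the kernels, so the following Python sketch only illustrates one plausible way to combine per-word topic weights with word-level representations inside a kernel; it is not the authors' formulation. The `topic_weight` dictionary, the toy `embeddings` table, and the functions `tw_base_kernel` and `tw_embedding_kernel` are hypothetical. In a real pipeline, the weights might be derived from a topic model such as LDA and the vectors from Word2Vec or BERT.

```python
import math
from collections import Counter


def topic_weighted_vector(tokens, topic_weight):
    """Bag-of-words vector in which each term count is scaled by a topic weight.

    Assumption: `topic_weight` maps word -> non-negative weight (hypothetically
    derived from a topic model); words missing from it get weight 0.
    """
    counts = Counter(tokens)
    return {w: c * topic_weight.get(w, 0.0) for w, c in counts.items()}


def tw_base_kernel(doc_a, doc_b, topic_weight):
    """Linear kernel (inner product) on topic-weighted bag-of-words vectors."""
    va = topic_weighted_vector(doc_a, topic_weight)
    vb = topic_weighted_vector(doc_b, topic_weight)
    # Only words present in both documents contribute to the inner product.
    return sum(x * vb[w] for w, x in va.items() if w in vb)


def tw_embedding_kernel(doc_a, doc_b, topic_weight, embeddings):
    """Cosine similarity of topic-weighted sums of word embeddings.

    Assumption: `embeddings` is a non-empty dict mapping word -> list of floats,
    all of the same dimension; out-of-vocabulary words are skipped.
    """
    dim = len(next(iter(embeddings.values())))

    def doc_vector(tokens):
        acc = [0.0] * dim
        for w in tokens:
            if w in embeddings:
                wt = topic_weight.get(w, 0.0)
                acc = [a + wt * e for a, e in zip(acc, embeddings[w])]
        return acc

    u, v = doc_vector(doc_a), doc_vector(doc_b)
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        # A document with no weighted, in-vocabulary words maps to the zero vector.
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


if __name__ == "__main__":
    # Toy, made-up weights and 2-D vectors purely for illustration.
    weights = {"stock": 0.9, "market": 0.8, "the": 0.05, "film": 0.7}
    vectors = {"stock": [1.0, 0.0], "market": [0.8, 0.2],
               "film": [0.0, 1.0], "the": [0.3, 0.3]}
    finance_1 = ["the", "stock", "market"]
    finance_2 = ["stock", "market", "the", "market"]
    movie = ["the", "film"]
    print(tw_base_kernel(finance_1, finance_2, weights))
    print(tw_embedding_kernel(finance_1, finance_2, weights, vectors))
    print(tw_embedding_kernel(finance_1, movie, weights, vectors))
```

Both functions compute inner products of explicit feature vectors (the second one normalized), so their Gram matrices are positive semidefinite. Under these assumptions they could be used as precomputed kernels, for example with scikit-learn's `SVC(kernel="precomputed")`. Down-weighting low-topic words such as "the" is what lets topical terms dominate the similarity in this sketch.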
IEEE Access (COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC)
CiteScore: 9.80
Self-citation rate: 7.70%
Articles published: 6673
Review time: 6 weeks
Journal description:
IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest.
IEEE Access publishes articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APCs), the journal is distinguished by a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary": reviewers either accept or reject an article in the form in which it is submitted, in order to achieve rapid turnaround. Especially encouraged are submissions on:
Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals.
Practical articles discussing new experiments or measurement techniques, or interesting solutions to engineering problems.
Development of new or improved fabrication or manufacturing techniques.
Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.