A hybrid classification method via keywords screening and attention mechanisms in extreme short text

IF 0.8 · Zone 4, Computer Science · Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Intelligent Data Analysis Pub Date : 2023-08-10 DOI:10.3233/ida-220417

Xinke Zhou, Yi Zhu, Yun Li, Jipeng Qiang, Yunhao Yuan, Xingdong Wu, Runmei Zhang

Citations: 0

Abstract

Short text classification has attracted considerable attention and research in recent decades. However, most existing methods focus only on short texts that contain dozens of words, such as Twitter and microblog posts, and pay far less attention to extreme short texts such as news headlines and search snippets. Meanwhile, contemporary short text classification methods that extend features via external knowledge sources always introduce many useless concepts, which may be detrimental to classification performance. Moreover, unlike in traditional short text classification, the classification result for an extreme short text is often determined by only a few keywords, sometimes just one or two. To address these problems, we propose a novel hybrid classification method via Keywords Screening and Attention Mechanisms in extreme short text, called KSAM. Specifically, first, an attention-based BiLSTM is introduced to enhance the role of keywords. Second, we screen the keywords in the extreme short text to obtain the true class label, and the concepts related to the keywords are retrieved from external open knowledge sources such as DBpedia. Third, attention mechanisms are introduced to obtain the weights of the retrieved concepts. Finally, the conceptual information is used to assist the classification of the extreme short text. Extensive experiments demonstrate the effectiveness of our method compared with other state-of-the-art methods.
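The abstract does not give the attention formulation, so the following is only a minimal sketch of the general idea behind the third step: scoring retrieved concepts against a text representation and normalizing the scores into weights. It assumes plain dot-product attention with a softmax, which may differ from what KSAM actually uses. The function names (`softmax`, `dot`, `attend_concepts`) and the toy vectors are hypothetical and chosen purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend_concepts(text_vec, concept_vecs):
    """Weight concept vectors by their similarity to the text vector.

    Assumption: dot-product scoring followed by softmax. This is a generic
    attention form, not necessarily the one used in the paper.
    Returns (weights, pooled), where pooled is the weighted sum of concepts.
    """
    if not concept_vecs:
        raise ValueError("at least one concept vector is required")
    scores = [dot(text_vec, c) for c in concept_vecs]
    weights = softmax(scores)
    dim = len(text_vec)
    pooled = [
        sum(w * c[i] for w, c in zip(weights, concept_vecs))
        for i in range(dim)
    ]
    return weights, pooled


# Toy example: three hypothetical concept embeddings for one headline.
text_vec = [1.0, 0.0]
concept_vecs = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
weights, pooled = attend_concepts(text_vec, concept_vecs)
print(weights, pooled)
```

In this toy setup the concept most aligned with the text vector receives the largest weight, so loosely related or useless concepts contribute less to the pooled representation that would be passed on to a classifier.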

Source journal

Intelligent Data Analysis · Engineering & Technology - Computer Science: Artificial Intelligence

CiteScore

2.20

Self-citation rate

5.90%

Publication volume

Review time

3.3 months

About the journal: Intelligent Data Analysis provides a forum for the examination of issues related to the research and applications of Artificial Intelligence techniques in data analysis across a variety of disciplines. These techniques include (but are not limited to): all areas of data visualization, data pre-processing (fusion, editing, transformation, filtering, sampling), data engineering, database mining techniques, tools and applications, use of domain knowledge in data analysis, big data applications, evolutionary algorithms, machine learning, neural nets, fuzzy logic, statistical pattern recognition, knowledge filtering, and post-processing. In particular, papers are preferred that discuss the development of new AI-related data analysis architectures, methodologies, and techniques and their applications to various domains.