A Survey of Multi-Label Topic Models

SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining. Pub Date: 2019-11-26. DOI: 10.1145/3373464.3373474

Sophie Burkhardt, S. Kramer

Citations: 10

Abstract

Every day, an enormous amount of text data is produced. Sources of text data include news, social media, emails, text messages, medical reports, scientific publications and fiction. To keep track of this data, each text is assigned categories, keywords, tags or labels. Automatically predicting such labels is the task of multi-label text classification. Often, however, we are interested in more than classification alone: we would like to understand which parts of a text belong to a label, which words are important for a label, or which labels occur together. For this reason, topic models may be used for multi-label classification as interpretable models that are flexible and easily extensible. This survey demonstrates the range and flexibility of the topic model framework in the complex setting of multi-label text classification by categorizing the different model variants.



