Towards Better Evaluation of Topic Model Quality

2022 32nd Conference of Open Innovations Association (FRUCT). Pub Date: 2022-11-09. DOI: 10.23919/FRUCT56874.2022.9953874

M. Khodorchenko, N. Butakov, D. Nasonov



Abstract

Topic modelling is a popular unsupervised method for processing text corpora to obtain interpretable knowledge about the data. However, existing automatic quality metrics do not agree well with human evaluation or with performance on target tasks. This gap is a major challenge for automatic hyperparameter tuning methods, which rely heavily on the output signal to determine the optimization direction. As a result, evaluating the effectiveness of a topic model remains a labour-intensive manual routine, because no universal metric shows strong correspondence with human assessment. A quality metric that satisfies this condition is essential for giving useful feedback to the optimization algorithm when working with flexible and complex models, such as those based on additive regularisation or neural networks. To address the quality measurement gap, we performed an experimental study of existing scores on a specially created dataset. It contains topic models for several text corpora in two languages, together with the computed values of existing metrics and scores obtained from human assessment. The results show how automatic quality estimation may be improved and pave the way to metric learning with ensembles of machine learning algorithms.
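
One common way to quantify the correspondence between an automatic metric and human assessment is a rank correlation computed over a set of topic models. The sketch below is illustrative only and is not taken from the paper: the function names, the choice of Spearman correlation, and the score values are all assumptions made for the example.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for five topic models (made-up numbers, not paper data).
metric_scores = [0.41, 0.35, 0.52, 0.48, 0.30]  # some automatic metric
human_scores = [3.0, 4.0, 4.5, 2.5, 2.0]        # mean human rating

rho = spearman(metric_scores, human_scores)
print(f"Spearman correlation between metric and human scores: {rho:.3f}")
```

A value near 1 would indicate that the metric orders models the way human assessors do, while a value near 0 would indicate the kind of gap the abstract describes.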


