Selecting the Number and Labels of Topics in Topic Modeling: A Tutorial

Impact factor: 15.6 · Zone 1 (Psychology) · JCR Q1 (PSYCHOLOGY)
S. Weston, Ian Shryock, Ryan Light, Phillip A. Fisher
Advances in Methods and Practices in Psychological Science. Published 2023-04-01. DOI: https://doi.org/10.1177/25152459231160105
Citations: 3

Abstract

Topic modeling is a type of text analysis that identifies clusters of co-occurring words, or latent topics. A challenging step of topic modeling is determining the number of topics to extract. This tutorial describes tools researchers can use to identify the number and labels of topics in topic modeling. First, we outline the procedure for narrowing down a large range of models to a select number of candidate models. This procedure involves comparing the large set on fit metrics, including exclusivity, residuals, variational lower bound, and semantic coherence. Next, we describe the comparison of a small number of models using project goals as a guide and information about topic representativeness and solution congruence. Finally, we describe tools for labeling topics, including frequent and exclusive words, key examples, and correlations among topics.
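As a rough illustration of the narrowing step, the sketch below scores candidate models on one of the fit metrics named in the abstract, semantic coherence. The abstract gives no formula, so the code assumes the co-document-frequency definition of Mimno et al. (2011); the function names, the toy corpus, and the candidate topics are all hypothetical and are not taken from the tutorial.

```python
import math


def semantic_coherence(top_words, documents):
    """Coherence of one topic, assuming the Mimno et al. (2011) definition.

    top_words: the topic's most probable words, most probable first.
    documents: iterable of token lists (or sets), one per document.
    """
    doc_sets = [set(doc) for doc in documents]
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            w_m, w_l = top_words[m], top_words[l]
            # Number of documents containing the higher-ranked word.
            d_l = sum(1 for doc in doc_sets if w_l in doc)
            if d_l == 0:
                raise ValueError(f"word {w_l!r} does not occur in the corpus")
            # Number of documents containing both words.
            d_ml = sum(1 for doc in doc_sets if w_l in doc and w_m in doc)
            score += math.log((d_ml + 1) / d_l)
    return score


def mean_coherence_by_k(candidates, documents):
    """Average topic coherence for each candidate number of topics.

    candidates: dict mapping K to a list of per-topic top-word lists.
    """
    return {
        k: sum(semantic_coherence(t, documents) for t in topics) / len(topics)
        for k, topics in candidates.items()
    }


# Toy corpus and hand-written "model output" (hypothetical, for illustration).
docs = [
    ["sleep", "stress", "child"],
    ["sleep", "stress"],
    ["school", "child"],
    ["school", "work", "stress"],
]
candidates = {
    2: [["sleep", "stress"], ["school", "child"]],
    3: [["sleep", "school"], ["stress", "work"], ["child", "work"]],
}

scores = mean_coherence_by_k(candidates, docs)
for k in sorted(scores, key=scores.get, reverse=True):
    print(f"K={k}: mean semantic coherence = {scores[k]:.3f}")
```

Higher (less negative) values mean a topic's top words tend to appear in the same documents. Coherence on its own tends to favor solutions with fewer topics, which may be one reason the abstract lists it alongside exclusivity, residuals, and the variational lower bound rather than as a single criterion.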
Source journal metrics: CiteScore 21.20; self-citation rate 0.70%; articles published 16
Journal description: In 2021, Advances in Methods and Practices in Psychological Science will undergo a transition to become an open access journal. This journal focuses on publishing innovative developments in research methods, practices, and conduct within the field of psychological science. It embraces a wide range of areas and topics and encourages the integration of methodological and analytical questions. The aim of AMPPS is to bring the latest methodological advances to researchers from various disciplines, even those who are not methodological experts. Therefore, the journal seeks submissions that are accessible to readers with different research interests and that represent the diverse research trends within the field of psychological science. The types of content that AMPPS welcomes include articles that communicate advancements in methods, practices, and metascience, as well as empirical scientific best practices. Additionally, tutorials, commentaries, and simulation studies on new techniques and research tools are encouraged. The journal also aims to publish papers that bring advances from specialized subfields to a broader audience. Lastly, AMPPS accepts Registered Replication Reports, which focus on replicating important findings from previously published studies. Overall, the transition of Advances in Methods and Practices in Psychological Science to an open access journal aims to increase accessibility and promote the dissemination of new developments in research methods and practices within the field of psychological science.