Extracting Actionable Insights from Text Data: A Stable Topic Model Approach

Impact factor: 7.0 | Zone 2 (Management) | JCR Q1: Computer Science, Information Systems
Yi Yang and Ramanath Subramanyam
DOI: 10.25300/misq/2022/16957
URL: https://doi.org/10.25300/misq/2022/16957
Journal: MIS Quarterly
Volume (as listed): 19 2
Publication date: 2023-09-01
Publication type: Journal Article
Citations: 0

Abstract

Extracting Actionable Insights from Text Data: A Stable Topic Model Approach
Topic models are becoming a frequently employed tool in the empirical methods repertoire of information systems and management scholars. Given textual corpora, such as consumer reviews and online discussion forums, researchers and business practitioners often use topic modeling to either explore data in an unsupervised fashion or generate variables of interest for subsequent econometric analysis. However, one important concern stems from the fact that topic models can be notorious for their instability, i.e., the generated results could be inconsistent and irreproducible at different times, even on the same dataset. Therefore, researchers might arrive at potentially unreliable results regarding the theoretical relationships that they are testing or developing. In this paper, we attempt to highlight this problem and suggest a potential approach to addressing it. First, we empirically define and evaluate the stability problem of topic models using four textual datasets. Next, to alleviate the problem and with the goal of extracting actionable insights from textual data, we propose a new method, Stable LDA, which incorporates topical word clusters into the topic model to steer the model inference toward consistent results. We show that the proposed Stable LDA approach can significantly improve model stability while maintaining or even improving the topic model quality. Further, employing two case studies related to an online knowledge community and online consumer reviews, we demonstrate that the variables generated from Stable LDA can lead to more consistent estimations in econometric analyses. We believe that our work can further enhance management scholars’ collective toolkit to analyze ever-growing textual data.
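The instability described above can be made concrete with a small score that compares the topics produced by two runs on the same corpus. The following is a minimal sketch and not the stability measure or the Stable LDA method from the paper: it assumes each run is summarized by the top words of each topic, and it scores agreement as the mean Jaccard overlap under the best one-to-one alignment of topics. The function names and the example word lists are hypothetical.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard similarity between two collections of words."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def topic_stability(run_a, run_b):
    """Mean Jaccard overlap under the best one-to-one topic alignment.

    run_a, run_b: lists of topics, each topic a list of its top words.
    Topic indices are arbitrary across runs, so every alignment is tried.
    Brute force is only practical for a handful of topics; an assignment
    solver (Hungarian algorithm) would be needed for larger models.
    """
    if len(run_a) != len(run_b):
        raise ValueError("both runs must have the same number of topics")
    k = len(run_a)
    if k == 0:
        raise ValueError("runs must contain at least one topic")
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(jaccard(run_a[i], run_b[j]) for i, j in enumerate(perm)) / k
        best = max(best, score)
    return best


# Hypothetical top words from two runs on the same corpus (made-up data).
run_1 = [["battery", "charge", "life"], ["screen", "display", "bright"]]
run_2 = [["screen", "display", "color"], ["battery", "charge", "life"]]

print(topic_stability(run_1, run_2))  # 0.75
```

A score of 1.0 means the two runs recovered the same top words for every topic, up to a relabeling of topics. Lower values indicate that variables derived from the topics could differ from run to run.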
Source journal: MIS Quarterly
Category: Engineering and Technology, Computer Science: Information Systems
CiteScore: 13.30
Self-citation rate: 4.10%
Articles published: 36
Review time: 6-12 weeks
Journal description: The editorial objective of MIS Quarterly is the enhancement and communication of knowledge concerning the development of IT-based services, the management of IT resources, and the use, impact, and economics of IT with managerial, organizational, and societal implications. The journal also addresses professional issues affecting the Information Systems (IS) field as a whole.