Incorporating structural topic modeling into short text analysis

Impact factor: 0.1 · Category: Language & Linguistics
Po-Ya Angela Wang, S. Hsieh
Journal: Concentric-Studies in Linguistics
Publication date: 2023-05-25
Publication type: Journal Article
DOI: 10.1075/consl.22026.wan (https://doi.org/10.1075/consl.22026.wan)
Citations: 0

Abstract

The past few decades have seen the rapid development of topic modeling. So far, research has been more concerned with determining the ideal number of topics or meaningful topic clustering words than with applying topic modeling techniques to evaluate linguistic theories. This study proposes the Structural Topic Model (STM)-led framework to facilitate the interpretation of topic modeling results and standardize text analysis. STM encompasses various model training mechanisms, thereby requiring systematic designs to properly combine language studies. “Structural” in STM refers to the inclusion of metadata structure. Unlike the corpus-based keyness approach, STM can capture contextual cues and meta-information for the interpretation of topical results. Besides, STM can make cross-corpora comparisons via topical contrast, a challenging task for corpus-driven related models such as the Biterm Topic Model (BTM). Stylistic variations in song lyrics are taken as an illustration to show how to use the suggested framework to delve into the linguistic theory proposed by Pennebaker (2013). The topical model and iterable model in the proposed paradigm can clarify how pronouns affect style distinction. We believe the proposed STM-led framework can shed light on text analysis by conducting a reproducible cross-corpora comparison on short texts.
Source journal
Concentric-Studies in Linguistics (Language & Linguistics)
CiteScore: 0.20
Self-citation rate: 0.00%
Articles published: 5
About the journal: Concentric: Studies in Linguistics is a refereed, biannual journal that publishes research articles on all aspects of linguistic studies on the languages of the Asia-Pacific region. Review articles and book reviews with solid argumentation are also considered. The journal is indexed in Scopus, Emerging Sources Citation Index (ESCI), Modern Language Association (MLA) Directory of Periodicals, MLA International Bibliography, Linguistics & Language Behavior Abstracts (LLBA), EBSCOhost, Communication & Mass Media Complete (CMMC), Airiti Library (AL), Taiwan Citation Index-Humanities and Social Sciences, and Taiwan Humanities Citation Index (THCI) Level 1. First published in 1964 under the title The Concentric, the journal aimed to promote academic research in linguistics and English literature and to give researchers an avenue for sharing the results of their investigations with other researchers and practitioners. In 1976 the journal was renamed Studies in English Literature and Linguistics, and in 2001 it was renamed again as Concentric: Studies in English Literature and Linguistics. As the quantity of research in theoretical linguistics, applied linguistics, and English literature increased greatly, the journal split into two publications. Since 2004, these have been published as Concentric: Studies in Linguistics and Concentric: Literary and Cultural Studies, respectively.