Incorporating structural topic modeling into short text analysis
Po-Ya Angela Wang, S. Hsieh
Concentric: Studies in Linguistics, published 2023-05-25
https://doi.org/10.1075/consl.22026.wan
The past few decades have seen the rapid development of topic modeling. So far, research has been more concerned with determining the ideal number of topics or finding meaningful clusters of topic words than with applying topic modeling techniques to evaluate linguistic theories. This study proposes a Structural Topic Model (STM)-led framework to facilitate the interpretation of topic modeling results and to standardize text analysis. STM encompasses various model training mechanisms and therefore requires a systematic design if it is to be combined properly with language studies. “Structural” in STM refers to the inclusion of metadata structure. Unlike the corpus-based keyness approach, STM can capture contextual cues and meta-information for the interpretation of topical results. In addition, STM can make cross-corpora comparisons via topical contrast, a challenging task for related corpus-driven models such as the Biterm Topic Model (BTM). Stylistic variation in song lyrics is taken as an illustration of how to use the suggested framework to examine the linguistic theory proposed by Pennebaker (2013). The topical model and iterable model in the proposed paradigm can clarify how pronouns affect style distinction. We believe the proposed STM-led framework can shed light on text analysis by enabling reproducible cross-corpora comparison of short texts.
About the journal:
Concentric: Studies in Linguistics is a refereed, biannual journal that publishes research articles on all aspects of linguistic studies of the languages of the Asia-Pacific region. Review articles and book reviews with solid argumentation are also considered. The journal is indexed in Scopus, Emerging Sources Citation Index (ESCI), Modern Language Association (MLA) Directory of Periodicals, MLA International Bibliography, Linguistics & Language Behavior Abstracts (LLBA), EBSCOhost, Communication & Mass Media Complete (CMMC), Airiti Library (AL), Taiwan Citation Index-Humanities and Social Sciences, and Taiwan Humanities Citation Index (THCI)-Level 1. First published in 1964 under the title The Concentric, the journal aimed to promote academic research in the fields of linguistics and English literature, and to provide an avenue for researchers to share the results of their investigations with other researchers and practitioners. In 1976 the journal was renamed Studies in English Literature and Linguistics, and in 2001 it was renamed again as Concentric: Studies in English Literature and Linguistics. As the quantity of research in theoretical linguistics, applied linguistics, and English literature has increased greatly in recent years, the journal has evolved into two publications. Since 2004, these two journals have been published under the titles Concentric: Studies in Linguistics and Concentric: Literary and Cultural Studies, respectively.