Transfer learning for topic labeling: Analysis of the UK House of Commons speeches 1935–2014
Hannah Béchara, Alexander Herzog, Slava Jankin, Peter John
Research and Politics, published 2021-04-01. DOI: 10.1177/20531680211022206 (https://doi.org/10.1177/20531680211022206)
Citation count: 2
Abstract
Topic models are widely used in natural language processing, allowing researchers to estimate the underlying themes in a collection of documents. Most topic models require the additional step of attaching meaningful labels to estimated topics, a process that is not scalable, suffers from human bias, and is difficult to replicate. We present a transfer topic labeling method that seeks to remedy these problems, using domain-specific codebooks as the knowledge base to automatically label estimated topics. We demonstrate our approach with a large-scale topic model analysis of the complete corpus of UK House of Commons speeches from 1935 to 2014, using the coding instructions of the Comparative Agendas Project to label topics. We evaluated our results using human expert coding and compared our approach with more recent state-of-the-art neural methods. Our approach was simple to implement, compared favorably to expert judgments, and outperformed the neural network model for a majority of the topics we estimated.
About the journal:
Research & Politics aims to advance systematic peer-reviewed research in political science and related fields through the open access publication of the very best cutting-edge research and policy analysis. The journal provides a venue for scholars to communicate rapidly and succinctly important new insights to the broadest possible audience while maintaining the highest standards of quality control.