Cross-Domain Topic Classification for Political Texts

IF 4.7 | Zone 2 (Sociology) | JCR Q1 (POLITICAL SCIENCE)
Moritz Osnabrügge, Elliott Ash, M. Morelli
Political Analysis, journal article, published 2021-10-21. DOI: 10.1017/pan.2021.37 (https://doi.org/10.1017/pan.2021.37)
Citations: 17

Abstract

We introduce and assess the use of supervised learning in cross-domain topic classification. In this approach, an algorithm learns to classify topics in a labeled source corpus and then extrapolates topics in an unlabeled target corpus from another domain. The ability to use existing training data makes this method significantly more efficient than within-domain supervised learning. It also has three advantages over unsupervised topic models: the method can be targeted more specifically to a research question, and the resulting topics are easier to validate and to interpret. We demonstrate the method using the case of labeled party platforms (source corpus) and unlabeled parliamentary speeches (target corpus). In addition to the standard within-domain error metrics, we further validate the cross-domain performance by labeling a subset of target-corpus documents. We find that the classifier accurately assigns topics in the parliamentary speeches, although accuracy varies substantially by topic. We also propose tools for diagnosing cross-domain classification. To illustrate the usefulness of the method, we present two case studies on how electoral rules and the gender of parliamentarians influence the choice of speech topics.
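The abstract describes the workflow only at a high level: fit a classifier on labeled source documents, predict topics for documents from a different domain, then check accuracy topic by topic against a hand-labeled subset of the target corpus. The sketch below illustrates that workflow under loud assumptions: the "platform" and "speech" sentences and the two topic labels are invented toy data, and the classifier is a from-scratch multinomial Naive Bayes chosen only because it fits in a few lines of standard-library Python. It is not the classifier, feature set, or topic scheme used in the paper, and the names `NaiveBayesTopicClassifier` and `per_topic_accuracy` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTopicClassifier:
    """Hypothetical stand-in classifier: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # topic -> word frequencies
        self.doc_counts = Counter(labels)        # topic -> number of source documents
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {t: sum(c.values()) for t, c in self.word_counts.items()}
        self.n_docs = len(docs)
        return self

    def predict_one(self, doc):
        v = len(self.vocab)
        best_topic, best_score = None, -math.inf
        for topic in self.doc_counts:
            score = math.log(self.doc_counts[topic] / self.n_docs)  # log prior
            for tok in tokenize(doc):
                if tok in self.vocab:  # words unseen in the source corpus are ignored
                    count = self.word_counts[topic][tok]
                    score += math.log((count + 1) / (self.totals[topic] + v))
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def per_topic_accuracy(gold, predicted):
    """Share of hand-labeled target documents classified correctly, grouped by gold topic."""
    hits, totals = Counter(), Counter()
    for g, p in zip(gold, predicted):
        totals[g] += 1
        hits[g] += int(g == p)
    return {t: hits[t] / totals[t] for t in totals}


# Hypothetical toy data, invented for illustration; not from the paper.
# Source domain: labeled "party platform" sentences.
source_docs = [
    "We will cut taxes and reduce the budget deficit",
    "Lower taxes help small business and economic growth",
    "We will build more schools and hire more teachers",
    "Every child deserves access to good education and universities",
]
source_labels = ["economy", "economy", "education", "education"]

# Target domain: "speech" sentences plus hand labels for a small validation subset.
target_docs = [
    "The minister's budget raises taxes on small business",
    "Our teachers and schools need more funding",
    "The budget for universities is too small",
]
target_gold = ["economy", "education", "education"]

clf = NaiveBayesTopicClassifier().fit(source_docs, source_labels)
predictions = clf.predict(target_docs)
print(predictions)                                   # ['economy', 'education', 'economy']
print(per_topic_accuracy(target_gold, predictions))  # {'economy': 1.0, 'education': 0.5}
```

The third toy speech shows the kind of cross-domain error the validation step is meant to catch: "budget" and "small" were learned from economy-labeled platform text, so a sentence a hand coder labeled as education is assigned to economy, and per-topic accuracy drops to 0.5 for education while staying at 1.0 for economy.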
Source journal: Political Analysis (POLITICAL SCIENCE)
CiteScore: 8.80
Self-citation rate: 3.70%
Articles published: 30
About the journal: Political Analysis chronicles these exciting developments by publishing the most sophisticated scholarship in the field. It is the place to learn new methods, to find some of the best empirical scholarship, and to publish your best research.