Collecting Large-scale Comparative Text Data on Legislative Debates

The Politics of Legislative Debates. Pub Date: 2021-10-08. DOI: 10.1093/oso/9780198849063.003.0006

Jan Schwalbach, Christian Rauh

Citations: 1

Abstract

Parliamentary speeches are one of the most consistently available sources of information about political priorities, actor positions, and conflict structures in democratic states. Recent advances in automated text analysis offer a growing set of tools for tapping this information reservoir systematically. However, collecting the high-quality text data needed to unlock the comparative potential of these text analysis algorithms is costly and faces various practical hurdles. To address this challenge, this chapter offers three contributions. First, we outline best-practice guidelines and useful tools for researchers wishing to collect new legislative debate corpora or to extend existing ones. Second, we present an extended version of the ParlSpeech Corpus. Third, we highlight the difficulties of comparing text-as-data outputs across parliaments, which stem from differences in language, in traditions and conventions, and in metadata availability.

