Data-Driven Originalism

IF 2.5 · Region 2 (Sociology) · Q1 Social Sciences

University of Pennsylvania Law Review, Vol. 167, p. 261 · Pub Date: 2018-01-27 · DOI: 10.2139/SSRN.3036206

Thomas R. Lee, J. Phillips

Citations: 3

Abstract

Data-Driven Originalism

The threshold question for all originalist methodologies concerns the original communicative content of the words of the Constitution. For too long this inquiry has been pursued through tools that are ill-suited to the task. Dictionaries generally just define individual words; they don’t typically define phrases or allow for the consideration of broader linguistic context. And while dictionaries can provide a list of possible senses, they can’t tell us which sense is the most ordinary (or common). Founding-era dictionaries, moreover, were generally the work of one individual, tended to plagiarize each other, and relied on famous, often dated examples of English usage (from Shakespeare or the King James Bible).

Originalists have also turned to examples of usage in founding-era documents. This approach can address some of the shortcomings of dictionaries; a careful inquiry into sample sentences from founding-era literature can consider a wide range of semantic context. Yet even here the standard inquiry falls short. Originalists tend to turn only to certain sources, such as the Federalist Papers or the records of the state constitutional conventions, and those sources may not fully reflect how ordinary users of English of the day would have understood the Constitution (or at least have used language). Second, the number of founding-era documents relied on is often rather small, especially for generalizing about an entire country (or profession, in the case of lawyers). This opens originalists up to criticisms of cherry-picking, and even if that is not the case, sample sizes are just too small to confidently answer originalist questions.

But all is not lost. Big data, and the tools of linguists, have the potential to bring greater rigor and transparency to the practice of originalism. This article will explore the application of corpus linguistic methodology to aid originalism’s inquiry into the original communicative content of the Constitution. We propose to improve this inquiry by use of a newly released corpus (or database) of founding-era texts: the beta version of the Corpus of Founding-Era American English. The initial beta version will contain approximately 150 million words, derived from the Evans Early American Imprint Series (books, pamphlets and broadsides by all types of Americans on all types of subjects), the National Archives Founders Online Project (the papers of Washington, Franklin, Adams, Jefferson, Madison, and Hamilton, including correspondence to them), and Hein Online’s Legal Database (cases, statutes, legislative debates, etc.).

The paper will showcase how typical tools of a corpus—concordance lines, collocation, clusters (or n-grams), and frequency data—can aid in the search for original communicative content. We will also show how corpus data can help determine whether a word or phrase in question is best thought of as an ordinary one or a legal term of art. To showcase corpus linguistic methodology, the paper will analyze important clauses in the Constitution that have generated litigation and controversy over the years (commerce, public use, and natural born) and another whose original meaning has been presumed to be clear (domestic violence). We propose best practices, and also discuss the limitations of corpus linguistic methodology for originalism.

Larry Solum has predicted that “corpus linguistics will revolutionize statutory and constitutional interpretation.” Our paper seeks to chart out the first steps of that revolution so that others may follow.
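
For readers unfamiliar with the corpus tools the abstract names (concordance lines, collocation, clusters or n-grams, and frequency data), the following is a minimal Python sketch of what each one computes. It is an illustration only: the three sample sentences are invented for the demo and are not drawn from the Corpus of Founding-Era American English, and the function names (`tokenize`, `concordance`, `collocates`, `ngrams`) are hypothetical helpers, not the interface of any real corpus software or the authors' method.

```python
import re
from collections import Counter

# Toy text invented for this demo; NOT real founding-era data.
SAMPLE = (
    "The merchants carried on commerce with foreign nations. "
    "Commerce with the Indian tribes was regulated. "
    "Trade and commerce among the states increased."
)


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def concordance(tokens, keyword, width=3):
    """Keyword-in-context lines: up to `width` tokens on each side of a hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(tokens, keyword, window=3):
    """Count the words that occur within `window` tokens of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


def ngrams(tokens, n=2):
    """Count contiguous clusters of n tokens."""
    return Counter(zip(*(tokens[k:] for k in range(n))))


if __name__ == "__main__":
    toks = tokenize(SAMPLE)
    print("frequency of 'commerce':", Counter(toks)["commerce"])
    for line in concordance(toks, "commerce", width=2):
        print(line)
    print(collocates(toks, "commerce", window=1).most_common(3))
    print(ngrams(toks, n=2).most_common(3))
```

Raw counts on a sample this small say nothing about ordinary meaning; the sketch only shows the mechanics. Sense judgments of the kind the abstract describes would still require reading the concordance lines in context across a much larger, representative corpus.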

Source journal

University of Pennsylvania Law Review LAW-

CiteScore

2.90

Self-citation rate

0.00%

Articles published