Corpus of Slovak Legislative Documents

Journal of Linguistics/Jazykovedný časopis. Pub Date: 2022-09-01. DOI: 10.2478/jazcas-2023-0004

R. Garabík


Abstract

The article describes the construction of a corpus of Slovak legislative documents. By analyzing several statistical values of the source metadata and documents, we efficiently improve corpus quality. We describe the methods used to clean up small variations in the metadata and to discriminate between documents by length, and we examine the effectiveness of several deduplication strategies. The corpus is part of a comparable corpus of legislative documents in seven languages, created in the Multilingual Resources for CEF.AT in the Legal Domain (MARCELL) project.


