Comprehensibility and Automation: Plain Language in the Era of Digitalization

Impact factor: 1.2 · JCR: Q2 (Law)

TalTech Journal of European Studies, vol. 12, no. 1, pp. 64-86 · Pub Date: 2022-12-01 · DOI: 10.2478/bjes-2022-0012

István Üveges


Citations: 0


Comprehensibility and Automation: Plain Language in the Era of Digitalization

Abstract

This article briefly presents a pilot machine-learning experiment on classifying official texts addressed to lay readers, using a support vector machine as a baseline alongside fastText models. The experiment used a hand-crafted corpus created by experts of the National Tax and Customs Administration of Hungary under the office’s Public Accessibility Programme. The corpus contained sentences that the experts had paraphrased or completely rewritten to make them more readable for lay people, together with their original counterparts. The aim was to distinguish automatically between these two classes using supervised machine-learning algorithms. If successful, such a model could direct the attention of experts who work on making the texts of official bodies more comprehensible to the potentially problematic points of a text, which could speed up the rephrasing process drastically. Rephrasing that considers, above all, the needs of the average reader can improve the overall comprehensibility of official (mostly legal) texts, and thereby supports access to justice and the transparency of governmental organizations and, most importantly, strengthens the rule of law in a given country.
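The two-class setup described in the abstract (original sentences versus their plain-language rewrites) can be illustrated with a minimal linear SVM sketch. This is not the author's implementation: the abstract does not state the features, hyperparameters or library used, so the bag-of-words features, the plain subgradient-descent trainer, the label encoding and all function names below are assumptions made for illustration. The toy English sentences are invented and are not taken from the Hungarian corpus. The fastText models are not shown.

```python
import random
from collections import Counter

ORIGINAL, PLAIN = 1, -1  # hypothetical label encoding for the two classes


def featurize(sentence):
    """Bag-of-words counts over lowercased, whitespace-split tokens."""
    return Counter(sentence.lower().split())


def score(weights, sentence):
    """Linear score: positive leans ORIGINAL, negative leans PLAIN."""
    return sum(weights.get(tok, 0.0) * n for tok, n in featurize(sentence).items())


def train_linear_svm(sentences, labels, epochs=20, lr=0.1, reg=0.01, seed=0):
    """Stochastic subgradient descent on the L2-regularised hinge loss."""
    rng = random.Random(seed)
    data = list(zip(sentences, labels))
    weights = {}
    for _ in range(epochs):
        rng.shuffle(data)
        for sentence, y in data:
            margin = y * score(weights, sentence)
            # L2 regularisation shrinks every weight slightly on each step.
            for tok in weights:
                weights[tok] *= 1.0 - lr * reg
            # The hinge loss contributes only when the margin is below 1.
            if margin < 1.0:
                for tok, n in featurize(sentence).items():
                    weights[tok] = weights.get(tok, 0.0) + lr * y * n
    return weights


def predict(weights, sentence):
    """Return ORIGINAL for a positive score, PLAIN otherwise."""
    return ORIGINAL if score(weights, sentence) > 0 else PLAIN


# Invented toy data: three "official" sentences and three plain rewrites.
train_sentences = [
    "the taxpayer shall submit the aforementioned declaration hereinafter",
    "pursuant to the aforementioned provision the obligation shall be fulfilled",
    "the declaration shall be submitted pursuant to the provision hereinafter",
    "you must send us your tax form",
    "please send your form by friday",
    "you must pay your tax by friday",
]
train_labels = [ORIGINAL] * 3 + [PLAIN] * 3

weights = train_linear_svm(train_sentences, train_labels)
print(predict(weights, "the obligation shall be fulfilled pursuant to the provision"))
print(predict(weights, "please send us your tax form"))
```

In a workflow like the one the abstract envisions, sentences scored as ORIGINAL would be the ones flagged for an expert's attention. A sentence made up entirely of unseen words scores zero and falls into PLAIN in this sketch, which a real system would need to handle more carefully.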


Source journal

TalTech Journal of European Studies POLITICAL SCIENCE-

CiteScore

1.90

Self-citation rate

62.50%

Number of publications