Comprehensibility and Automation: Plain Language in the Era of Digitalization
István Üveges. TalTech Journal of European Studies, vol. 12, no. 1, pp. 64-86, December 2022. DOI: 10.2478/bjes-2022-0012
Abstract
This article briefly presents a pilot machine-learning experiment on classifying official texts addressed to lay readers, using a support vector machine (SVM) as a baseline alongside fastText models. The experiment used a hand-crafted corpus created by experts at the National Tax and Customs Administration of Hungary under the office’s Public Accessibility Programme. The corpus contained sentences that the experts had paraphrased or completely rewritten to make them more readable for lay people, together with their original counterparts. The aim was to distinguish automatically between these two classes using supervised machine-learning algorithms. If successful, such a model could direct the attention of the experts who make official texts more comprehensible to the average reader towards potentially problematic passages, which could drastically speed up the rephrasing process. Rephrasing of this kind, which puts the needs of the average reader first, can improve the overall comprehensibility of official (mostly legal) texts. It thereby supports access to justice and the transparency of governmental organizations and, most importantly, strengthens the rule of law in a given country.
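The abstract does not describe the implementation. As a rough illustration of what an SVM baseline for this binary task (original wording vs. plain-language rewrite) can look like, here is a minimal sketch. It assumes binary bag-of-words features and a linear SVM without an intercept, trained with the Pegasos stochastic subgradient method. The stop-word list, the function names (`featurize`, `train_pegasos`, `predict`) and the English toy sentences are hypothetical stand-ins, not the author's setup or the Hungarian NTCA corpus. The fastText models are not reproduced here.

```python
import random
import re

# Hypothetical stop-word list for the English toy data below.
STOP_WORDS = {"the", "a", "an", "to", "of", "on", "in", "and", "be"}

ORIGINAL, PLAIN = -1, 1

# Hypothetical toy data: invented English sentences, NOT from the NTCA corpus.
TRAIN = [
    ("The taxpayer shall submit the declaration pursuant to the act.", ORIGINAL),
    ("Pursuant to the provisions, the obligation shall be fulfilled.", ORIGINAL),
    ("The aforementioned amount shall be remitted forthwith.", ORIGINAL),
    ("You must send us your tax return.", PLAIN),
    ("You have to pay on time.", PLAIN),
    ("We will check your form and tell you.", PLAIN),
]


def featurize(sentence):
    """Binary bag-of-words: the set of lower-cased word tokens minus stop words."""
    tokens = re.findall(r"[^\W\d_]+", sentence.lower())
    return {tok for tok in tokens if tok not in STOP_WORDS}


def score(weights, features):
    """Linear decision value w . x for binary features (no intercept term)."""
    return sum(weights.get(f, 0.0) for f in features)


def train_pegasos(sentences, labels, lam=0.01, epochs=50, seed=0):
    """Linear SVM trained with Pegasos: stochastic subgradient descent on the
    L2-regularised hinge loss. Labels: +1 = plain rewrite, -1 = original."""
    data = [(featurize(s), y) for s, y in zip(sentences, labels)]
    rng = random.Random(seed)
    weights = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for features, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            violated = y * score(weights, features) < 1  # hinge loss is active
            # Subgradient of the regulariser: shrink every weight.
            for f in weights:
                weights[f] *= 1.0 - eta * lam
            # Subgradient of the hinge loss, only when the margin is violated.
            if violated:
                for f in features:
                    weights[f] = weights.get(f, 0.0) + eta * y
    return weights


def predict(weights, sentence):
    """Return +1 (looks like a plain rewrite) or -1 (looks like original wording)."""
    return 1 if score(weights, featurize(sentence)) > 0 else -1


weights = train_pegasos([s for s, _ in TRAIN], [y for _, y in TRAIN])
print(predict(weights, "You can pay online."))  # -> 1
print(predict(weights, "The applicant shall furnish the documents pursuant to the decree."))  # -> -1
```

The toy vocabularies of the two classes do not overlap, so the sketch separates them trivially. Real original and rewritten sentence pairs are likely to share much of their vocabulary, so a practical baseline would need richer features (for example word n-grams or TF-IDF weighting) and a held-out evaluation.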