A unifying similarity measure for automated identification of national implementations of European Union directives

Proceedings of the 16th edition of the International Conference on Artificial Intelligence and Law. Pub Date: 2017-06-12. DOI: 10.1145/3086512.3086527

Rohan Nanda, Luigi Di Caro, G. Boella, Hristo Konstantinov, Tenyo Tyankov, Daniel Traykov, H. Hristov, F. Costamagna, Llio Humphreys, L. Robaldo, Michele Romano


Citations: 9

Abstract



This paper presents a unifying text similarity measure (USM) for the automated identification of national implementations of European Union (EU) directives. The proposed model retrieves the transposed provisions of national law at a fine-grained level, for each article of the directive. USM combines methods for matching common words, matching common sequences of words, and approximate string matching. It was used to identify transpositions in a multilingual corpus of four directives and their corresponding national implementing measures (NIMs) in three languages: English, French and Italian. We further used a corpus of four additional directives and their corresponding English-language NIMs for a thorough test of the USM approach. We evaluated the model by comparing our results with a gold standard consisting of official correlation tables (where available) or correspondences manually identified by domain experts. Our results indicate that USM identified transpositions with average F-scores of 0.808, 0.736 and 0.708 for French, Italian and English Directive-NIM pairs, respectively, in the multilingual corpus. A comparison with state-of-the-art text similarity methods shows that USM achieves a higher F-score and recall across both corpora.
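The abstract names three ingredients of USM (common words, common sequences of words, approximate string matching) but does not give the formula that combines them. The following is a minimal sketch of how such a combination might look, not the authors' method. It assumes Jaccard overlap for words and word bigrams, `difflib.SequenceMatcher` for approximate matching, an unweighted mean, and a hypothetical threshold of 0.3. All function names and the sample texts are illustrative.

```python
from difflib import SequenceMatcher

PUNCT = ".,;:()\"'"


def tokens(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (w.strip(PUNCT) for w in text.lower().split())
    return [w for w in stripped if w]


def common_words(a, b):
    """Jaccard overlap of the two token sets (assumed stand-in for common words)."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def common_sequences(a, b, n=2):
    """Jaccard overlap of word n-grams (assumed stand-in for common word sequences)."""
    ta, tb = tokens(a), tokens(b)
    ga = {tuple(ta[i:i + n]) for i in range(len(ta) - n + 1)}
    gb = {tuple(tb[i:i + n]) for i in range(len(tb) - n + 1)}
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def approximate_match(a, b):
    """Character-level similarity ratio from difflib (approximate string matching)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_similarity(a, b):
    """Unweighted mean of the three scores; the equal weighting is an assumption."""
    return (common_words(a, b) + common_sequences(a, b) + approximate_match(a, b)) / 3


def match_provisions(article, provisions, threshold=0.3):
    """Return (score, provision) pairs at or above the threshold, best first."""
    scored = [(combined_similarity(article, p), p) for p in provisions]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


# Hypothetical directive article and candidate national provisions.
article = "Member States shall ensure that workers receive adequate training."
provisions = [
    "The employer shall ensure that workers receive adequate training.",
    "This Act may be cited as the Fisheries Act.",
]
matches = match_provisions(article, provisions)
for score, provision in matches:
    print(f"{score:.3f}  {provision}")
```

In this sketch each directive article is compared against every candidate provision, and only pairs above the threshold are reported as likely transpositions. A real system would tune the weights and threshold against the gold-standard correlation tables described above.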

