Rohan Nanda, Luigi Di Caro, G. Boella, Hristo Konstantinov, Tenyo Tyankov, Daniel Traykov, H. Hristov, F. Costamagna, Llio Humphreys, L. Robaldo, Michele Romano
Proceedings of the 16th edition of the International Conference on Artificial Intelligence and Law. Published 2017-06-12. DOI: 10.1145/3086512.3086527 (https://doi.org/10.1145/3086512.3086527)
A unifying similarity measure for automated identification of national implementations of European Union directives
This paper presents a unifying text similarity measure (USM) for automated identification of national implementations of European Union (EU) directives. The proposed model retrieves the transposed provisions of national law at a fine-grained level for each article of the directive. USM combines three kinds of matching: common words, common sequences of words, and approximate string matching. It was used to identify transpositions in a multilingual corpus of four directives and their corresponding national implementing measures (NIMs) in three languages: English, French and Italian. We further used a corpus of four additional directives and their corresponding English-language NIMs for a more thorough test of the USM approach. We evaluated the model by comparing our results with a gold standard consisting of official correlation tables (where available) or correspondences manually identified by domain experts. Our results indicate that USM identified transpositions with average F-scores of 0.808, 0.736 and 0.708 for French, Italian and English Directive-NIM pairs, respectively, in the multilingual corpus. A comparison with state-of-the-art text similarity methods shows that USM achieves a higher F-score and recall on both corpora.
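The abstract names three ingredients (common words, common word sequences, approximate string matching) but does not give the formula that combines them. The following is a minimal sketch of how such signals could be combined, not the authors' USM: the Jaccard overlap, the word-bigram overlap, the use of `difflib.SequenceMatcher`, the unweighted mean, and all function names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def tokens(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def common_word_score(a: str, b: str) -> float:
    """Jaccard overlap of the two token sets (assumed stand-in for 'common words')."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def word_ngrams(seq: list[str], n: int) -> set[tuple[str, ...]]:
    """Set of contiguous word n-grams in a token list."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def common_sequence_score(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of word n-grams (assumed stand-in for 'common sequences of words')."""
    ga, gb = word_ngrams(tokens(a), n), word_ngrams(tokens(b), n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def approximate_match_score(a: str, b: str) -> float:
    """Character-level similarity ratio from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_similarity(a: str, b: str) -> float:
    """Unweighted mean of the three signals (hypothetical weighting)."""
    parts = (
        common_word_score(a, b),
        common_sequence_score(a, b),
        approximate_match_score(a, b),
    )
    return sum(parts) / len(parts)


def best_match(article: str, provisions: list[str]) -> str:
    """Return the national provision most similar to a directive article."""
    return max(provisions, key=lambda p: combined_similarity(article, p))


if __name__ == "__main__":
    # Invented toy strings, not taken from any directive or NIM.
    article = "member states shall ensure access"
    provisions = [
        "the minister shall ensure access to records",
        "this act may be cited as the finance act",
    ]
    print(best_match(article, provisions))
```

In a real pipeline the tokenizer would need language-specific handling for English, French and Italian, and the retrieved provisions would be scored against the gold standard to obtain precision, recall and F-score.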