An information-theoretic measure to evaluate parsing difficulty across treebanks

ACM Trans. Speech Lang. Process. Pub Date : 1900-01-01 DOI:10.1145/2407736.2407737

A. Corazza, A. Lavelli, G. Satta


Citations: 4

Abstract

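With the growing interest in statistical parsing, special attention has recently been devoted to the problem of comparing different treebanks to assess which languages or domains are more difficult to parse relative to a given model. A common methodology for comparing parsing difficulty across treebanks is based on the use of the standard labeled precision and recall measures. As an alternative, in this article we propose an information-theoretic measure, called the expected conditional cross-entropy (ECC). One important advantage with respect to standard performance measures is that ECC can be directly expressed as a function of the parameters of the model. We evaluate ECC across several treebanks for English, French, German, and Italian, and show that ECC is an effective measure of parsing difficulty, with an increase in ECC always accompanied by a degradation in parsing accuracy.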
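The abstract does not give the definition of ECC, so the following is only a minimal sketch of the general idea, not the authors' measure. It assumes a probabilistic context-free grammar (PCFG) in Chomsky normal form and computes a simple empirical conditional cross-entropy, the average of -log2 p(t | w) over gold trees t with sentences w. Here p(t | w) = p(t, w) / p(w): p(t, w) is the product of rule probabilities, and p(w) comes from the CKY inside algorithm. This shows how such a quantity follows directly from model parameters. The toy grammar, its probabilities, and all function names are hypothetical. The paper's "expected" measure may be defined and computed differently, for example analytically rather than as a sample average.

```python
import math
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form; probabilities are illustrative.
# For each left-hand side, the rule probabilities sum to 1.
BINARY = {  # (lhs, left_child, right_child) -> probability
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.4,
    ("PP", "P", "NP"): 1.0,
}
LEXICAL = {  # (lhs, word) -> probability
    ("NP", "she"): 0.3,
    ("NP", "stars"): 0.2,
    ("NP", "telescopes"): 0.1,
    ("V", "saw"): 1.0,
    ("P", "with"): 1.0,
}


def is_preterminal(tree):
    """A preterminal node looks like (label, word)."""
    return len(tree) == 2 and isinstance(tree[1], str)


def tree_prob(tree):
    """Joint probability p(t, w): the product of the tree's rule probabilities."""
    if is_preterminal(tree):
        return LEXICAL.get((tree[0], tree[1]), 0.0)
    label, left, right = tree
    return BINARY.get((label, left[0], right[0]), 0.0) * tree_prob(left) * tree_prob(right)


def leaves(tree):
    """The sentence w yielded by a tree."""
    if is_preterminal(tree):
        return [tree[1]]
    return [word for child in tree[1:] for word in leaves(child)]


def inside_prob(words, start="S"):
    """Marginal p(w), summed over all parses with the CKY inside algorithm."""
    n = len(words)
    chart = defaultdict(float)  # (i, j, label) -> inside probability of words[i:j]
    for i, word in enumerate(words):
        for (lhs, w), p in LEXICAL.items():
            if w == word:
                chart[i, i + 1, lhs] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, b, c), p in BINARY.items():
                    chart[i, j, lhs] += p * chart[i, k, b] * chart[k, j, c]
    return chart[0, n, start]


def conditional_cross_entropy(treebank):
    """Average of -log2 p(t | w) over gold trees, in bits.

    Assumes every gold tree has nonzero probability under the grammar.
    """
    total = 0.0
    for tree in treebank:
        total -= math.log2(tree_prob(tree) / inside_prob(leaves(tree)))
    return total / len(treebank)


# Two gold analyses of an ambiguous sentence (PP attachment).
she, saw, stars = ("NP", "she"), ("V", "saw"), ("NP", "stars")
pp = ("PP", ("P", "with"), ("NP", "telescopes"))
vp_attach = ("S", she, ("VP", ("VP", saw, stars), pp))
np_attach = ("S", she, ("VP", saw, ("NP", stars, pp)))

if __name__ == "__main__":
    print(f"{conditional_cross_entropy([np_attach]):.3f}")  # prints 0.807
```

Under this toy grammar the two parses of "she saw stars with telescopes" receive conditional probabilities 3/7 and 4/7, so the NP-attachment gold tree costs log2(7/4) ≈ 0.807 bits. An unambiguous sentence such as "she saw stars" costs 0 bits. Intuitively, the more probability the model assigns to competing analyses, the higher the value. This is in the spirit of the abstract's observation that higher ECC accompanies lower parsing accuracy.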



