Rank correlation analysis of NTCIR-10 RITE-2 Chinese datasets and evaluation metrics

2013 IEEE 14th International Conference on Information Reuse & Integration (IRI) Pub Date : 2013-10-24 DOI:10.1109/IRI.2013.6642454

Chuan-Jie Lin, Cheng-Wei Lee, Cheng-Wei Shih, W. Hsu


Citations: 0

Abstract

Textual Entailment (TE) is the task of recognizing entailment, paraphrase, and contradiction relations between the two texts of a given pair. The goal of textual entailment research is to develop a core inference component that can be applied to various domains such as question answering (QA). We examined several rank correlations over the data and system results of the NTCIR-10 RITE-2 task to find out how datasets and evaluation metrics correlate. We also constructed RITE4QA datasets in the RITE-2 task under a QA scenario in order to assess the applicability of RITE systems to QA. We find that datasets created from different sources and in different ways can hardly predict each other. However, the system ranking on the dataset consisting of expert-made artificial pairs has a moderate correlation with the ranking on QA metrics. Both RITE metrics and QA metrics are stable within their own subtasks.
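To illustrate what comparing system rankings across two evaluations can look like, here is a minimal sketch that computes Kendall's tau between two lists of per-system scores. The abstract does not say which rank correlation coefficient the authors used, so the choice of Kendall's tau (the tau-a variant, which does not correct for ties) is an assumption. The function name `kendall_tau` and all score values are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equally long score lists.

    Each index is one system; x and y hold that system's score under two
    different evaluations. Tied pairs count as neither concordant nor
    discordant (no tie correction).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # Positive product: both evaluations order systems i and j the same way.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for five systems (illustrative values only).
rite_scores = [0.62, 0.58, 0.71, 0.49, 0.66]  # e.g. a RITE metric on one dataset
qa_scores = [0.31, 0.35, 0.40, 0.22, 0.33]    # e.g. a QA metric on a RITE4QA dataset

tau = kendall_tau(rite_scores, qa_scores)
print(f"Kendall tau = {tau:.2f}")
```

A value near 1 means the two evaluations rank the systems almost identically, a value near 0 means one ranking says little about the other, and a value near -1 means the rankings are reversed.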
