Chuan-Jie Lin, Cheng-Wei Lee, Cheng-Wei Shih, W. Hsu
2013 IEEE 14th International Conference on Information Reuse & Integration (IRI). Published 2013-10-24. DOI: https://doi.org/10.1109/IRI.2013.6642454
Rank correlation analysis of NTCIR-10 RITE-2 Chinese datasets and evaluation metrics
Textual Entailment (TE) is the task of recognizing entailment, paraphrase, and contradiction relations in a given text pair. The goal of textual entailment research is to develop a core inference component that can be applied to various domains such as question answering (QA). We computed several rank correlations over the data and system results of the NTCIR-10 RITE-2 task to identify relationships between datasets and evaluation metrics. We also constructed RITE4QA datasets within the RITE-2 task under a QA scenario in order to assess the applicability of RITE systems to QA. We find that datasets created from different sources and in different ways can hardly predict each other. However, the system ranking on the dataset consisting of expert-made artificial pairs has a moderate correlation with the ranking on QA metrics. Both RITE metrics and QA metrics are stable within their own subtasks.
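To make the idea of comparing system rankings concrete, the following is a minimal sketch of one common rank correlation coefficient, Kendall's tau (the tau-a variant, which does not correct for ties). The abstract does not say which coefficient the authors used, so the choice of Kendall's tau is an assumption, and the function name `kendall_tau` and all scores below are hypothetical illustrations, not data from the paper.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists for the same systems.

    scores_a[i] and scores_b[i] must belong to the same system.
    Tied pairs count as neither concordant nor discordant.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both lists must cover the same systems")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two systems to compare rankings")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both metrics order systems i and j the same way.
        product = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical scores for five systems (NOT taken from the paper):
# one entailment-style metric and one QA-style metric per system.
rite_scores = [0.62, 0.58, 0.55, 0.49, 0.40]
qa_scores = [0.31, 0.35, 0.28, 0.22, 0.25]

tau = kendall_tau(rite_scores, qa_scores)
print(f"Kendall's tau between the two rankings: {tau:.2f}")
```

A value near 1 would mean the two metrics rank the systems almost identically, a value near 0 would mean one ranking says little about the other, and a value near -1 would mean the rankings are reversed.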