Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification

Yanghua Xiao, Jinlan Fu, See-Kiong Ng, Pengfei Liu
{"title":"基准测试中的所有数据集都是必要的吗?文本分类中数据集评价的初步研究","authors":"Yanghua Xiao, Jinlan Fu, See-Kiong Ng, Pengfei Liu","doi":"10.48550/arXiv.2205.02129","DOIUrl":null,"url":null,"abstract":"In this paper, we ask the research question of whether all the datasets in the benchmark are necessary. We approach this by first characterizing the distinguishability of datasets when comparing different systems. Experiments on 9 datasets and 36 systems show that several existing benchmark datasets contribute little to discriminating top-scoring systems, while those less used datasets exhibit impressive discriminative power. We further, taking the text classification task as a case study, investigate the possibility of predicting dataset discrimination based on its properties (e.g., average sentence length). Our preliminary experiments promisingly show that given a sufficient number of training experimental records, a meaningful predictor can be learned to estimate dataset discrimination over unseen datasets. We released all datasets with features explored in this work on DataLab.","PeriodicalId":382084,"journal":{"name":"North American Chapter of the Association for Computational Linguistics","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification\",\"authors\":\"Yanghua Xiao, Jinlan Fu, See-Kiong Ng, Pengfei Liu\",\"doi\":\"10.48550/arXiv.2205.02129\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we ask the research question of whether all the datasets in the benchmark are necessary. We approach this by first characterizing the distinguishability of datasets when comparing different systems. Experiments on 9 datasets and 36 systems show that several existing benchmark datasets contribute little to discriminating top-scoring systems, while those less used datasets exhibit impressive discriminative power. We further, taking the text classification task as a case study, investigate the possibility of predicting dataset discrimination based on its properties (e.g., average sentence length). Our preliminary experiments promisingly show that given a sufficient number of training experimental records, a meaningful predictor can be learned to estimate dataset discrimination over unseen datasets. 
We released all datasets with features explored in this work on DataLab.\",\"PeriodicalId\":382084,\"journal\":{\"name\":\"North American Chapter of the Association for Computational Linguistics\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"North American Chapter of the Association for Computational Linguistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2205.02129\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"North American Chapter of the Association for Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2205.02129","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In this paper, we ask the research question of whether all the datasets in the benchmark are necessary. We approach this by first characterizing the distinguishability of datasets when comparing different systems. Experiments on 9 datasets and 36 systems show that several existing benchmark datasets contribute little to discriminating top-scoring systems, while those less used datasets exhibit impressive discriminative power. We further, taking the text classification task as a case study, investigate the possibility of predicting dataset discrimination based on its properties (e.g., average sentence length). Our preliminary experiments promisingly show that given a sufficient number of training experimental records, a meaningful predictor can be learned to estimate dataset discrimination over unseen datasets. We released all datasets with features explored in this work on DataLab.
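The abstract does not spell out how the distinguishability of a dataset is measured. As a rough illustration of the idea only, the sketch below scores each dataset in a systems-by-datasets accuracy matrix by how far it spreads apart the top-ranked systems. The function name, the top-k cutoff, and the use of standard deviation are assumptions made for this illustration, not the paper's metric.

```python
# An illustrative proxy only (the paper's actual distinguishability measure is
# not specified in the abstract): given a systems-by-datasets score matrix,
# score each dataset by how far apart it pulls the top-k systems.

import numpy as np

def dataset_discrimination(scores: np.ndarray, top_k: int = 5) -> np.ndarray:
    """scores[i, j] = accuracy of system i on dataset j.
    Returns one value per dataset: the standard deviation of the top-k
    systems' scores on that dataset (higher = easier to tell them apart)."""
    # Rank systems by their mean score across all datasets.
    overall = scores.mean(axis=1)
    top = np.argsort(overall)[::-1][:top_k]
    # Spread of the top systems' scores, computed per dataset.
    return scores[top].std(axis=0)

# Toy example: 36 systems x 9 datasets of random accuracies.
rng = np.random.default_rng(0)
toy_scores = rng.uniform(0.80, 0.95, size=(36, 9))
print(dataset_discrimination(toy_scores))
```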
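For the second step described in the abstract, a minimal sketch of the "predictor" idea: fit an off-the-shelf regressor that maps dataset properties (the abstract names average sentence length as one example) to an observed discrimination score, then apply it to an unseen dataset. The feature set, the toy values, and the choice of RandomForestRegressor are illustrative assumptions, not the paper's actual setup.

```python
# A minimal, illustrative sketch (not the paper's method or feature set):
# learn to estimate a dataset's discrimination score from simple properties.
# All feature names, values, and scores below are hypothetical.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training records: one row per dataset,
# columns = [average sentence length, vocabulary size, number of classes].
X_train = np.array([
    [18.2, 21000, 2],
    [35.7, 48000, 5],
    [12.4, 9000, 2],
    [54.1, 83000, 14],
])

# Hypothetical discrimination scores observed for those datasets
# (higher = better at separating top-scoring systems).
y_train = np.array([0.12, 0.43, 0.08, 0.61])

# Fit a simple off-the-shelf regressor as the "meaningful predictor".
predictor = RandomForestRegressor(n_estimators=100, random_state=0)
predictor.fit(X_train, y_train)

# Estimate discrimination for an unseen dataset from its properties alone.
X_unseen = np.array([[27.9, 30000, 4]])
print(predictor.predict(X_unseen))
```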