Dataset versus reality: Understanding model performance from the perspective of information need

IF 2.8, Zone 2 (Management), JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the Association for Information Science and Technology. Pub Date: 2023-08-18. DOI: 10.1002/asi.24825

Mengying Yu, Aixin Sun

Volume 74, Issue 11, pp. 1293-1306. Journal Article. https://onlinelibrary.wiley.com/doi/10.1002/asi.24825

Citations: 2

Abstract

Deep learning technologies have brought us many models that outperform human beings on a few benchmarks. An interesting question is: can these models solve real-world problems well when the settings (e.g., identical input/output) are similar to those of the benchmark datasets? We argue that a model is trained to answer the same information need, in a similar context (e.g., the information available), as the one for which the training dataset was created. The trained model may be used to solve real-world problems for a similar information need in a similar context. However, information need is independent of the format of dataset input/output. Although some datasets may share high structural similarities, they may represent different research tasks that aim to answer different information needs. Examples are question–answer pairs for the question answering (QA) task, and image-caption pairs for the image captioning (IC) task. In this paper, we use the QA task and IC task as two case studies and compare their widely used benchmark datasets. From the perspective of information need in the context of information retrieval, we show the differences in the dataset creation processes and the differences in morphosyntactic properties between datasets. The differences in these datasets can be attributed to the different information needs and contexts of the specific research tasks. We encourage all researchers to consider the information need perspective of a research task when selecting the appropriate datasets to train a model. Likewise, while creating a dataset, researchers may also incorporate the information need perspective as a factor to determine the degree to which the dataset accurately reflects the real-world problem or the research task they intend to tackle.

Source journal

Journal of the Association for Information Science and Technology (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 8.30

Self-citation rate: 8.60%

Articles published: 115

About the journal: The Journal of the Association for Information Science and Technology (JASIST) is a leading international forum for peer-reviewed research in information science. For more than half a century, JASIST has provided intellectual leadership by publishing original research that focuses on the production, discovery, recording, storage, representation, retrieval, presentation, manipulation, dissemination, use, and evaluation of information and on the tools and techniques associated with these processes. The Journal welcomes rigorous work of an empirical, experimental, ethnographic, conceptual, historical, socio-technical, policy-analytic, or critical-theoretical nature. JASIST also commissions in-depth review articles (“Advances in Information Science”) and reviews of print and other media.