Evaluatology: The science and engineering of evaluation

BenchCouncil Transactions on Benchmarks, Standards and Evaluations. Pub Date: 2024-03-01. DOI: 10.1016/j.tbench.2024.100162

Jianfeng Zhan, Lei Wang, Wanling Gao, Hongxiao Li, Chenxi Wang, Yunyou Huang, Yatao Li, Zhengxin Yang, Guoxin Kang, Chunjie Luo, Hainan Ye, Shaopeng Dai, Zhifei Zhang


Citations: 0

Abstract


Evaluatology: The science and engineering of evaluation

Evaluation is a crucial aspect of human existence and plays a vital role in each field. However, it is often approached in an empirical and ad-hoc manner, lacking consensus on universal concepts, terminologies, theories, and methodologies. This lack of agreement has significant consequences. This article aims to formally introduce the discipline of evaluatology, which encompasses the science and engineering of evaluation. We propose a universal framework for evaluation, encompassing concepts, terminologies, theories, and methodologies that can be applied across various disciplines, if not all disciplines.

Our research reveals that the essence of evaluation lies in conducting experiments that intentionally apply a well-defined evaluation condition to individuals or systems under scrutiny, which we refer to as the subjects. This process allows for the creation of an evaluation system or model. By measuring and/or testing this evaluation system or model, we can infer the impact of different subjects. Derived from the essence of evaluation, we propose five axioms focusing on key aspects of evaluation outcomes as the foundational evaluation theory. These axioms serve as the bedrock upon which we build universal evaluation theories and methodologies. When evaluating a single subject, it is crucial to create evaluation conditions with different levels of equivalency. By applying these conditions to diverse subjects, we can establish reference evaluation models. These models allow us to alter a single independent variable at a time while keeping all other variables as controls. When evaluating complex scenarios, the key lies in establishing a series of evaluation models that maintain transitivity. Building upon the science of evaluation, we propose a formal definition of a benchmark as a simplified and sampled evaluation condition that guarantees different levels of equivalency. This concept serves as the cornerstone for a universal benchmark-based engineering approach to evaluation across various disciplines, which we refer to as benchmarkology.
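The idea of altering a single independent variable at a time while holding the others as controls can be illustrated with a minimal sketch. This is not the authors' formalism: the names `Condition`, `one_factor_variants`, `evaluate`, and `toy_subject`, as well as the variables `load` and `data_size`, are hypothetical and chosen only for illustration. The sketch assumes an evaluation condition can be written as a mapping of named variables and that a subject can be modelled as a function returning one measurement.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Assumption: an evaluation condition is a mapping of variable names to values.
Condition = Dict[str, float]


def one_factor_variants(
    reference: Condition, levels: Dict[str, List[float]]
) -> Iterator[Tuple[str, Condition]]:
    """Yield conditions that differ from the reference in exactly one variable."""
    for name, values in levels.items():
        for value in values:
            if value == reference[name]:
                continue  # same as the reference, so there is nothing to compare
            variant = dict(reference)  # copy, so every other variable stays fixed
            variant[name] = value
            yield name, variant


def evaluate(subject: Callable[[Condition], float], condition: Condition) -> float:
    """Apply one evaluation condition to one subject and return the measurement."""
    return subject(condition)


def toy_subject(c: Condition) -> float:
    """Hypothetical subject: a toy cost model used purely for illustration."""
    return 2.0 * c["load"] + 0.5 * c["data_size"]


reference = {"load": 1.0, "data_size": 10.0}
levels = {"load": [1.0, 2.0], "data_size": [10.0, 20.0]}

# Measure the subject under the reference condition, then under each variant.
baseline = evaluate(toy_subject, reference)
effects = {
    (name, variant[name]): evaluate(toy_subject, variant) - baseline
    for name, variant in one_factor_variants(reference, levels)
}
```

Because each variant differs from the reference in only one variable, every entry in `effects` can be attributed to that variable alone, which is the property the abstract associates with reference evaluation models.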


Source journal

BenchCouncil Transactions on Benchmarks, Standards and Evaluations

CiteScore

4.80

Self-citation rate

0.00%

Number of articles published