Developing Benchmarks: The Importance of the Process and New Paradigms

R. Ordelman
{"title":"开发基准:过程和新范式的重要性","authors":"R. Ordelman","doi":"10.1145/2983554.2983562","DOIUrl":null,"url":null,"abstract":"The value and importance of Benchmark Evaluations is widely acknowledged. Benchmarks play a key role in many research projects. It takes time, a well-balanced team of domain specialists preferably with links to the user community and industry, and a strong involvement of the research community itself to establish a sound evaluation framework that includes (annotated) data sets, well-defined tasks that reflect the needs in the 'real world', a proper evaluation methodology, ground-truth, including a strategy for repetitive assessments, and last but not least, funding. Although the benefits of an evaluation framework are typically reviewed from a perspective of 'research output' --e.g., a scientific publication demonstrating an advance of a certain methodology-- it is important to be aware of the value of the process of creating a benchmark itself: it increases significantly the understanding of the problem we want to address and as a consequence also the impact of the evaluation outcomes. In this talk I will overview the history of a series of tasks focusing on audiovisual search emphasizing its 'multimodal' aspects, starting in 2006 with the workshop on 'Searching Spontaneous Conversational Speech' that led to tasks in CLEF and MediaEval (\"Search and Hyperlinking\"), and recently also TRECVid (\"Video Hyperlinking\"). The focus of my talk will be on the process rather than on the results of these evaluations themselves, and will address cross-benchmark connections, and new benchmark paradigms, specifically the integration of benchmarking in industrial 'living labs' that are becoming popular in some domains.","PeriodicalId":340803,"journal":{"name":"Proceedings of the 2016 ACM Workshop on Multimedia COMMONS","volume":"99 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Developing Benchmarks: The Importance of the Process and New Paradigms\",\"authors\":\"R. Ordelman\",\"doi\":\"10.1145/2983554.2983562\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The value and importance of Benchmark Evaluations is widely acknowledged. Benchmarks play a key role in many research projects. It takes time, a well-balanced team of domain specialists preferably with links to the user community and industry, and a strong involvement of the research community itself to establish a sound evaluation framework that includes (annotated) data sets, well-defined tasks that reflect the needs in the 'real world', a proper evaluation methodology, ground-truth, including a strategy for repetitive assessments, and last but not least, funding. Although the benefits of an evaluation framework are typically reviewed from a perspective of 'research output' --e.g., a scientific publication demonstrating an advance of a certain methodology-- it is important to be aware of the value of the process of creating a benchmark itself: it increases significantly the understanding of the problem we want to address and as a consequence also the impact of the evaluation outcomes. 
In this talk I will overview the history of a series of tasks focusing on audiovisual search emphasizing its 'multimodal' aspects, starting in 2006 with the workshop on 'Searching Spontaneous Conversational Speech' that led to tasks in CLEF and MediaEval (\\\"Search and Hyperlinking\\\"), and recently also TRECVid (\\\"Video Hyperlinking\\\"). The focus of my talk will be on the process rather than on the results of these evaluations themselves, and will address cross-benchmark connections, and new benchmark paradigms, specifically the integration of benchmarking in industrial 'living labs' that are becoming popular in some domains.\",\"PeriodicalId\":340803,\"journal\":{\"name\":\"Proceedings of the 2016 ACM Workshop on Multimedia COMMONS\",\"volume\":\"99 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2016 ACM Workshop on Multimedia COMMONS\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2983554.2983562\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 ACM Workshop on Multimedia COMMONS","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2983554.2983562","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The value and importance of benchmark evaluations are widely acknowledged, and benchmarks play a key role in many research projects. Establishing a sound evaluation framework takes time, a well-balanced team of domain specialists, preferably with links to the user community and industry, and strong involvement of the research community itself. Such a framework includes (annotated) data sets, well-defined tasks that reflect needs in the 'real world', a proper evaluation methodology, ground truth, a strategy for repeated assessments, and, last but not least, funding. Although the benefits of an evaluation framework are typically reviewed from the perspective of 'research output', e.g., a scientific publication demonstrating an advance in a certain methodology, it is important to be aware of the value of the process of creating a benchmark itself: it significantly increases our understanding of the problem we want to address and, as a consequence, also the impact of the evaluation outcomes.

In this talk I will review the history of a series of tasks focusing on audiovisual search, emphasizing its 'multimodal' aspects, starting in 2006 with the workshop on 'Searching Spontaneous Conversational Speech', which led to tasks in CLEF and MediaEval ("Search and Hyperlinking") and, more recently, TRECVid ("Video Hyperlinking"). The focus of my talk will be on the process rather than on the results of these evaluations themselves, and it will address cross-benchmark connections and new benchmark paradigms, specifically the integration of benchmarking into the industrial 'living labs' that are becoming popular in some domains.