Author: Alexander Campolo
Journal: Digital society : ethics, socio-legal and governance of digital technology, volume 4, issue 2, article 35
Published: 2025 (Epub 2025-05-02)
DOI: https://doi.org/10.1007/s44206-025-00190-x
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12048445/pdf/
State-of-the-Art: The Temporal Order of Benchmarking Culture.
This commentary situates the epistemic values of machine learning's culture of benchmarking and evaluation within larger temporal structures. Beyond questions of validity, whether model comparisons are statistically valid or whether benchmarks adequately represent meaningful tasks or capabilities, it asks how benchmarks produce certain temporal values and expectations. It articulates two hypotheses in response: the first, termed normalizing research, seeks to characterize how benchmarking simultaneously serves a disciplining and motivating function in research, with the effect of minimizing conflict. The second, termed extrapolation, argues that the incremental, progressive rhythm of benchmarking is oriented less towards the future than towards a present state-of-the-art (SOTA). Together, these hypotheses inform a diagnosis of the presentist temporality of benchmarking and evaluation in machine learning.