{"title":"Benchmarking deep neural representations for synthetic data evaluation","authors":"Nuno Bento, Joana Rebelo, Marília Barandas","doi":"10.1016/j.iswa.2025.200580","DOIUrl":null,"url":null,"abstract":"<div><div>Robust and accurate evaluation metrics are crucial to test generative models and ensure their practical utility. However, the most common metrics heavily rely on the selected data representation and may not be strongly correlated with the ground truth, which itself can be difficult to obtain. This paper attempts to simplify this process by proposing a benchmark to compare data representations in an automatic manner, i.e. without relying on human evaluators. This is achieved through a simple test based on the assumption that samples with higher quality should lead to improved metric scores. Furthermore, we apply this benchmark on small, low-resolution image datasets to explore various representations, including embeddings finetuned either on the same dataset or on different datasets. An extensive evaluation shows the superiority of pretrained embeddings over randomly initialized representations, as well as evidence that embeddings trained on external, more diverse datasets outperform task-specific ones.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"28 ","pages":"Article 200580"},"PeriodicalIF":4.3000,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent Systems with Applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667305325001061","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Robust and accurate evaluation metrics are crucial for testing generative models and ensuring their practical utility. However, the most common metrics rely heavily on the selected data representation and may not be strongly correlated with the ground truth, which itself can be difficult to obtain. This paper attempts to simplify this process by proposing a benchmark to compare data representations in an automatic manner, i.e., without relying on human evaluators. This is achieved through a simple test based on the assumption that higher-quality samples should lead to improved metric scores. Furthermore, we apply this benchmark to small, low-resolution image datasets to explore various representations, including embeddings fine-tuned either on the same dataset or on different datasets. An extensive evaluation shows the superiority of pretrained embeddings over randomly initialized representations, as well as evidence that embeddings trained on external, more diverse datasets outperform task-specific ones.
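
The core idea of the benchmark, as summarized in the abstract, can be illustrated with a short sketch: score a candidate representation by checking whether a distribution-level metric computed in that embedding space worsens as the evaluated samples are progressively degraded. The degradation scheme (additive Gaussian noise), the FID-style Fréchet distance, and all function names below are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch, assuming a candidate representation is given as a callable
# `embed(images) -> ndarray of features`. A representation passes the sanity
# test if the metric score monotonically worsens with the amount of degradation.
import numpy as np
from scipy.linalg import sqrtm


def frechet_distance(feats_a, feats_b):
    """FID-style Frechet distance between Gaussian fits of two embedding sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop numerical imaginary residue
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.trace(cov_a + cov_b - 2.0 * covmean))


def benchmark_representation(embed, real_images, noise_levels=(0.0, 0.1, 0.2, 0.4)):
    """Return metric scores for increasingly degraded copies of the real data.

    `noise_levels` and the additive-noise degradation are assumptions used only
    to illustrate the quality-ordering test described in the abstract.
    """
    real_feats = embed(real_images)
    rng = np.random.default_rng(0)
    scores = []
    for sigma in noise_levels:
        degraded = real_images + sigma * rng.standard_normal(real_images.shape)
        scores.append(frechet_distance(real_feats, embed(degraded)))
    return scores


def passes_quality_ordering(scores):
    """True if the metric worsens (increases) as sample quality drops."""
    return all(a <= b for a, b in zip(scores, scores[1:]))
```

Under this sketch, competing representations (e.g., a randomly initialized encoder, an embedding fine-tuned on the same dataset, or one pretrained on a larger external dataset) would each be passed as `embed` and compared by how consistently their scores respect the known quality ordering.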