The Impact of Combining Datasets on the Robustness of Deep Learning Architectures: A Cross-Dataset Analysis

Impact factor 3.6 · CAS Zone 3 (Computer Science) · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Access · Published 2025-09-01 · DOI: 10.1109/ACCESS.2025.3604689

Ricardo Buettner, Susan Bertram, Leopold Fischer-Brandies

Vol. 13, pp. 151993–152009 · Publisher page: https://ieeexplore.ieee.org/document/11145443/

Citations: 0

Abstract

Convolutional neural networks (CNNs) have become a widely used technology. Typically, the performance of CNN models is measured by an accuracy score obtained on a single dataset, which often results in systems that perform significantly worse in real-world applications. The robustness of such models to unseen environments is still under-researched; in particular, a research gap remains on how training data affects the robustness of deep learning systems. This research article investigates, through a cross-dataset analysis, how combining data in training datasets affects the robustness and performance of deep learning models. We employ a transfer learning approach to train deep learning models based on four popular architectures on each of two different datasets, as well as on a combination of both. Our results demonstrate that combining two datasets can improve robustness, but the specific effect on performance varies between architectures: in most observed cases it leads to a slight decrease in accuracy, and in some cases even to an accuracy gain. Furthermore, we find that models trained on more complex datasets tend to outperform models trained on simpler datasets in cross-evaluation settings, indicating that training on more complex data yields more robust models. However, we also observe that a simpler architecture fails to generalize when trained on the combined training data, indicating the need for caution and extensive evaluation when combining datasets during the development cycle of deep learning systems.
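To make the cross-dataset design described in the abstract concrete, the sketch below builds a train-set by test-set accuracy grid of the kind such an analysis relies on. Models are trained on dataset A, on dataset B, and on their union A+B, and each is evaluated on the held-out test split of both datasets. This is a toy illustration under stated assumptions, not the authors' pipeline. The two "datasets" are synthetic 2-D Gaussian data with a shift between them, and a nearest-centroid classifier stands in for the fine-tuned pretrained CNNs used in the paper. The helper names (`make_dataset`, `train_centroids`, `cross_dataset_matrix`, `worst_case`) are hypothetical. Using the minimum accuracy across test sets as a robustness summary is also our assumption, not the paper's metric.

```python
import random
from statistics import mean

# Hypothetical toy data: two 2-D "datasets" that differ by a shift along x0,
# standing in for two real image datasets. Nothing here reproduces the paper.
def make_dataset(n, shift, seed):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)  # binary label, both ends inclusive
        cx, cy = label * 2.0 + shift, label * 2.0
        x = (rng.gauss(cx, 0.7), rng.gauss(cy, 0.7))
        data.append((x, label))
    return data

def train_centroids(data):
    """Nearest-centroid classifier: a stand-in (assumption) for fine-tuning a CNN."""
    sums = {}
    for (x0, x1), y in data:
        s = sums.setdefault(y, [0.0, 0.0, 0])
        s[0] += x0
        s[1] += x1
        s[2] += 1
    return {y: (s[0] / s[2], s[1] / s[2]) for y, s in sums.items()}

def predict(model, x):
    # Pick the label whose centroid is closest in squared Euclidean distance.
    return min(model, key=lambda y: (x[0] - model[y][0]) ** 2 + (x[1] - model[y][1]) ** 2)

def accuracy(model, data):
    return mean(1.0 if predict(model, x) == y else 0.0 for x, y in data)

def cross_dataset_matrix(train_sets, test_sets):
    """Accuracy of a model trained on each train set, evaluated on every test set."""
    results = {}
    for train_name, train_data in train_sets.items():
        model = train_centroids(train_data)
        for test_name, test_data in test_sets.items():
            results[(train_name, test_name)] = accuracy(model, test_data)
    return results

def worst_case(results, train_name):
    """Assumed robustness summary: lowest accuracy over all test sets."""
    return min(acc for (tr, _), acc in results.items() if tr == train_name)

train_a, test_a = make_dataset(400, 0.0, 1), make_dataset(200, 0.0, 2)
train_b, test_b = make_dataset(400, 1.5, 3), make_dataset(200, 1.5, 4)
train_sets = {"A": train_a, "B": train_b, "A+B": train_a + train_b}  # combined = concatenation
test_sets = {"A": test_a, "B": test_b}

matrix = cross_dataset_matrix(train_sets, test_sets)
for (tr, te), acc in sorted(matrix.items()):
    print(f"train={tr:4s} test={te}: {acc:.3f}")
for tr in train_sets:
    print(f"worst-case accuracy, train={tr}: {worst_case(matrix, tr):.3f}")
```

In the grid, the entries where the training and test source match (train A/test A, train B/test B) give in-dataset accuracy. The mismatched entries give cross-dataset accuracy. In a real study, each `train_centroids` call would be replaced by fine-tuning one of the pretrained architectures on the corresponding training set.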


Source journal

IEEE Access · COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC

CiteScore: 9.80

Self-citation rate: 7.70%

Articles published: 6673

Review time: 6 weeks

Journal description: IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest. IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either accept or reject an article in the form in which it is submitted, in order to achieve rapid turnaround. Especially encouraged are submissions on: multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals; practical articles discussing new experiments or measurement techniques, or interesting solutions to engineering problems; development of new or improved fabrication or manufacturing techniques; and reviews or survey articles of new or evolving fields, intended to help others understand the new area.