On Reporting Robust and Trustworthy Conclusions from Model Comparison Studies Involving Neural Networks and Randomness
Odd Erik Gundersen, Saeid Shamsaliei, H. S. Kjærnli, H. Langseth
Proceedings of the 2023 ACM Conference on Reproducibility and Replicability, 2023-06-27
DOI: 10.1145/3589806.3600044
Abstract
The performance of neural networks differs when the only difference between runs is the seed initializing the pseudo-random number generator used during training. In this paper, we are concerned with how random initialization affects the conclusions we draw from experiments with neural networks. We run a large number of repeated experiments using state-of-the-art models for time-series prediction and image classification to investigate this statistical phenomenon. Our investigations show that erroneous conclusions can easily be drawn from such experiments. Based on these observations, we propose several measures that will improve the robustness and trustworthiness of conclusions inferred from model comparison studies with small absolute effect sizes.
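To make the phenomenon concrete, the sketch below is a minimal illustration (not the authors' code): it trains the same small scikit-learn network repeatedly, varying only the random seed, and compares the resulting score distributions of two hypothetical model variants with a Welch t-test. The dataset, architectures, number of seeds, and the use of scikit-learn are all assumptions made for illustration.

# Minimal sketch, assuming a toy scikit-learn setup (not the paper's models or data):
# vary only the seed, collect a score distribution per model, then compare distributions
# rather than single runs, as single-seed comparisons with small effect sizes can mislead.
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def scores_over_seeds(hidden, seeds):
    """Train the same architecture repeatedly; the seed is the ONLY difference."""
    out = []
    for seed in seeds:
        clf = MLPClassifier(hidden_layer_sizes=hidden, max_iter=300, random_state=seed)
        clf.fit(X_tr, y_tr)
        out.append(clf.score(X_te, y_te))
    return np.array(out)

seeds = range(30)                         # repeated experiments over many seeds
model_a = scores_over_seeds((32,), seeds) # hypothetical model variant A
model_b = scores_over_seeds((64,), seeds) # hypothetical model variant B

# With a small absolute effect size, a single seed per model can point either way;
# reporting the full distributions supports a more robust and trustworthy conclusion.
print(f"A: {model_a.mean():.3f} +/- {model_a.std():.3f}")
print(f"B: {model_b.mean():.3f} +/- {model_b.std():.3f}")
print("Welch t-test p-value:", stats.ttest_ind(model_a, model_b, equal_var=False).pvalue)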