Authors: Ziaul Haque Munim, Fabian Kjeldsberg, Tae-Eun Kim, Morten Bustgaard
Journal: Array, vol. 27, Article 100489 (published 2025-08-11; impact factor 4.5; JCR Q2, Computer Science, Theory & Methods)
DOI: 10.1016/j.array.2025.100489
URL: https://www.sciencedirect.com/science/article/pii/S259000562500116X
Prediction accuracy in maritime simulator training performance assessment with varying data frequency
This study investigates how the data sampling frequency affects the classification accuracy of Machine Learning (ML) models that predict student performance in maritime simulator training. ML-driven performance prediction is an essential part of Predictive Learning Analytics (PLA). If acceptable prediction accuracy can be achieved with lower-frequency data, that is, with larger time intervals between recorded data points, valuable resources for data storage, handling, and computation can potentially be saved. The study uses simulator log data from navigation students performing a Williamson Turn in both ballast and loaded ship conditions on a desktop simulator. Sampling intervals from 1 to 9 seconds are examined. Results are evaluated by Area Under the Curve (AUC), Accuracy, Log Loss, Precision, Recall, and F1 score. eXtreme Gradient Boosted Trees, variants of a Keras Residual Neural Network, and Light Gradient Boosted Trees are trained on 87.5 % and tested on 12.5 % of the data. The best scores are achieved at the 1-s sampling interval in both the ballast and the loaded condition. Further, the models trained on 1-s data are also the fastest and require less Random Access Memory (RAM). As the data frequency is reduced, that is, as the interval between recorded data points grows, the model evaluation metrics deteriorate.
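The data handling described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state how the lower-frequency datasets were derived or how the 87.5 % / 12.5 % split was drawn, so the sketch assumes that every k-th record of a 1-s log is kept and that the last 12.5 % of samples are held out. The function names `downsample`, `split_train_test`, and `classification_metrics` are hypothetical, and the ML models themselves are not reproduced.

```python
import math


def downsample(records, interval_s):
    """Keep every interval_s-th record of a log recorded once per second (assumed rule)."""
    if interval_s < 1:
        raise ValueError("interval_s must be a positive integer")
    return records[::interval_s]


def split_train_test(samples, test_fraction=0.125):
    """Hold out the last test_fraction of the samples (assumed split rule)."""
    n_test = max(1, round(len(samples) * test_fraction))
    return samples[:-n_test], samples[-n_test:]


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Binary versions of the metrics named in the abstract, from predicted probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Log loss, with probabilities clipped so that log(0) never occurs.
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p) for t, p in zip(y_true, clipped)
    ) / len(y_true)

    # AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "log_loss": log_loss,
        "auc": auc,
    }


if __name__ == "__main__":
    one_second_log = list(range(60))  # stand-in for 60 s of simulator records
    for k in range(1, 10):  # the 1-s to 9-s intervals examined in the study
        print(k, len(downsample(one_second_log, k)))
    print(classification_metrics([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```

Under these assumptions, a 9-s interval keeps roughly one ninth of the records, which is the storage and handling saving the study weighs against the loss in the evaluation metrics.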