Exploring traffic pattern variability in vehicular federated learning

IF 4.3 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Computer Communications Pub Date : 2025-07-19 DOI:10.1016/j.comcom.2025.108279

Giuliano Fittipaldi, Rodrigo S. Couto, Luís H.M.K. Costa

Computer Communications, Volume 242, Article 108279. Journal Article. https://www.sciencedirect.com/science/article/pii/S0140366425002361

Citations: 0

Abstract

The emergence of software-defined vehicles has brought machine learning into the vehicular domain. To support these data-driven applications, techniques to incentivize users to share their vehicle data are crucial. Federated learning trains machine learning models in a distributed manner, leveraging client data without compromising its privacy. Nonetheless, in vehicular networks, the dynamic behavior of nodes affects client availability and the global model’s performance. Accordingly, this paper evaluates federated learning (FL) in a realistic vehicular network topology, accounting for real vehicle traffic in two Brazilian urban areas. The network simulation covers 3.7 km² with 1290 vehicles per hour and road speeds, based on real data. Our paper provides a comprehensive analysis of the impact that different traffic behaviors can yield during the training phase of a federated learning model. We observe that there is a performance decay in urban areas with longer vehicle permanence. Interestingly, longer vehicle participation in FL training leads to a biased final model with reduced generalization. We propose a novel approach to verify vehicle variability over time, by using the Dice-Sørensen coefficient to compare the set of clients participating in different rounds of training. By maintaining the vehicle variability over the rounds we can reduce the effect of the bias on the model, and – with a 47% reduction of the communication overhead – achieve faster learning, higher convergence in the first 15 rounds, and an equivalent final accuracy. Additionally, we extend our analysis by conducting simulations under more extreme traffic scenarios across multiple datasets, using a MobileNetV3. The results confirm that sustaining high vehicle variability – in scenarios with a brief participation of vehicles in the training – yields comparable model performance while saving up to 83.5 GB in communication costs.
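The abstract names the Dice-Sørensen coefficient as the measure used to compare the sets of clients participating in different training rounds. For two sets A and B it is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no shared clients) to 1 (identical sets). The following is only a minimal sketch of that idea, not the authors' implementation: the vehicle IDs are hypothetical, and comparing consecutive rounds is an assumption, since the abstract does not say which pairs of rounds are compared.

```python
from typing import Hashable, Iterable, List, Set


def dice_sorensen(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Dice-Sørensen coefficient 2|A ∩ B| / (|A| + |B|) of two client sets."""
    set_a, set_b = set(a), set(b)
    total = len(set_a) + len(set_b)
    if total == 0:
        # Convention chosen for this sketch (an assumption):
        # two empty rounds are treated as identical.
        return 1.0
    return 2 * len(set_a & set_b) / total


def round_to_round_similarity(rounds: List[Set[str]]) -> List[float]:
    """Coefficient between each round and the next one (assumed pairing)."""
    return [dice_sorensen(prev, curr) for prev, curr in zip(rounds, rounds[1:])]


# Hypothetical vehicle IDs participating in three FL rounds.
rounds = [
    {"v1", "v2", "v3", "v4"},  # round 1
    {"v1", "v2", "v3", "v5"},  # round 2: three vehicles stayed
    {"v6", "v7", "v8", "v9"},  # round 3: entirely new vehicles
]

similarity = round_to_round_similarity(rounds)
print(similarity)  # [0.75, 0.0]
```

In this reading, a low coefficient between rounds corresponds to high vehicle variability (few vehicles carried over), while a value near 1 indicates that the same vehicles keep participating.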




Source journal

Computer Communications (Engineering & Technology: Telecommunications)

CiteScore: 14.10

Self-citation rate: 5.00%

Articles published: 397

Review time: 66 days

About the journal: Computer and communication networks are key infrastructures of the information society, with high socio-economic value, as they contribute to the correct operation of many critical services (from healthcare to finance and transportation). The Internet is the core of today's computer-communication infrastructures. This has transformed the Internet from a robust network for data transfer between computers into a global, content-rich communication and information system where content is increasingly generated by users and distributed according to human social relations. Next-generation network technologies, architectures, and protocols are therefore required to overcome the limitations of the legacy Internet and to add new capabilities and services. The future Internet should be ubiquitous, secure, resilient, and closer to human communication paradigms. Computer Communications is a peer-reviewed international journal that publishes high-quality scientific articles (both theory and practice) and survey papers covering all aspects of future computer communication networks (on all layers except the physical layer), with special attention to the evolution of the Internet architecture, protocols, services, and applications.