Giuliano Fittipaldi , Rodrigo S. Couto , Luís H.M.K. Costa
{"title":"Exploring traffic pattern variability in vehicular federated learning","authors":"Giuliano Fittipaldi , Rodrigo S. Couto , Luís H.M.K. Costa","doi":"10.1016/j.comcom.2025.108279","DOIUrl":null,"url":null,"abstract":"<div><div>The emergence of software-defined vehicles has brought machine learning into the vehicular domain. To support these data-driven applications, techniques to incentivize users to share their vehicle data are crucial. Federated learning trains machine learning models in a distributed manner, leveraging client data without compromising its privacy. Nonetheless, in vehicular networks, the dynamic behavior of nodes affects client availability and the global model’s performance. Accordingly, this paper evaluates federated learning (FL) in a realistic vehicular network topology, accounting for real vehicle traffic in two Brazilian urban areas. The network simulation covers <span><math><mrow><mn>3</mn><mo>.</mo><mn>7</mn><mspace></mspace><msup><mrow><mi>km</mi></mrow><mrow><mn>2</mn></mrow></msup></mrow></math></span> with 1290 vehicles per hour and road speeds, based on real data. Our paper provides a comprehensive analysis of the impact that different traffic behaviors can yield during the training phase of a federated learning model. We observe that there is a performance decay in urban areas with longer vehicle permanence. Interestingly, longer vehicle participation in FL training leads to a biased final model with reduced generalization. We propose a novel approach to verify vehicle variability over time, by using the Dice-Sørensen coefficient to compare the set of clients participating in different rounds of training. By maintaining the vehicle variability over the rounds we can reduce the effect of the bias on the model, and – with a 47% reduction of the communication overhead – achieve faster learning, higher convergence in the first 15 rounds, and an equivalent final accuracy. Additionally, we extend our analysis by conducting simulations under more extreme traffic scenarios across multiple datasets, using a MobileNetV3. The results confirm that sustaining high vehicle variability – in scenarios with a brief participation of vehicles in the training – yields comparable model performance while saving up to 83.5 GB in communication costs.</div></div>","PeriodicalId":55224,"journal":{"name":"Computer Communications","volume":"242 ","pages":"Article 108279"},"PeriodicalIF":4.3000,"publicationDate":"2025-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Communications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0140366425002361","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
The emergence of software-defined vehicles has brought machine learning into the vehicular domain. To support these data-driven applications, techniques to incentivize users to share their vehicle data are crucial. Federated learning trains machine learning models in a distributed manner, leveraging client data without compromising its privacy. Nonetheless, in vehicular networks, the dynamic behavior of nodes affects client availability and the global model’s performance. Accordingly, this paper evaluates federated learning (FL) in a realistic vehicular network topology, accounting for real vehicle traffic in two Brazilian urban areas. The network simulation covers with 1290 vehicles per hour and road speeds, based on real data. Our paper provides a comprehensive analysis of the impact that different traffic behaviors can yield during the training phase of a federated learning model. We observe that there is a performance decay in urban areas with longer vehicle permanence. Interestingly, longer vehicle participation in FL training leads to a biased final model with reduced generalization. We propose a novel approach to verify vehicle variability over time, by using the Dice-Sørensen coefficient to compare the set of clients participating in different rounds of training. By maintaining the vehicle variability over the rounds we can reduce the effect of the bias on the model, and – with a 47% reduction of the communication overhead – achieve faster learning, higher convergence in the first 15 rounds, and an equivalent final accuracy. Additionally, we extend our analysis by conducting simulations under more extreme traffic scenarios across multiple datasets, using a MobileNetV3. The results confirm that sustaining high vehicle variability – in scenarios with a brief participation of vehicles in the training – yields comparable model performance while saving up to 83.5 GB in communication costs.
期刊介绍:
Computer and Communications networks are key infrastructures of the information society with high socio-economic value as they contribute to the correct operations of many critical services (from healthcare to finance and transportation). Internet is the core of today''s computer-communication infrastructures. This has transformed the Internet, from a robust network for data transfer between computers, to a global, content-rich, communication and information system where contents are increasingly generated by the users, and distributed according to human social relations. Next-generation network technologies, architectures and protocols are therefore required to overcome the limitations of the legacy Internet and add new capabilities and services. The future Internet should be ubiquitous, secure, resilient, and closer to human communication paradigms.
Computer Communications is a peer-reviewed international journal that publishes high-quality scientific articles (both theory and practice) and survey papers covering all aspects of future computer communication networks (on all layers, except the physical layer), with a special attention to the evolution of the Internet architecture, protocols, services, and applications.