Penghui Li, Qianru Dong, Xiangjun Zhao, Chao Lu, Mengxia Hu, Xuedong Yan, Chunjiao Dong
Clustering of freeway cut-in scenarios for automated vehicle development considering data dimensionality and imbalance
Accident Analysis & Prevention, Volume 220, Article 108151. Published 2025-06-28. Impact factor: 5.7 (JCR Q1, Ergonomics).
DOI: 10.1016/j.aap.2025.108151
URL: https://www.sciencedirect.com/science/article/pii/S0001457525002374
Abstract
Representative driving scenarios derived by clustering naturalistic driving data serve as guidelines for the function definition and algorithm development of automated vehicles. However, current clustering methods struggle with data dimensionality and imbalance, leading to significant biases. To tackle these issues, this study proposed a novel two-layer self-adaptive multiprototype-based competitive learning algorithm and applied it to the clustering of freeway cut-in scenarios. Firstly, the cut-in segments extracted from naturalistic driving data included environmental, static, and dynamic vehicle elements, composed of discrete, continuous, and time series variables, which made multi-dimensional parameter clustering difficult. To tackle this, we used the K-medoids clustering method, based on dynamic time warping distance, to cluster time series variables such as cut-in vehicle velocity, converting them into discrete variables and applying one-hot encoding to simplify clustering distance calculations. Secondly, to address the imbalance issue, in which minority sample categories are absorbed into majority types when clustering naturalistic driving data, we employed a multi-prototype clustering method in the second layer. Each cluster was represented by one or more sub-clusters to ensure adequate representation of minority clusters. Moreover, the inclusion of adaptive competitive learning allowed the algorithm to determine the optimal number of clusters autonomously, eliminating the need for manual parameter tuning. Consequently, the proposed algorithm produced eleven representative freeway cut-in scenarios from 2415 segments, with better clustering goodness than traditional clustering methods. Moreover, four representative cut-in scenarios appeared frequently in the dataset and are commonly recognized by previous studies, whilst seven were rare in the dataset but common in real-world driving, such as cut-ins at night, in adverse weather conditions, and by commercial vehicles. These findings suggest that the proposed clustering method effectively addresses the challenges of dimensionality and imbalance, indicating its potential for wide application in constructing representative scenarios for automated vehicle development.
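The first-layer step described in the abstract (K-medoids over dynamic time warping distances, followed by one-hot encoding of the resulting labels) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy velocity profiles, the choice of k = 2, the absolute-difference step cost, and the "first k items" initialization are all assumptions made for illustration.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance with an absolute-difference step cost (assumed)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def k_medoids(dist: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Alternating k-medoids on a precomputed distance matrix; returns labels."""
    n = len(dist)
    medoids = list(range(k))  # assumption: deterministic start from the first k items
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda p: sum(dist[p][q] for q in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def one_hot(labels: List[int], k: int) -> List[List[int]]:
    """Turn each cluster label into a 0/1 indicator vector of length k."""
    return [[1 if lab == c else 0 for c in range(k)] for lab in labels]


# Hypothetical cut-in vehicle velocity profiles (m/s) of unequal length.
series = [
    [25, 25, 25, 25, 25],      # steady speed
    [30, 27, 24, 21, 18],      # decelerating
    [25, 25, 25, 26, 25, 25],  # steady speed, longer segment
    [30, 28, 26, 24, 21, 18],  # decelerating, longer segment
]
dist = [[dtw_distance(s, t) for t in series] for s in series]
labels = k_medoids(dist, k=2)
encoded = one_hot(labels, k=2)
print(labels)   # [0, 1, 0, 1]
print(encoded)  # [[1, 0], [0, 1], [1, 0], [0, 1]]
```

Because DTW aligns sequences of different lengths, the two steady profiles and the two decelerating profiles are grouped together even though their segment lengths differ. The one-hot vectors can then sit alongside the discrete variables in a single distance calculation.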
About the journal:
Accident Analysis & Prevention provides wide coverage of the general areas relating to accidental injury and damage, including the pre-injury and immediate post-injury phases. Published papers deal with medical, legal, economic, educational, behavioral, theoretical or empirical aspects of transportation accidents, as well as with accidents at other sites. Selected topics within the scope of the Journal may include: studies of human, environmental and vehicular factors influencing the occurrence, type and severity of accidents and injury; the design, implementation and evaluation of countermeasures; biomechanics of impact and human tolerance limits to injury; modelling and statistical analysis of accident data; policy, planning and decision-making in safety.