An approach to assess the role of features in detection of transportation modes

Impact factor 3.5, Zone 2 (Engineering and Technology), JCR Q1 (ENGINEERING, CIVIL)
Sajjad Sowlati, Rahim Ali Abbaspour, Alireza Chehreghan
Journal: Transportation. Publication date: 2024-05-18. DOI: 10.1007/s11116-024-10492-7 (https://doi.org/10.1007/s11116-024-10492-7)
Citations: 0

Abstract

Detecting transportation modes is a fundamental prerequisite for interpreting passively collected travel data and developing intelligent transportation systems. The literature divides transportation mode detection into two parts: feature extraction and the implementation of classification models. Selecting and employing influential features helps maximize the power of the classification model. However, the interpretation and identification of influential features, which is the focus of this study, have received less attention. Importantly, the influence of features varies with the nature of the input data and the choice of classification model. In many cases, the extracted features are interdependent, and their combined correlation significantly affects specific outcomes. Consequently, evaluating individual features in isolation may not produce accurate results, and alternative methodologies are needed. This study seeks to bridge these gaps. Three open-source datasets, Geolife, MTL Trajet 2017, and MTL Trajet 2016, were used to enhance reliability, validate the approach, and investigate how influential features vary under different data collection conditions. First, various features were extracted and grouped into kinematic, spatial, and contextual categories. Then, three powerful classification models (Random Forest, LightGBM, and XGBoost) were applied. A hybrid feature selection algorithm was employed to select a subset of features and to analyze the variability of influential features across the classification models. The algorithm removed over half of the features, namely those with minimal or negative impact, thereby simplifying the classification process.
Because features yield powerful identification when combined as a subset, the influence of each feature was analyzed within a set of features rather than individually. Two approaches, “number of feature repetitions” and “Shapley Additive Explanations (SHAP) value,” were adopted to interpret the results. “Average velocity,” which was selected in all datasets and classification models (nine repetitions), had the highest SHAP value, making it the most influential feature overall. The “public stations indicator” was the most influential spatial feature, with the highest SHAP value among the spatial features and nine repetitions, while “holiday” had the most repetitions among the contextual features.
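
The two interpretation approaches named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the selected subsets, the snake_case feature names, and the `toy_score` function are hypothetical values invented for illustration, and the Shapley values are computed exactly over a toy subset-scoring function rather than with the SHAP library on a trained model.

```python
from collections import Counter
from itertools import combinations
from math import factorial


def count_repetitions(selected):
    """Count in how many (dataset, model) runs each feature was selected."""
    counts = Counter()
    for features in selected.values():
        counts.update(set(features))
    return counts


def shapley_values(features, value):
    """Exact Shapley value of each feature for a subset-scoring function."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical selection results: 3 datasets x 3 models = 9 runs.
datasets = ["Geolife", "MTL Trajet 2016", "MTL Trajet 2017"]
models = ["Random Forest", "LightGBM", "XGBoost"]
selected = {
    (d, m): {"average_velocity"} | ({"holiday"} if m == "LightGBM" else set())
    for d in datasets
    for m in models
}
counts = count_repetitions(selected)


def toy_score(subset):
    """Hypothetical score of a feature subset, with one interaction term."""
    score = 0.0
    if "average_velocity" in subset:
        score += 0.5
    if "public_stations_indicator" in subset:
        score += 0.1
    if {"average_velocity", "public_stations_indicator"} <= subset:
        score += 0.2  # extra gain only when both features are present
    return score


features = ["average_velocity", "public_stations_indicator", "holiday"]
phi = shapley_values(features, toy_score)

print(counts.most_common())
print(phi)
```

In this toy setting the interaction gain is split equally between the two features that produce it, which is why a feature's influence is assessed inside a subset rather than in isolation. The exact enumeration grows exponentially with the number of features, so it only suits small illustrative cases.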

Source journal
Transportation (Engineering and Technology: Civil Engineering)
CiteScore: 10.70
Self-citation rate: 4.70%
Articles published: 94
Review time: 6-12 weeks
Journal description: In our first issue, published in 1972, we explained that this Journal is intended to promote the free and vigorous exchange of ideas and experience among the worldwide community actively concerned with transportation policy, planning and practice. That continues to be our mission, with a clear focus on topics concerned with research and practice in transportation policy and planning, around the world. These four words, policy and planning, research and practice, are our key words. While we have a particular focus on transportation policy analysis and travel behaviour in the context of ground transportation, we willingly consider all good quality papers that are highly relevant to transportation policy, planning and practice with a clear focus on innovation, on extending the international pool of knowledge and understanding. Our interest is not only with transportation policies (and systems and services) but also with their social, economic and environmental impacts. However, papers about the application of established procedures to, or the development of plans or policies for, specific locations are unlikely to prove acceptable unless they report experience which will be of real benefit to those working elsewhere. Papers concerned with the engineering, safety and operational management of transportation systems are outside our scope.