Using probabilistic clustering techniques as a specification tool for capturing heterogeneity in choice models

IF 7.6 · Zone 1 (Engineering & Technology) · JCR Q1, Transportation Science & Technology

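Transportation Research Part C-Emerging Technologies, Volume 179, Article 105289 · Pub Date: 2025-08-17 · DOI: 10.1016/j.trc.2025.105289 · https://www.sciencedirect.com/science/article/pii/S0968090X25002931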

Panagiotis Tsoleridis, Charisma F. Choudhury, Stephane Hess

Citations: 0

Abstract

Using probabilistic clustering techniques as a specification tool for capturing heterogeneity in choice models

In the era of big data, data-driven methods have emerged as strong competitors to traditional econometric models for analysing choice behaviour. In particular, data-driven models offer flexible classification methods that are well suited to capturing heterogeneity among decision makers and improving model fit. A key limitation of purely data-driven models, however, is the difficulty of calculating welfare measures, such as value of travel time (VTT) estimates, that are essential for cost-benefit analyses. This motivates the current study, which combines the data-mining-based segmentation approaches used in machine learning (ML) with traditional discrete choice models (DCM) to get the best of both: a clustering-based component that captures heterogeneity among travellers, and a utility-based choice component that is suitable for quantifying policy-relevant measures such as VTT estimates. In the proposed hybrid framework, travellers are probabilistically allocated to clusters based on their degree of similarity to each cluster, and cluster-specific random-utility-based mode choice models are estimated simultaneously. The framework is tested on two revealed-preference (RP) datasets (a GPS diary and a traditional household survey) and in three different choice contexts, providing a range of sample sizes and data complexity. The performance of the proposed hybrid model (H-LCCM) is compared with that of traditional latent class choice models (LCCM), in which both the class membership and mode choice components are utility-based, and with that of two other state-of-the-art ML-assisted LCCM frameworks. Results indicate that H-LCCM outperforms the remaining specifications in the majority of the contexts examined, while offering a more scalable approach for contexts with a large number of observations (as is the case for big data sources) and/or with large choice sets (as is typical in spatial choice contexts). The proposed framework is practically applicable to policy-making because it allows the calculation of VTT estimates and therefore does not sacrifice the microeconomic interpretability of traditional DCMs. The results are promising, especially in the current era of big data, and are expected to contribute to the emerging literature on synergies between traditional econometric approaches and new data-driven methods.
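
To make the structure described in the abstract concrete, the following is a minimal sketch of a mixture likelihood with soft cluster memberships and cluster-specific multinomial logit models, plus a VTT calculated as a ratio of coefficients. The abstract does not specify the clustering technique or the utility specification, so everything here is an assumption made for illustration: the distance-based softmax membership, the two-attribute linear utility (time and cost), and all function names (`soft_memberships`, `mnl_probs`, `hybrid_loglik`, `vtt`). The toy data are synthetic and do not come from the paper.

```python
import numpy as np

def soft_memberships(x, centroids, scale=1.0):
    """Probabilistic allocation to clusters (ASSUMED form: softmax of
    negative squared Euclidean distance to each cluster centroid)."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    logits = -d2 / scale
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def mnl_probs(time, cost, beta_time, beta_cost):
    """Multinomial logit probabilities for one cluster.
    time, cost: (N, J) attribute matrices; utility is linear (ASSUMED)."""
    v = beta_time * time + beta_cost * cost
    v -= v.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(v)
    return e / e.sum(axis=1, keepdims=True)

def hybrid_loglik(x, time, cost, chosen, centroids, betas):
    """Log-likelihood of the mixture: sum_n log sum_k pi_nk * P_k(chosen_n).
    betas: (K, 2) array holding [beta_time, beta_cost] per cluster."""
    pi = soft_memberships(x, centroids)                      # (N, K)
    rows = np.arange(len(chosen))
    p_chosen = np.stack(
        [mnl_probs(time, cost, bt, bc)[rows, chosen] for bt, bc in betas],
        axis=1,
    )                                                        # (N, K)
    return np.log((pi * p_chosen).sum(axis=1)).sum()

def vtt(beta_time, beta_cost):
    """Value of travel time: marginal rate of substitution between
    time and cost, in cost units per time unit."""
    return beta_time / beta_cost

# Synthetic toy data (not from the paper): 5 travellers, 3 modes, 2 clusters.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))                  # traveller characteristics
time = rng.uniform(10, 60, size=(5, 3))      # minutes
cost = rng.uniform(1, 10, size=(5, 3))       # currency units
chosen = rng.integers(0, 3, size=5)          # index of chosen mode
centroids = np.array([[-1.0, 0.0], [1.0, 0.0]])
betas = np.array([[-0.05, -0.30], [-0.10, -0.20]])

ll = hybrid_loglik(x, time, cost, chosen, centroids, betas)
# Individual-level VTT as a membership-weighted average of cluster VTTs.
vtt_n = soft_memberships(x, centroids) @ np.array([vtt(bt, bc) for bt, bc in betas])
print(ll, vtt_n)
```

Simultaneous estimation, as described in the abstract, would maximise a likelihood of this kind jointly over the clustering and choice parameters; the sketch only evaluates it at fixed values. The point of keeping a utility-based choice component is visible in `vtt`: because each cluster has its own time and cost coefficients, a cluster-specific VTT follows directly from their ratio.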

Source journal

Transportation Research Part C-Emerging Technologies (Engineering & Technology: Transportation Science & Technology)

CiteScore: 15.80

Self-citation rate: 12.00%

Articles published: 332

Review time: 64 days

Journal description: Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.