Panagiotis Tsoleridis, Charisma F. Choudhury, Stephane Hess
Transportation Research Part C: Emerging Technologies, Volume 179, Article 105289. Published 2025-08-17. DOI: 10.1016/j.trc.2025.105289. URL: https://www.sciencedirect.com/science/article/pii/S0968090X25002931
Using probabilistic clustering techniques as a specification tool for capturing heterogeneity in choice models
In the era of big data, data-driven methods have emerged as strong competitors to traditional econometric models for analysing choice behaviour. In particular, data-driven models offer flexible classification methods that are well suited to capturing heterogeneity among decision makers and improving model fit. A key limitation of purely data-driven models, however, is the difficulty of calculating welfare measures, such as the value of travel time (VTT) estimates that are essential for cost–benefit analyses. This motivates the current study, which combines the data-mining-based segmentation approaches used in machine learning (ML) with traditional discrete choice models (DCM) to get the best of both: a clustering-based component to capture heterogeneity among travellers and a utility-based choice component suitable for quantifying policy-relevant measures, such as VTT estimates. In the proposed hybrid framework, travellers are probabilistically allocated to clusters based on their degree of similarity to each cluster, and cluster-specific random-utility-based mode choice models are estimated simultaneously. The proposed hybrid framework is tested on two revealed preference (RP) datasets (a GPS diary and a traditional household survey) and on three different choice contexts, providing a range of sample sizes and data complexity. The performance of the proposed hybrid model (H-LCCM) is compared with that of traditional latent class choice models (LCCM), in which both the class membership and mode choice components are utility-based, and with two other state-of-the-art ML-assisted LCCM frameworks. Results indicate that H-LCCM outperforms the remaining specifications in the majority of the contexts examined, while offering a more scalable approach for contexts with a large number of observations (as is the case for big data sources) and/or with large choice sets (as is typical in spatial choice contexts). The proposed framework is practically applicable for policy-making because it allows the calculation of VTT estimates and therefore does not sacrifice the microeconomic interpretability of traditional DCMs. The results are promising, especially in the current era of big data, and are expected to contribute to the emerging literature on synergies between traditional econometric approaches and new data-driven methods.
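The general structure described in the abstract (soft allocation to clusters, cluster-specific choice models, and a VTT computed from the utility coefficients) can be illustrated with a minimal sketch. The abstract does not state the similarity measure, the utility specification, or the estimation routine, so everything below is an assumption made for illustration: the softmax over negative squared distances to centroids, the linear time-and-cost utility, and all function and parameter names are hypothetical and are not the authors' implementation.

```python
import numpy as np

def membership_probs(z, centroids, scale=1.0):
    """Soft cluster allocation (assumed form): softmax over the negative
    squared Euclidean distance from each traveller to each centroid.
    z: (N, D) traveller characteristics; centroids: (K, D)."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    logits = -d2 / scale
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def mnl_probs(time, cost, beta_time, beta_cost):
    """Multinomial logit probabilities for one cluster.
    time, cost: (N, J) attributes of the J alternatives."""
    v = beta_time * time + beta_cost * cost
    v = v - v.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(v)
    return e / e.sum(axis=1, keepdims=True)

def log_likelihood(z, time, cost, chosen, centroids, betas):
    """Joint log-likelihood: membership-weighted mixture of the
    cluster-specific choice probabilities of the chosen alternative.
    betas: list of (beta_time, beta_cost), one pair per cluster."""
    pi = membership_probs(z, centroids)  # (N, K)
    rows = np.arange(len(chosen))
    lik = np.zeros(len(chosen))
    for k, (b_time, b_cost) in enumerate(betas):
        p = mnl_probs(time, cost, b_time, b_cost)
        lik += pi[:, k] * p[rows, chosen]
    return np.log(lik).sum()

def vtt_per_hour(beta_time, beta_cost):
    """Value of travel time as the ratio of the time and cost coefficients,
    assuming time in minutes and cost in currency units."""
    return 60.0 * beta_time / beta_cost
```

Under these assumptions, simultaneous estimation would amount to maximising `log_likelihood` jointly over the centroids and the cluster-specific coefficients, and each cluster would yield its own VTT through `vtt_per_hour`.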
About the journal:
Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.