Comparing econometric and machine learning models to fare evasion prediction

Impact factor: 3.8. JCR quartile: Q2 (Transportation)

Transportation Research Interdisciplinary Perspectives. Published: 2025-09-18. DOI: 10.1016/j.trip.2025.101636

Benedetto Barabino, Roberto Ventura

Transportation Research Interdisciplinary Perspectives, volume 34, Article 101636. Full text: https://www.sciencedirect.com/science/article/pii/S259019822500315X

Citations: 0

Abstract

Fare evasion poses a significant financial threat to Transit Agencies (TAs) and Public Transport Companies (PTCs) worldwide, especially in Proof-of-Payment Transit Systems (POP-TSs). Understanding and estimating fare evasion frequency is crucial for developing targeted countermeasures. Traditionally, Econometric Models (EMs) have been used for this purpose, linking fare evasion frequency to specific predictors to assess their effects and significance. More recently, Machine Learning Models (MLMs) have emerged as promising tools that may improve accuracy through more complex data analysis. Despite the strengths of both approaches, the literature lacks a comprehensive comparison of EMs and MLMs for predicting fare evasion frequency.

This study addresses this gap by developing, calibrating, and validating two alternative frequency estimation models: an EM based on a Generalised Linear Regression Model (GLRM) and an MLM based on an Artificial Neural Network Model (ANNM). Using 4,000 real-world records from a mid-sized Italian PTC, the study quantitatively assesses the models' performance through regression plots, error metrics, and fare evasion event ratios. The findings indicate that the ANNM slightly outperforms the GLRM on the considered dataset, with a higher correlation coefficient, a smaller margin of error, and a fare evasion event ratio closer to one. The study also explores predictor effects, an area where the ANNM's "black box" nature traditionally limits transparency. An overview of these effects shows that, while both models identify similar key factors, each prioritises different aspects of fare evasion influences. These insights can help TAs and PTCs select models based on available data, interpretability needs, and fare evasion patterns, supporting more effective, data-driven management strategies.
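The abstract names three kinds of comparison criteria: a correlation coefficient, error metrics, and a fare evasion event ratio. It does not give their formulas. The sketch below shows one plausible way to compute them for any two sets of predictions. It rests on assumptions that are not in the abstract: the error metrics are taken to be RMSE and MAE, the event ratio is taken to be total predicted events divided by total observed events, and all function names and the toy numbers are hypothetical. It is not the authors' implementation.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted counts."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean squared error (assumed here as one of the 'error metrics')."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error (assumed here as one of the 'error metrics')."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


def event_ratio(observed, predicted):
    """Assumed definition: total predicted events / total observed events.

    Under this definition, a value of 1.0 means the model reproduces the
    overall number of fare evasion events.
    """
    return sum(predicted) / sum(observed)


def compare_models(observed, predictions_by_model):
    """Return a dict of metrics for each named model."""
    return {
        name: {
            "r": pearson_r(observed, preds),
            "rmse": rmse(observed, preds),
            "mae": mae(observed, preds),
            "event_ratio": event_ratio(observed, preds),
        }
        for name, preds in predictions_by_model.items()
    }


if __name__ == "__main__":
    # Toy, made-up numbers purely for illustration; not data from the study.
    observed = [0, 1, 2, 3, 4]
    predictions = {
        "model_a": [0.5, 1.0, 1.5, 3.5, 4.5],
        "model_b": [1.0, 1.0, 2.0, 2.0, 3.0],
    }
    for name, metrics in compare_models(observed, predictions).items():
        print(name, metrics)
```

Under these assumed definitions, a model would be preferred when its correlation coefficient is higher, its RMSE and MAE are lower, and its event ratio is closer to one, which mirrors the three criteria the abstract reports.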
