Out-of-sample prediction and interpretation for random parameter generalized linear models

IF 6.2 Zone 1 (Engineering Technology) Q1 ERGONOMICS

Accident; analysis and prevention Pub Date : 2025-07-10 DOI:10.1016/j.aap.2025.108147

Jonathan S. Wood, Vikash Gayah

Accident; analysis and prevention, volume 220, Article 108147. https://www.sciencedirect.com/science/article/pii/S0001457525002337

Citations: 0

Abstract

Incorporating random parameters (RPs) into generalized linear models (GLMs), such as the negative binomial (NB) regression model used to predict crash frequencies, has been shown to improve model fit and better address issues such as unobserved heterogeneity. However, applying models with RPs to make predictions for observations outside the sample used to estimate the model is not straightforward. Recent studies have proposed various methods to incorporate RPs in out-of-sample predictions, but these tend to provide biased estimates or are computationally intensive to apply. This paper applies fundamental statistical theory, leveraging properties of the underlying RP distributions incorporated into GLMs, to provide more direct and accurate predictions and to directly estimate prediction variance for out-of-sample observations. Methods are provided for several common RP distributions (normal/Gaussian, lognormal, triangular, uniform, and gamma) within a log-link GLM framework. Additionally, closed-form equations for elasticities and marginal effects for the random parameters are provided. The proposed methods are tested using crash frequency prediction models developed with data from the Highway Safety Information System (HSIS). The results suggest that the proposed exact method provides more accurate predictions than the computationally intensive simulation-based approximation approaches while also being simple to apply. The method supports the widespread use of RPs in research and in practical applications of GLMs.
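The abstract does not reproduce the paper's equations, so the following is only a sketch of the general idea under a stated assumption: in a log-link model the prediction contains a factor exp(β·x), and when β is random its expected value is the moment generating function of β evaluated at x. The function names, the example values, and the choice of three distributions (normal, uniform, gamma) are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def mean_multiplier_normal(mu, sigma, x):
    """E[exp(beta * x)] for beta ~ Normal(mu, sigma^2), from the normal MGF."""
    return math.exp(mu * x + 0.5 * (sigma * x) ** 2)


def mean_multiplier_uniform(low, high, x):
    """E[exp(beta * x)] for beta ~ Uniform(low, high), from the uniform MGF."""
    if x == 0:
        return 1.0
    return (math.exp(high * x) - math.exp(low * x)) / (x * (high - low))


def mean_multiplier_gamma(shape, scale, x):
    """E[exp(beta * x)] for beta ~ Gamma(shape, scale); finite only if scale * x < 1."""
    if scale * x >= 1:
        raise ValueError("expectation is infinite when scale * x >= 1")
    return (1.0 - scale * x) ** (-shape)


def simulated_multiplier(draw, x, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E[exp(beta * x)]; draw(rng) returns one beta."""
    rng = random.Random(seed)
    return sum(math.exp(draw(rng) * x) for _ in range(n_draws)) / n_draws


if __name__ == "__main__":
    mu, sigma, x = 0.3, 0.2, 2.0  # hypothetical RP mean, RP std. dev., covariate
    naive = math.exp(mu * x)  # plugs in the RP mean and ignores its spread
    exact = mean_multiplier_normal(mu, sigma, x)
    approx = simulated_multiplier(lambda rng: rng.gauss(mu, sigma), x)
    print(f"naive={naive:.4f} exact={exact:.4f} simulated={approx:.4f}")
```

Under these assumptions, plugging in only the RP mean understates the expected multiplier whenever the RP variance is nonzero, while the simulation approaches the closed form only as the number of draws grows. The same moment generating function evaluated at 2x gives the second moment of the multiplier, which is one route to a variance term.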


Source journal

Accident; analysis and prevention Multiple-

CiteScore: 11.90

Self-citation rate: 16.90%

Articles published: 264

Review time: 48 days

About the journal: Accident Analysis & Prevention provides wide coverage of the general areas relating to accidental injury and damage, including the pre-injury and immediate post-injury phases. Published papers deal with medical, legal, economic, educational, behavioral, theoretical or empirical aspects of transportation accidents, as well as with accidents at other sites. Selected topics within the scope of the Journal may include: studies of human, environmental and vehicular factors influencing the occurrence, type and severity of accidents and injury; the design, implementation and evaluation of countermeasures; biomechanics of impact and human tolerance limits to injury; modelling and statistical analysis of accident data; policy, planning and decision-making in safety.