Out-of-sample prediction and interpretation for random parameter generalized linear models

IF 5.7 · Zone 1 (Engineering & Technology) · JCR Q1 (Ergonomics)
Jonathan S. Wood, Vikash Gayah
Journal: Accident Analysis & Prevention, Volume 220, Article 108147
Published: 2025-07-10 (Journal Article)
DOI: 10.1016/j.aap.2025.108147
URL: https://www.sciencedirect.com/science/article/pii/S0001457525002337

Abstract

Incorporating random parameters (RPs) into generalized linear models (GLMs), such as the negative binomial (NB) regression model used to predict crash frequencies, has been shown to improve model fit and better address issues such as unobserved heterogeneity. However, applying models with RPs to make predictions for observations outside the sample used to estimate the model is not straightforward. Recent studies have proposed various methods to incorporate RPs in out-of-sample predictions, but these tend to provide biased estimates or are computationally intensive to apply. This paper applies fundamental statistical theory to leverage properties of the underlying RP distributions incorporated into GLMs to provide more direct and accurate predictions, as well as to directly estimate prediction variance for out-of-sample observations. Methods are provided for several common RP distributions (normal/Gaussian, lognormal, triangular, uniform, and gamma) within a log-link GLM framework. Additionally, closed-form equations for elasticities and marginal effects for the random parameters are provided. The proposed methods are tested using crash frequency prediction models developed using data from the Highway Safety Information System (HSIS). The results suggest that the proposed exact method provides more accurate predictions than the computationally intensive simulation-based approximation approaches while also being simple to apply. The method is suitable for the widespread use of RPs in research and in practical applications of GLMs.
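
The abstract does not reproduce the paper's equations, so the following is only a minimal sketch of the general idea it describes, under stated assumptions. It assumes a log-link model with a single random parameter and uses the standard moment generating function (MGF) of the normal and uniform distributions to obtain the expected prediction in closed form, then compares it with a plug-in prediction and a simulation average. All function names, parameter names, and numeric values are hypothetical and chosen for illustration; they are not taken from the paper or the HSIS data.

```python
import math
import random


def exact_mean_normal_rp(fixed_part, mu, sigma, x):
    """E[exp(fixed_part + beta * x)] for beta ~ Normal(mu, sigma^2).

    Uses the normal MGF: E[exp(t * beta)] = exp(mu * t + 0.5 * sigma^2 * t^2).
    """
    return math.exp(fixed_part + mu * x + 0.5 * (sigma * x) ** 2)


def exact_mean_uniform_rp(fixed_part, low, high, x):
    """E[exp(fixed_part + beta * x)] for beta ~ Uniform(low, high).

    Uses the uniform MGF: (exp(high * t) - exp(low * t)) / (t * (high - low)).
    """
    if x == 0:
        return math.exp(fixed_part)  # the MGF equals 1 at t = 0
    mgf = (math.exp(high * x) - math.exp(low * x)) / (x * (high - low))
    return math.exp(fixed_part) * mgf


def plug_in_mean(fixed_part, mu, x):
    """Prediction that uses only the RP mean and ignores its spread."""
    return math.exp(fixed_part + mu * x)


def simulated_mean_normal_rp(fixed_part, mu, sigma, x, draws=100_000, seed=0):
    """Monte Carlo average over draws of the normal random parameter."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += math.exp(fixed_part + rng.gauss(mu, sigma) * x)
    return total / draws


if __name__ == "__main__":
    # Hypothetical values: fixed part of the linear predictor, RP mean and
    # standard deviation, and the covariate value for a new observation.
    fixed_part, mu, sigma, x = -1.0, 0.3, 0.2, 2.0
    print("plug-in   :", plug_in_mean(fixed_part, mu, x))
    print("exact     :", exact_mean_normal_rp(fixed_part, mu, sigma, x))
    print("simulated :", simulated_mean_normal_rp(fixed_part, mu, sigma, x))
```

In this sketch the plug-in value is always below the closed-form mean whenever the RP variance and covariate are nonzero (a consequence of Jensen's inequality for the exponential function), while the simulation average approaches the closed-form value only as the number of draws grows. This is one plausible reading of why mean-only predictions can be biased and simulation can be costly; the paper itself should be consulted for its exact formulation.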
Source journal
CiteScore: 11.90 · Self-citation rate: 16.90% · Articles published: 264 · Review time: 48 days
Journal description: Accident Analysis & Prevention provides wide coverage of the general areas relating to accidental injury and damage, including the pre-injury and immediate post-injury phases. Published papers deal with medical, legal, economic, educational, behavioral, theoretical or empirical aspects of transportation accidents, as well as with accidents at other sites. Selected topics within the scope of the Journal may include: studies of human, environmental and vehicular factors influencing the occurrence, type and severity of accidents and injury; the design, implementation and evaluation of countermeasures; biomechanics of impact and human tolerance limits to injury; modelling and statistical analysis of accident data; policy, planning and decision-making in safety.