Semi and Nonparametric Conditional Probability Density, a Case Study of Pedestrian Crashes

Q3 Social Sciences

Open Transportation Journal. Pub Date: 2021-12-31. DOI: 10.2174/1874447802115010280

Mahdi Rezapour, K. Ksaibati


Citations: 2

Abstract

Kernel-based methods have gained popularity because the distribution of a model's residuals may not follow any classical parametric distribution. Kernel-based estimation has been extended to conditional densities, rather than conditional distributions, for data that include both discrete and continuous attributes. The method relies on smoothing parameters, with an optimal value selected for each attribute. Thus, when an explanatory variable is independent of the dependent variable, the nonparametric method effectively drops that attribute by assigning it a large smoothing parameter; this gives the attribute a uniform distribution, so that its contribution to the model's variance is minimal. The objective of this study was to identify factors contributing to the severity of pedestrian crashes using an unbiased method. In particular, the study evaluated the applicability of semi- and nonparametric kernel-based techniques to the crash dataset by means of confusion matrices. Two kernel-based methods, one nonparametric and one semi-parametric, were implemented to model the severity of pedestrian crashes. Estimation of the semi-parametric densities is based on adaptive local smoothing and maximization of a quasi-likelihood function, which resembles the likelihood of the binary logit model. The nonparametric method, in contrast, selects optimal smoothing parameters for the conditional probability density function so as to minimize the mean integrated squared error (MISE). The models were evaluated by their predictive power, with standard logistic regression as a benchmark. Although these methods have been used in other fields, this is one of the earliest studies to apply them to traffic safety.
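The role of the smoothing parameter described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Aitchison-Aitken kernel for the discrete predictor, the Gaussian kernel for the continuous one, the function names, the hand-picked bandwidths, and the toy data are all assumptions made for illustration, and no data-driven bandwidth selection is performed. The sketch only shows that when the discrete smoothing parameter reaches its upper bound, the kernel becomes uniform and the predictor no longer affects the estimate.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here for the continuous predictor."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def aitchison_aitken(z, z_i, lam, n_levels):
    """Kernel for an unordered discrete predictor with n_levels categories.

    lam = 0 means exact matching; lam = (n_levels - 1) / n_levels gives
    every category the same weight, i.e. a uniform kernel.
    """
    return 1.0 - lam if z == z_i else lam / (n_levels - 1)

def prob_severe(x, z, data, h, lam, n_levels=2):
    """Kernel-weighted estimate of P(y = 1 | x, z) from (x, z, y) triples."""
    num = 0.0
    den = 0.0
    for x_i, z_i, y_i in data:
        w = gaussian_kernel((x - x_i) / h) * aitchison_aitken(z, z_i, lam, n_levels)
        num += w * y_i
        den += w
    return num / den

# Hypothetical toy data: (continuous x, binary z, binary severity y).
data = [
    (0.1, 0, 0), (0.4, 1, 0), (0.5, 0, 0), (0.9, 1, 1),
    (1.2, 0, 1), (1.5, 1, 1), (1.6, 0, 1), (0.2, 1, 0),
]

lam_uniform = 0.5  # upper bound (c - 1) / c for c = 2 categories
lam_small = 0.1    # z still influences the estimate

p_z0_uniform = prob_severe(1.0, 0, data, h=0.4, lam=lam_uniform)
p_z1_uniform = prob_severe(1.0, 1, data, h=0.4, lam=lam_uniform)
p_z0_small = prob_severe(1.0, 0, data, h=0.4, lam=lam_small)
p_z1_small = prob_severe(1.0, 1, data, h=0.4, lam=lam_small)

print("uniform kernel:", p_z0_uniform, p_z1_uniform)
print("small lambda:  ", p_z0_small, p_z1_small)
```

With the uniform kernel the two estimates coincide regardless of the value of z, which is the sense in which a large smoothing parameter removes a predictor from the model.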
The results showed that the nonparametric kernel-based method outperformed both the semi-parametric (single-index) model and the standard logit model, as judged by the confusion matrices. To examine how bandwidth selection removes irrelevant attributes in the nonparametric approach, some noisy predictors were added to the models and the results were compared. The methodological approach of the models is discussed extensively in the study. In summary, alcohol and drug involvement, driving on a non-level grade, and poor lighting conditions are among the factors that increase the likelihood of a severe pedestrian crash. The nonparametric method is especially recommended for traffic safety applications when the importance of predictors is uncertain, because the technique automatically drops unimportant predictors.
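A comparison based on confusion matrices, as mentioned above, can be sketched as follows. The labels and the two sets of predictions are hypothetical and do not reproduce the study's data or results; the function names are likewise assumptions for illustration.

```python
def confusion_matrix(y_true, y_pred):
    """Return counts (tn, fp, fn, tp) for binary labels coded 0/1."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def accuracy(cm):
    """Share of correctly classified cases in a (tn, fp, fn, tp) tuple."""
    tn, fp, fn, tp = cm
    return (tn + tp) / (tn + fp + fn + tp)

# Hypothetical observed severities (1 = severe) and two models' predictions.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
pred_model_a = [0, 0, 1, 1, 0, 0, 1, 0]
pred_model_b = [0, 1, 1, 0, 0, 0, 1, 1]

cm_a = confusion_matrix(y_true, pred_model_a)
cm_b = confusion_matrix(y_true, pred_model_b)

print("model A:", cm_a, accuracy(cm_a))
print("model B:", cm_b, accuracy(cm_b))
```

Accuracy alone can hide differences in how well each model identifies severe crashes, so the false-negative and false-positive counts are worth comparing directly as well.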


Source journal

Open Transportation Journal (Social Sciences: Transportation)

CiteScore

2.10

Self-citation rate

0.00%

Articles published