Statistical Analysis and Data Mining: The ASA Data Science Journal, latest articles, page 8

Sketched Stochastic Dictionary Learning for large-scale data and application to high-throughput mass spectrometry

Statistical Analysis and Data Mining: The ASA Data Science Journal Pub Date : 2021-08-20 DOI: 10.1002/sam.11542

O. Permiakova, T. Burger

Citations: 2

Weighted validation of heteroscedastic regression models for better selection

Statistical Analysis and Data Mining: The ASA Data Science Journal Pub Date : 2021-08-17 DOI: 10.1002/sam.11544

Yoonsuh Jung, Hayoung Kim

Citations: 0

Modal linear regression models with multiplicative distortion measurement errors

Statistical Analysis and Data Mining: The ASA Data Science Journal Pub Date : 2021-08-10 DOI: 10.1002/sam.11541

Jun Zhang, Gaorong Li, Yiping Yang

Citations: 10

Multivariate Gaussian RBF-net for smooth function estimation and variable selection

Statistical Analysis and Data Mining: The ASA Data Science Journal Pub Date : 2021-08-03 DOI: 10.1002/sam.11540

Arkaprava Roy

Abstract: Neural networks are routinely used for nonparametric regression modeling. The interest in these models is growing with ever-increasing complexities in modern datasets. With modern technological advancements, the number of predictors frequently exceeds the sample size in many application areas. Thus, selecting important predictors from the huge pool is an extremely important task for judicious inference. This paper proposes a novel flexible class of single-layer radial basis function (RBF) networks. The proposed architecture can estimate smooth unknown regression functions and also perform variable selection. We primarily focus on the Gaussian RBF-net due to its attractive properties. The extensions to other choices of RBF are fairly straightforward. The proposed architecture is also shown to be effective in identifying relevant predictors in a low-dimensional setting using the posterior samples without imposing any sparse estimation scheme. We develop an efficient Markov chain Monte Carlo algorithm to generate posterior samples of the parameters. We illustrate the proposed method's empirical efficacy through simulation experiments, both in high- and low-dimensional regression problems. The posterior contraction rate is established with respect to empirical ℓ2 distance assuming that the error variance is unknown, and the true function belongs to a Hölder ball. We illustrate our method on a Human Connectome Project dataset to predict vocabulary comprehension and to identify important edges of the structural connectome.

Citations: 0

Negative binomial graphical model with excess zeros

Statistical Analysis and Data Mining: The ASA Data Science Journal Pub Date : 2021-07-21 DOI: 10.1002/sam.11536

Beomjin Park, Hosik Choi, Changyi Park

Citations: 2

Evaluation and interpretation of driving risks: Automobile claim frequency modeling with telematics data

Statistical Analysis and Data Mining: The ASA Data Science Journal Pub Date : 2021-07-20 DOI: 10.2139/ssrn.3910216

Yaqian Gao, Yifan Huang, Shengwang Meng

Abstract: With the development of vehicle telematics and data mining technology, usage-based insurance (UBI) has aroused widespread interest from both academia and industry. The extensive driving behavior features make it possible to further understand the risks of insured vehicles, but pose challenges in the identification and interpretation of important ratemaking factors. This study, based on the telematics data of policyholders in mainland China, analyzes insurance claim frequency of commercial trucks using both Poisson regression and several machine learning models, including regression tree, random forest, gradient boosting tree, XGBoost and neural network. After selecting the best model, we analyze feature importance, feature effects and the contribution of each feature to the prediction from an actuarial perspective. Our empirical study shows that XGBoost greatly outperforms the traditional models and detects some important risk factors, such as the average speed, the average mileage traveled per day, the fraction of night driving, the number of sudden brakes and the fraction of left/right turns at intersections. These features usually have a nonlinear effect on driving risk, and there are complex interactions between features. To further distinguish high-/low-risk drivers, we run supervised clustering for risk segmentation according to drivers' driving habits. In summary, this study not only provides a more accurate prediction of driving risk, but also satisfies the interpretability requirements of insurance regulators and risk management.

Citations: 0

Power grid frequency prediction using spatiotemporal modeling

Statistical Analysis and Data Mining: The ASA Data Science Journal Pub Date : 2021-07-06 DOI: 10.1002/sam.11535

Amanda Lenzi, J. Bessac, M. Anitescu

Citations: 7

Analyzing relevance vector machines using a single penalty approach

Statistical Analysis and Data Mining: The ASA Data Science Journal Pub Date : 2021-07-05 DOI: 10.1002/sam.11551

A. Dixit, Vivekananda Roy

Abstract: Relevance vector machine (RVM) is a popular sparse Bayesian learning model typically used for prediction. Recently it has been shown that improper priors assumed on multiple penalty parameters in RVM may lead to an improper posterior. Currently in the literature, the sufficient conditions for posterior propriety of RVM do not allow improper priors over the multiple penalty parameters. In this article, we propose a single penalty relevance vector machine (SPRVM) model in which multiple penalty parameters are replaced by a single penalty and we consider a semi-Bayesian approach for fitting the SPRVM. The necessary and sufficient conditions for posterior propriety of SPRVM are more liberal than those of RVM and allow for several improper priors over the penalty parameter. We also prove the geometric ergodicity of the Gibbs sampler used to analyze the SPRVM model and hence can estimate the asymptotic standard errors associated with the Monte Carlo estimate of the means of the posterior predictive distribution. Such a Monte Carlo standard error cannot be computed in the case of RVM, since the rate of convergence of the Gibbs sampler used to analyze RVM is not known. The predictive performance of RVM and SPRVM is compared by analyzing two simulation examples and three real-life datasets.

Citations: 0

Coefficient tree regression for generalized linear models

Statistical Analysis and Data Mining: The ASA Data Science Journal Pub Date : 2021-07-02 DOI: 10.1002/sam.11534

Özge Sürer, D. Apley, E. Malthouse

Citations: 2

Fourier neural networks as function approximators and differential equation solvers

Statistical Analysis and Data Mining: The ASA Data Science Journal Pub Date : 2021-06-22 DOI: 10.1002/sam.11531

M. Ngom, O. Marin

Citations: 14