{"title":"An optimal subsampling design for large-scale Cox model with censored data.","authors":"Shiqi Liu, Zilong Xie, Ming Zheng, Wen Yu","doi":"10.1080/02664763.2024.2423234","DOIUrl":"10.1080/02664763.2024.2423234","url":null,"abstract":"<p><p>Subsampling designs are useful for reducing computational load and storage cost for large-scale data analysis. For massive survival data with right censoring, we propose a class of optimal subsampling designs under the widely-used Cox model. The proposed designs utilize information from both the outcome and the covariates. Different forms of the design can be derived adaptively to meet various targets, such as optimizing the overall estimation accuracy or minimizing the variation of specific linear combination of the estimators. Given the subsampled data, the inverse probability weighting approach is employed to estimate the model parameters. The resultant estimators are shown to be consistent and asymptotically normally distributed. Simulation results indicate that the proposed subsampling design yields more efficient estimators than the uniform subsampling by using subsampled data of comparable sample sizes. Additionally, the subsampling estimation significantly reduces the computational load and storage cost relative to the full data estimation. An analysis of a real data example is provided for illustration.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"52 7","pages":"1315-1341"},"PeriodicalIF":1.2,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12123965/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144199240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient fully Bayesian approach to brain activity mapping with complex-valued fMRI data.","authors":"Zhengxin Wang, Daniel B Rowe, Xinyi Li, D Andrew Brown","doi":"10.1080/02664763.2024.2422392","DOIUrl":"https://doi.org/10.1080/02664763.2024.2422392","url":null,"abstract":"<p><p>Functional magnetic resonance imaging (fMRI) enables indirect detection of brain activity changes via the blood-oxygen-level-dependent (BOLD) signal. Conventional analysis methods mainly rely on the real-valued magnitude of these signals. In contrast, research suggests that analyzing both real and imaginary components of the complex-valued fMRI (cv-fMRI) signal provides a more holistic approach that can increase power to detect neuronal activation. We propose a fully Bayesian model for brain activity mapping with cv-fMRI data. Our model accommodates temporal and spatial dynamics. Additionally, we propose a computationally efficient sampling algorithm, which enhances processing speed through image partitioning. Our approach is shown to be computationally efficient via image partitioning and parallel computation while being competitive with state-of-the-art methods. We support these claims with both simulated numerical studies and an application to real cv-fMRI data obtained from a finger-tapping experiment.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"52 6","pages":"1299-1314"},"PeriodicalIF":1.2,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12035935/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143998676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction intervals and bands with improved coverage for functional data under noisy discrete observation.","authors":"David Kraus","doi":"10.1080/02664763.2024.2420223","DOIUrl":"https://doi.org/10.1080/02664763.2024.2420223","url":null,"abstract":"<p><p>We revisit the classic situation in functional data analysis in which curves are observed at discrete, possibly sparse and irregular, arguments with observation noise. We focus on the reconstruction of individual curves by prediction intervals and bands. The standard approach consists of two steps: first, one estimates the mean and covariance function of curves and observation noise variance function by, e.g. penalized splines, and second, under Gaussian assumptions, one derives the conditional distribution of a curve given observed data and constructs prediction sets with required properties, usually employing sampling from the predictive distribution. This approach is well established, commonly used and theoretically valid but practically, it surprisingly fails in its key property: prediction sets constructed this way often do not have the required coverage. The actual coverage is lower than the nominal one. We investigate the cause of this issue and propose a computationally feasible remedy that leads to prediction regions with a much better coverage. Our method accounts for the uncertainty of the predictive model by sampling from the approximate distribution of its spline estimators whose covariance is estimated by a novel sandwich estimator. Our approach also applies to the important case of covariate-adjusted models.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"52 6","pages":"1258-1277"},"PeriodicalIF":1.2,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12035946/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144010105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A non-linear integer-valued autoregressive model with zero-inflated data series.","authors":"Predrag M Popović, Hassan S Bakouch, Miroslav M Ristić","doi":"10.1080/02664763.2024.2419495","DOIUrl":"https://doi.org/10.1080/02664763.2024.2419495","url":null,"abstract":"<p><p>A new non-linear stationary process for time series of counts is introduced. The process is composed of the survival and innovation component. The survival component is based on the generalized zero-modified geometric thinning operator, where the innovation process figures in the survival component as well. A few probability distributions for the innovation process have been discussed, in order to adjust the model for observed series with the excess number of zeros. The conditional maximum likelihood and the conditional least squares methods are investigated for the estimation of the model parameters. The practical aspect of the model is presented on some real-life data sets, where we observe data with inflation as well as deflation of zeroes so we can notice how the model can be adjusted with the proper parameter selection.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"52 6","pages":"1195-1218"},"PeriodicalIF":1.2,"publicationDate":"2024-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12035957/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143995010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the median <i>p</i>-value method for assessing the statistical significance of tests when using multiple imputation.","authors":"Peter C Austin, Iris Eekhout, Stef van Buuren","doi":"10.1080/02664763.2024.2418473","DOIUrl":"https://doi.org/10.1080/02664763.2024.2418473","url":null,"abstract":"<p><p>Rubin's Rules are commonly used to pool the results of statistical analyses across imputed samples when using multiple imputation. Rubin's Rules cannot be used when the result of an analysis in an imputed dataset is not a statistic and its associated standard error, but a test statistic (e.g. Student's t-test). While complex methods have been proposed for pooling test statistics across imputed samples, these methods have not been implemented in many popular statistical software packages. The median <i>p</i>-value method has been proposed for pooling test statistics. The statistical significance level of the pooled test statistic is the median of the associated <i>p</i>-values across the imputed samples. We evaluated the performance of this method with nine statistical tests: Student's t-test, Wilcoxon Rank Sum test, Analysis of Variance, Kruskal-Wallis test, the test of significance for Pearson's and Spearman's correlation coefficient, the Chi-squared test, the test of significance for a regression coefficient from a linear regression and from a logistic regression. For each test, the empirical type I error rate was higher than the advertised rate. The magnitude of inflation increased as the prevalence of missing data increased. The median <i>p</i>-value method should not be used to assess statistical significance across imputed datasets.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"52 6","pages":"1161-1176"},"PeriodicalIF":1.2,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12035927/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144012737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mitigating the choice of the duration in DDMS models through a parametric link.","authors":"Fernando Henrique de Paula E Silva Mendes, Douglas Eduardo Turatti, Guilherme Pumi","doi":"10.1080/02664763.2024.2419505","DOIUrl":"https://doi.org/10.1080/02664763.2024.2419505","url":null,"abstract":"<p><p>One of the most important hyper-parameters in duration-dependent Markov-switching (DDMS) models is the duration of the hidden states. Because there is currently no procedure for estimating this duration or testing whether a given duration is appropriate for a given data set, an ad hoc duration choice must be heuristically justified. In this paper, we propose and examine a methodology that mitigates the choice of duration in DDMS models when forecasting is the goal. The novelty of this paper is the use of the asymmetric Aranda-Ordaz parametric link function to model transition probabilities in DDMS models, instead of the commonly applied logit link. The idea behind this approach is that any incorrect duration choice is compensated for by the parameter in the link, increasing model flexibility. Two Monte Carlo simulations, based on classical applications of DDMS models, are employed to evaluate the methodology. In addition, an empirical investigation is carried out to forecast the volatility of the S&P500, which showcases the capabilities of the proposed model.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"52 6","pages":"1219-1238"},"PeriodicalIF":1.2,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12035960/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144018792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A semiparametric accelerated failure time-based mixture cure tree.","authors":"Wisdom Aselisewine, Suvra Pal, Helton Saulo","doi":"10.1080/02664763.2024.2418476","DOIUrl":"https://doi.org/10.1080/02664763.2024.2418476","url":null,"abstract":"<p><p>The mixture cure rate model (MCM) is the most widely used model for the analysis of survival data with a cured subgroup. In this context, the most common strategy to model the cure probability is to assume a generalized linear model with a known link function, such as the logit link function. However, the logit model can only capture simple effects of covariates on the cure probability. In this article, we propose a new MCM where the cure probability is modeled using a decision tree-based classifier and the survival distribution of the uncured is modeled using an accelerated failure time structure. To estimate the model parameters, we develop an expectation maximization algorithm. Our simulation study shows that the proposed model performs better in capturing nonlinear classification boundaries when compared to the logit-based MCM and the spline-based MCM. This results in more accurate and precise estimates of the cured probabilities, which in-turn results in improved predictive accuracy of cure. We further show that capturing nonlinear classification boundary also improves the estimation results corresponding to the survival distribution of the uncured subjects. Finally, we apply our proposed model and the EM algorithm to analyze an existing bone marrow transplant data.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"52 6","pages":"1177-1194"},"PeriodicalIF":1.2,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12035937/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144020246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The PCovR biplot: a graphical tool for principal covariates regression.","authors":"Elisa Frutos-Bernal, José Luis Vicente-Villardón","doi":"10.1080/02664763.2024.2417978","DOIUrl":"10.1080/02664763.2024.2417978","url":null,"abstract":"<p><p>Biplots are useful tools because they provide a visual representation of both individuals and variables simultaneously, making it easier to explore relationships and patterns within multidimensional datasets. This paper extends their use to examine the relationship between a set of predictors <math><mrow><mi>X</mi></mrow> </math> and a set of response variables <math><mrow><mi>Y</mi></mrow> </math> using Principal Covariates Regression analysis (PCovR). The PCovR biplot provides a simultaneous graphical representation of individuals, predictor variables and response variables. It also provides the ability to examine the relationship between both types of variables in the form of the regression coefficient matrix.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"52 5","pages":"1144-1159"},"PeriodicalIF":1.2,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951325/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143752849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reliability analysis based on doubly-truncated and interval-censored data.","authors":"Pao-Sheng Shen, Huai-Man Li","doi":"10.1080/02664763.2024.2415412","DOIUrl":"10.1080/02664763.2024.2415412","url":null,"abstract":"<p><p>Field data provide important information on product reliability. Interval sampling is widely used for collection of field data, which typically report incident cases during a certain time period. Such sampling scheme induces doubly truncated (DT) data if the exact failure time is known. In many situations, the exact failure date is known only to fall within an interval, leading to doubly truncated and interval censored (DTIC) data. This article considers analysis of DTIC data under parametric failure time models. We consider a conditional likelihood approach and propose interval estimation for parameters and the cumulative distribution functions. Simulation studies show that the proposed method performs well for finite sample size.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"52 5","pages":"1128-1143"},"PeriodicalIF":1.2,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951335/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143752915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel ranked <i>k</i>-nearest neighbors algorithm for missing data imputation.","authors":"Yasir Khan, Said Farooq Shah, Syed Muhammad Asim","doi":"10.1080/02664763.2024.2414357","DOIUrl":"10.1080/02664763.2024.2414357","url":null,"abstract":"<p><p>Missing data is a common problem in many domains that rely on data analysis. The <i>k</i> Nearest Neighbors imputation method has been widely used to address this issue, but it has limitations in accurately imputing missing values, especially for datasets with small pairwise correlations and small values of <i>k</i>. In this study, we proposed a method, Ranked <i>k</i> Nearest Neighbors imputation that uses a similar approach to <i>k</i> Nearest Neighbor, but utilizing the concept of Ranked set sampling to select the most relevant neighbors for imputation. Our results show that the proposed method outperforms the standard <i>k</i> nearest neighbor method in terms of imputation accuracy both in case of Missing Completely at Random and Missing at Random mechanism, as demonstrated by consistently lower MSIE and MAIE values across all datasets. This suggests that the proposed method is a promising alternative for imputing missing values in datasets with small pairwise correlations and small values of <i>k</i>. Thus, the proposed Ranked <i>k</i> Nearest Neighbor method has important implications for data imputation in various domains and can contribute to the development of more efficient and accurate imputation methods without adding any computational complexity to an algorithm.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"52 5","pages":"1103-1127"},"PeriodicalIF":1.2,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951327/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143752879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}