{"title":"Neyman’s truncation test for two-sample means under high dimensional setting","authors":"Ping Dong, Lu Lin","doi":"10.1214/21-bjps519","DOIUrl":"https://doi.org/10.1214/21-bjps519","url":null,"abstract":"Multivariate two-sample testing problems often arise in the statistical analysis of scientific data, especially bioinformatics data. To detect components that differ between two mean vectors, well-known procedures apply Sum-of-Squares type tests, such as Hotelling’s T²-test. However, such a test is not suitable for high dimensional settings because of the singular covariance matrix and accumulated errors. Many tests for high dimensional data have been developed, mainly of two types: Sum-of-Squares type and Max type. Sum-of-Squares type test statistics perform poorly against sparse alternatives, and the Max type test statistic is not powerful enough for non-sparse datasets. In this paper, we propose a Max-Partial-Sum type statistic named Neyman’s Truncation test, which is constructed from maximum partial sums of marginal test statistics. Neyman’s Truncation test has great power against both dense and sparse alternatives. The asymptotic distribution of the test statistic under the null hypothesis is obtained and the power of the test is analyzed. To avoid the slow convergence rate of the asymptotic distribution, we implement our method via bootstrap procedures. Simulation studies and an analysis of a leukemia dataset are carried out to verify the numerical performance.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43713670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast feature selection via streamwise procedure for massive data","authors":"Bingqing Lin, Zhen Pang, Jun Zhang, Cuiqing Chen","doi":"10.1214/21-bjps516","DOIUrl":"https://doi.org/10.1214/21-bjps516","url":null,"abstract":"","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44260826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Valid properties of truncated Student-t regression model with applications in analysis of censored data","authors":"Chi Zhang, Guozheng Tian, Yibo Zhai, Y. Fei","doi":"10.1214/21-bjps521","DOIUrl":"https://doi.org/10.1214/21-bjps521","url":null,"abstract":"Kim (2008) introduced an incorrect stochastic representation (SR) for the truncated Student-t (Tt) random variable. By pointing out that the gamma mixture based on a truncated normal distribution actually cannot result in a true Tt distribution, in this paper, we first propose three correct SRs and then recalculate the corresponding moments of the Tt distribution. Different from those derived by following the invalid SR of Kim (2008), the correct moments of the Tt distribution play a crucial role in parameter estimations. Based on the third SR proposed and the correct expressions of truncated moments, expectation–maximization (EM) algorithms are developed for calculating the maximum likelihood estimates of parameters in the Tt distribution. Extensions to a Tt regression model and a t interval–censored regression model are provided as well. Simulated experiments are conducted to evaluate the performance of the proposed methods. Finally, two real data analyses corroborate the theoretical results.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43056982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The discrete renewal equation with nonsummable inhomogeneous term","authors":"M. Sgibnev","doi":"10.1214/21-bjps517","DOIUrl":"https://doi.org/10.1214/21-bjps517","url":null,"abstract":"","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48743808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unit level model for small area estimation with count data under square root transformation","authors":"Kelly C. M. Gonçalves, M. Ghosh","doi":"10.1214/21-bjps513","DOIUrl":"https://doi.org/10.1214/21-bjps513","url":null,"abstract":"In recent years, the demand for small area statistics has greatly increased worldwide. Small area models are formulated with random area-specific effects assumed to account for the between-area variation that is not explained by auxiliary variables. The unit level models relate the unit values of a study variable to unit-specific covariates. The main aim of this paper is to consider small area estimation under unit level models based on count data. In particular, instead of modelling the variables assuming the Poisson distribution, which is a usual choice, we consider the square root transformation of the original data. One practical advantage is that the proposed transformation achieves approximate homoscedasticity of the error variances, removing one layer of the estimation problem. Inference for the model is carried out under the hierarchical Bayes approach. The square root transformation is evaluated in a simulation study and two design-based studies with real datasets.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42563947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unadjusted Langevin algorithm for sampling a mixture of weakly smooth potentials","authors":"D. Nguyen","doi":"10.1214/22-bjps538","DOIUrl":"https://doi.org/10.1214/22-bjps538","url":null,"abstract":"Discretization of continuous-time diffusion processes is a widely recognized method for sampling. However, requiring the potential to be smooth (gradient Lipschitz) is a considerable restriction. This paper studies the problem of sampling through Euler discretization, where the potential function is assumed to be a mixture of weakly smooth distributions and to be weakly dissipative. We establish convergence in Kullback-Leibler (KL) divergence, with the number of iterations needed to reach an ε-neighborhood of a target distribution depending only polynomially on the dimension. We relax the degenerate convexity at infinity condition of Erdogdu and Hosseinzadeh (2020) and prove convergence guarantees under a Poincaré inequality or non-strong convexity outside a ball. In addition, we also provide convergence in the L_β-Wasserstein metric for the smoothing potential.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45608926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A robust partial least squares approach for function-on-function regression","authors":"U. Beyaztas, H. Shang","doi":"10.1214/21-bjps523","DOIUrl":"https://doi.org/10.1214/21-bjps523","url":null,"abstract":"The function-on-function linear regression model, in which the response and predictors consist of random curves, has become a general framework to investigate the relationship between the functional response and functional predictors. Existing methods to estimate the model parameters may be sensitive to outlying observations, which are common in empirical applications. In addition, these methods may be severely affected by such observations, leading to undesirable estimation and prediction results. A robust estimation method, based on iteratively reweighted simple partial least squares, is introduced to improve the prediction accuracy of the function-on-function linear regression model in the presence of outliers. The performance of the proposed method depends on the number of partial least squares components used to estimate the function-on-function linear regression model. Thus, the optimum number of components is determined via a data-driven error criterion. The finite-sample performance of the proposed method is investigated via several Monte Carlo experiments and an empirical data analysis. In addition, a nonparametric bootstrap method is applied to construct pointwise prediction intervals for the response function. The results are compared with some of the existing methods to illustrate the improvement potentially gained by the proposed method.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48499943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian inference for zero-and/or-one augmented beta rectangular regression models","authors":"Ana R. S. Silva, C. Azevedo, J. Bazán, J. Nobre","doi":"10.1214/21-bjps505","DOIUrl":"https://doi.org/10.1214/21-bjps505","url":null,"abstract":"In this paper we develop a full set of Bayesian inference tools for zero-and/or-one augmented beta rectangular regression models to analyze limited-augmented data, under a new parameterization. This parameterization facilitates the development of both regression models and inferential tools and simplifies the respective computational implementations. The proposed Bayesian tools are parameter estimation, model fit assessment, model comparison (information criteria), residual analysis and case influence diagnostics, developed through MCMC algorithms. In addition, we adapted available methods of posterior predictive checking, using appropriate discrepancy measures. We conducted several simulation studies, considering some situations of practical interest, aiming to evaluate: sensitivity to the choice of prior, parameter recovery of the proposed model and estimation method, the impact of transforming the observed zeros and ones, along with the use of non-augmented models, and the behavior of the proposed model fit assessment and model comparison tools. A real psychometric data set was analyzed to illustrate the performance of the developed tools and the advantages of the developed analysis framework.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43765996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A note on the Nielsen distribution","authors":"D. Gallardo, M. Bourguignon","doi":"10.1214/21-bjps507","DOIUrl":"https://doi.org/10.1214/21-bjps507","url":null,"abstract":"Castellares, Lemonte, and Santos [Brazilian Journal of Probability and Statistics, 34(1), 90-111, 2020] introduced a two-parameter discrete Nielsen distribution, derived its properties, and illustrated the advantages of the model in three data applications. In this note, we will present a corrected version for some results for the particular case θ = 1.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47840078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-assisted SCAD calibration for non-probability samples","authors":"Zhanxu Liu, Chao-Cheng Tu, Yingli Pan","doi":"10.1214/21-bjps506","DOIUrl":"https://doi.org/10.1214/21-bjps506","url":null,"abstract":"Increasing costs and non-response rates of probability samples have provoked the extensive use of non-probability samples. However, non-probability samples are subject to selection bias, resulting in difficulty for inference. Calibration is a popular method to reduce selection bias in non-probability samples. When rich covariate information is available, a key problem is how to select covariates and estimate parameters in calibration for non-probability samples. In this paper, the model-assisted SCAD calibration is proposed to make population inference from non-probability samples. A parametric model between the study variable and covariates is first established. SCAD is then used to estimate the model parameters based on non-probability samples. The modified forward Kullback-Leibler distance is lastly explored to conduct calibration for non-probability samples based on the estimated parametric model. The theoretical properties of the model-assisted SCAD calibration estimator are further derived. Results from simulation studies show that the model-assisted SCAD calibration estimator yields the smallest bias and mean square error compared with other estimators. Also, a real data from the","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46547938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}