{"title":"Analysis and Simulation of Extremes and Rare Events in Complex Systems","authors":"Meagan Carney, H. Kantz, M. Nicol","doi":"10.1007/978-3-030-51264-4_7","DOIUrl":"https://doi.org/10.1007/978-3-030-51264-4_7","url":null,"abstract":"","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116815409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonparametric sequential change-point detection for multivariate time series based on empirical distribution functions","authors":"I. Kojadinovic, Ghislain Verdier","doi":"10.1214/21-EJS1798","DOIUrl":"https://doi.org/10.1214/21-EJS1798","url":null,"abstract":"The aim of sequential change-point detection is to issue an alarm when it is thought that certain probabilistic properties of the monitored observations have changed. This work is concerned with nonparametric, closed-end testing procedures based on differences of empirical distribution functions that are designed to be particularly sensitive to changes in the contemporary distribution of multivariate time series. The proposed detectors are adaptations of statistics used in a posteriori (offline) change-point testing and involve a weighting that gives more importance to recent observations. The resulting sequential change-point detection procedures are carried out by comparing the detectors to threshold functions estimated through resampling such that the probability of false alarm remains approximately constant over the monitoring period. A generic result on the asymptotic validity of such a way of estimating a threshold function is stated. As a corollary, the asymptotic validity of the studied sequential tests based on empirical distribution functions is proven when these are carried out using a dependent multiplier bootstrap for multivariate time series. Large-scale Monte Carlo experiments demonstrate the good finite-sample properties of the resulting procedures. 
The application of the derived sequential tests is illustrated on financial data.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115074170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Nonstationary and Asymmetric Multivariate Spatial Covariances via Deformations","authors":"Quan Vu, A. Zammit‐Mangion, N. Cressie","doi":"10.5705/ss.202020.0156","DOIUrl":"https://doi.org/10.5705/ss.202020.0156","url":null,"abstract":"Multivariate spatial-statistical models are useful for modeling environmental and socio-demographic processes. The most commonly used models for multivariate spatial covariances assume both stationarity and symmetry for the cross-covariances, but these assumptions are rarely tenable in practice. In this article we introduce a new and highly flexible class of nonstationary and asymmetric multivariate spatial covariance models that are constructed by modeling the simpler and more familiar stationary and symmetric multivariate covariances on a warped domain. Inspired by recent developments in the univariate case, we propose modeling the warping function as a composition of a number of simple injective warping functions in a deep-learning framework. Importantly, covariance-model validity is guaranteed by construction. We establish the types of warpings that allow for symmetry and asymmetry, and we use likelihood-based methods for inference that are computationally efficient. 
The utility of this new class of models is shown through various data illustrations, including a simulation study on nonstationary data and an application on ocean temperatures at two different depths.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133264848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stratification and Optimal Resampling for Sequential Monte Carlo","authors":"Yichao Li, Wenshuo Wang, Ke Deng, Jun S. Liu","doi":"10.1093/BIOMET/ASAB004","DOIUrl":"https://doi.org/10.1093/BIOMET/ASAB004","url":null,"abstract":"Sequential Monte Carlo (SMC), also known as particle filters, has been widely accepted as a powerful computational tool for making inference with dynamical systems. A key step in SMC is resampling, which plays the role of steering the algorithm towards the future dynamics. Several strategies have been proposed and used in practice, including multinomial resampling, residual resampling (Liu and Chen 1998), optimal resampling (Fearnhead and Clifford 2003), stratified resampling (Kitagawa 1996), and optimal transport resampling (Reich 2013). We show that, in the one dimensional case, optimal transport resampling is equivalent to stratified resampling on the sorted particles, and they both minimize the resampling variance as well as the expected squared energy distance between the original and resampled empirical distributions; in the multidimensional case, the variance of stratified resampling after sorting particles using Hilbert curve (Gerber et al. 2019) in $\\mathbb{R}^d$ is $O(m^{-(1+2/d)})$, an improved rate compared to the original $O(m^{-(1+1/d)})$, where $m$ is the number of particles. This improved rate is the lowest for ordered stratified resampling schemes, as conjectured in Gerber et al. (2019). We also present an almost sure bound on the Wasserstein distance between the original and Hilbert-curve-resampled empirical distributions. In light of these theoretical results, we propose the stratified multiple-descendant growth (SMG) algorithm, which allows us to explore the sample space more efficiently compared to the standard i.i.d. multiple-descendant sampling-resampling approach as measured by the Wasserstein metric. 
Numerical evidence is provided to demonstrate the effectiveness of our proposed method.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131324779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel change-point approach for the detection of gas emission sources using remotely contained concentration data","authors":"I. Eckley, C. Kirch, S. Weber","doi":"10.1214/20-aoas1345","DOIUrl":"https://doi.org/10.1214/20-aoas1345","url":null,"abstract":"Motivated by an example from remote sensing of gas emission sources, we derive two novel change point procedures for multivariate time series where, in contrast to classical change point literature, the changes are not required to be aligned in the different components of the time series. Instead the change points are described by a functional relationship where the precise shape depends on unknown parameters of interest such as the source of the gas emission in the above example. Two different types of tests and the corresponding estimators for the unknown parameters describing the change locations are proposed. We derive the null asymptotics for both tests under weak assumptions on the error time series and show asymptotic consistency under alternatives. Furthermore, we prove consistency for the corresponding estimators of the parameters of interest. The small sample behavior of the methodology is assessed by means of a simulation study and the above remote sensing example analyzed in detail.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"325 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122818475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bootstrap p-values reduce type 1 error of the robust rank-order test of difference in medians","authors":"Nirvik Sinha","doi":"10.17632/397FM8XDZ2.1","DOIUrl":"https://doi.org/10.17632/397FM8XDZ2.1","url":null,"abstract":"The robust rank-order test (Fligner and Policello, 1981) was designed as an improvement of the non-parametric Wilcoxon-Mann-Whitney U-test to be more appropriate when the samples being compared have unequal variance. However, it tends to be excessively liberal when the samples are asymmetric. This is likely because the test statistic is assumed to have a standard normal distribution for sample sizes > 12. This work proposes an on-the-fly method to obtain the distribution of the test statistic from which the critical/p-value may be computed directly. The method of likelihood maximization is used to estimate the parameters of the parent distributions of the samples being compared. Using these estimated populations, the null distribution of the test statistic is obtained by the Monte-Carlo method. Simulations are performed to compare the proposed method with that of standard normal approximation of the test statistic. For small sample sizes (<= 20), the Monte-Carlo method outperforms the normal approximation method. This is especially true for low values of significance levels (< 5%). Additionally, when the smaller sample has the larger standard deviation, the Monte-Carlo method outperforms the normal approximation method even for large sample sizes (= 40/60). The two methods do not differ in power. Finally, a Monte-Carlo sample size of 10^4 is found to be sufficient to obtain the aforementioned relative improvements in performance. 
Thus, the results of this study pave the way for development of a toolbox to perform the robust rank-order test in a distribution-free manner.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127579468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A nearest-neighbor based nonparametric test for viral remodeling in heterogeneous single-cell proteomic data","authors":"Trambak Banerjee, B. Bhattacharya, Gourab Mukherjee","doi":"10.1214/20-aoas1362","DOIUrl":"https://doi.org/10.1214/20-aoas1362","url":null,"abstract":"An important problem in contemporary immunology studies based on single-cell protein expression data is to determine whether cellular expressions are remodeled post infection by a pathogen. One natural approach for detecting such changes is to use non-parametric two-sample statistical tests. However, in single-cell studies, direct application of these tests is often inadequate because single-cell level expression data from uninfected populations often contains attributes of several latent sub-populations with highly heterogeneous characteristics. As a result, viruses often infect these different sub-populations at different rates in which case the traditional nonparametric two-sample tests for checking similarity in distributions are no longer conservative. We propose a new nonparametric method for Testing Remodeling Under Heterogeneity (TRUH) that can accurately detect changes in the infected samples compared to possibly heterogeneous uninfected samples. Our testing framework is based on composite nulls and is designed to allow the null model to encompass the possibility that the infected samples, though unaltered by the virus, might be dominantly arising from under-represented sub-populations in the baseline data. The TRUH statistic, which uses nearest neighbor projections of the infected samples into the baseline uninfected population, is calibrated using a novel bootstrap algorithm. We demonstrate the non-asymptotic performance of the test via simulation experiments and derive the large sample limit of the test statistic, which provides theoretical support towards consistent asymptotic calibration of the test. 
We use the TRUH statistic for studying remodeling in tonsillar T cells under different types of HIV infection and find that unlike traditional tests, TRUH based statistical inference conforms to the biologically validated immunological theories on HIV infection.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125897887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finite space Kantorovich problem with an MCMC of table moves","authors":"Giovanni Pistone, Fabio Rapallo, M. Rogantin","doi":"10.1214/21-EJS1804","DOIUrl":"https://doi.org/10.1214/21-EJS1804","url":null,"abstract":"In Optimal Transport (OT) on a finite metric space, one defines a distance on the probability simplex that extends the distance on the ground space. The distance is the value of a Linear Programming (LP) problem on the set of nonnegative-valued 2-way tables with assigned probability functions as margins. We apply to this case the methodology of moves from Algebraic Statistics (AS) and use it to derive a Monte Carlo Markov Chain solution algorithm.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114449020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thou Shalt Not Reject the P-value","authors":"Oliver Y. Ch'en, Raúl G. Saraiva, G. Nagels, Huy P Phan, Tom Schwantje, H. Cao, Jiangtao Gou, Jenna M. Reinen, Bin Xiong, M. Vos","doi":"10.13140/RG.2.2.18014.59206/1","DOIUrl":"https://doi.org/10.13140/RG.2.2.18014.59206/1","url":null,"abstract":"Since its debut in the 18th century, the P-value has been an important part of hypothesis testing-based scientific discoveries. As the statistical engine accelerates, questions are beginning to be raised, asking to what extent scientific discoveries based on a P-value are reliable and reproducible, and the voice calling for adjusting the significance level or banning the P-value has been increasingly heard. Inspired by these questions and discussions, here we enquire into the useful roles and misuses of the P-value in scientific studies. For common misuses and misinterpretations, we provide modest recommendations for practitioners. Additionally, we compare statistical significance with clinical relevance. In parallel, we review the Bayesian alternatives for seeking evidence. Finally, we discuss the promises and risks of using meta-analysis to pool P-values from multiple studies to aggregate evidence. Taken together, the P-value underpins a useful probabilistic decision-making system and provides evidence at a continuous scale. 
But its interpretation must be contextual, considering the scientific question, experimental design (including model specification, sample size, and significance level), statistical power, effect size, and reproducibility.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127676322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computationally efficient univariate filtering for massive data.","authors":"M. Tsagris, A. Alenazi, S. Fafalios","doi":"10.1285/I20705948V13N2P390","DOIUrl":"https://doi.org/10.1285/I20705948V13N2P390","url":null,"abstract":"The vast availability of large scale, massive and big data has increased the computational cost of data analysis. One such case is the computational cost of the univariate filtering which typically involves fitting many univariate regression models and is essential for numerous variable selection algorithms to reduce the number of predictor variables. The paper shows how to dramatically reduce that computational cost by employing the score test or the simple Pearson correlation (or the t-test for binary responses). Extensive Monte Carlo simulation studies demonstrate their advantages and disadvantages compared to the likelihood ratio test, and examples with real data illustrate the performance of the score test and the log-likelihood ratio test under realistic scenarios. Depending on the regression model used, the score test is 30 - 60,000 times faster than the log-likelihood ratio test and produces nearly the same results. 
Hence this paper strongly recommends substituting the score test for the log-likelihood ratio test when coping with large scale data, massive data, big data, or even with data whose sample size is in the order of a few tens of thousands or higher.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121522752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}