arXiv: Methodology, latest literature (page 4)

Analysis and Simulation of Extremes and Rare Events in Complex Systems

arXiv: Methodology Pub Date : 2020-05-11 DOI: 10.1007/978-3-030-51264-4_7

Meagan Carney, H. Kantz, M. Nicol

Citations: 2

Nonparametric sequential change-point detection for multivariate time series based on empirical distribution functions

arXiv: Methodology Pub Date : 2020-04-26 DOI: 10.1214/21-EJS1798

I. Kojadinovic, Ghislain Verdier

Abstract: The aim of sequential change-point detection is to issue an alarm when it is thought that certain probabilistic properties of the monitored observations have changed. This work is concerned with nonparametric, closed-end testing procedures based on differences of empirical distribution functions that are designed to be particularly sensitive to changes in the contemporary distribution of multivariate time series. The proposed detectors are adaptations of statistics used in a posteriori (offline) change-point testing and involve a weighting that gives more importance to recent observations. The resulting sequential change-point detection procedures are carried out by comparing the detectors to threshold functions estimated through resampling such that the probability of false alarm remains approximately constant over the monitoring period. A generic result on the asymptotic validity of such a way of estimating a threshold function is stated. As a corollary, the asymptotic validity of the studied sequential tests based on empirical distribution functions is proven when these are carried out using a dependent multiplier bootstrap for multivariate time series. Large-scale Monte Carlo experiments demonstrate the good finite-sample properties of the resulting procedures. The application of the derived sequential tests is illustrated on financial data.

Citations: 4

Modeling Nonstationary and Asymmetric Multivariate Spatial Covariances via Deformations

arXiv: Methodology Pub Date : 2020-04-18 DOI: 10.5705/ss.202020.0156

Quan Vu, A. Zammit‐Mangion, N. Cressie

Abstract: Multivariate spatial-statistical models are useful for modeling environmental and socio-demographic processes. The most commonly used models for multivariate spatial covariances assume both stationarity and symmetry for the cross-covariances, but these assumptions are rarely tenable in practice. In this article we introduce a new and highly flexible class of nonstationary and asymmetric multivariate spatial covariance models that are constructed by modeling the simpler and more familiar stationary and symmetric multivariate covariances on a warped domain. Inspired by recent developments in the univariate case, we propose modeling the warping function as a composition of a number of simple injective warping functions in a deep-learning framework. Importantly, covariance-model validity is guaranteed by construction. We establish the types of warpings that allow for symmetry and asymmetry, and we use likelihood-based methods for inference that are computationally efficient. The utility of this new class of models is shown through various data illustrations, including a simulation study on nonstationary data and an application on ocean temperatures at two different depths.

Citations: 9

Stratification and Optimal Resampling for Sequential Monte Carlo

arXiv: Methodology Pub Date : 2020-04-04 DOI: 10.1093/BIOMET/ASAB004

Yichao Li, Wenshuo Wang, Ke Deng, Jun S. Liu

Abstract: Sequential Monte Carlo (SMC), also known as particle filters, has been widely accepted as a powerful computational tool for making inference with dynamical systems. A key step in SMC is resampling, which plays the role of steering the algorithm towards the future dynamics. Several strategies have been proposed and used in practice, including multinomial resampling, residual resampling (Liu and Chen 1998), optimal resampling (Fearnhead and Clifford 2003), stratified resampling (Kitagawa 1996), and optimal transport resampling (Reich 2013). We show that, in the one dimensional case, optimal transport resampling is equivalent to stratified resampling on the sorted particles, and they both minimize the resampling variance as well as the expected squared energy distance between the original and resampled empirical distributions; in the multidimensional case, the variance of stratified resampling after sorting particles using Hilbert curve (Gerber et al. 2019) in $\mathbb{R}^d$ is $O(m^{-(1+2/d)})$, an improved rate compared to the original $O(m^{-(1+1/d)})$, where $m$ is the number of particles. This improved rate is the lowest for ordered stratified resampling schemes, as conjectured in Gerber et al. (2019). We also present an almost sure bound on the Wasserstein distance between the original and Hilbert-curve-resampled empirical distributions. In light of these theoretical results, we propose the stratified multiple-descendant growth (SMG) algorithm, which allows us to explore the sample space more efficiently compared to the standard i.i.d. multiple-descendant sampling-resampling approach as measured by the Wasserstein metric. Numerical evidence is provided to demonstrate the effectiveness of our proposed method.
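
The one-dimensional scheme named in this abstract, stratified resampling on the sorted particles, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `stratified_resample`, its signature, and the use of NumPy are assumptions made here. It sorts the particles, builds the cumulative weights, and draws exactly one uniform variate in each of the m equal-width strata of [0, 1).

```python
import numpy as np


def stratified_resample(particles, weights, rng):
    """Stratified resampling applied to sorted 1-D particles (illustrative sketch).

    `particles` and `weights` are equal-length 1-D arrays; the weights need not
    be normalised. `rng` is a numpy.random.Generator.
    """
    particles = np.asarray(particles, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m = particles.size

    # Sort the particles and carry the weights along.
    order = np.argsort(particles)
    sorted_particles = particles[order]
    cum_weights = np.cumsum(weights[order] / weights.sum())

    # One uniform draw inside each stratum [i/m, (i+1)/m).
    u = (np.arange(m) + rng.random(m)) / m

    # Invert the weighted empirical CDF at each draw.
    idx = np.searchsorted(cum_weights, u, side="right")
    idx = np.minimum(idx, m - 1)  # guard against floating-point round-off
    return sorted_particles[idx]
```

With equal weights every stratum maps to exactly one particle, so the output is simply the sorted input; with unequal weights, heavier particles are duplicated and lighter ones may be dropped.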

Citations: 4

A novel change-point approach for the detection of gas emission sources using remotely contained concentration data

arXiv: Methodology Pub Date : 2020-04-02 DOI: 10.1214/20-aoas1345

I. Eckley, C. Kirch, S. Weber

Citations: 0

Bootstrap p-values reduce type 1 error of the robust rank-order test of difference in medians

arXiv: Methodology Pub Date : 2020-03-09 DOI: 10.17632/397FM8XDZ2.1

Nirvik Sinha

Abstract: The robust rank-order test (Fligner and Policello, 1981) was designed as an improvement of the non-parametric Wilcoxon-Mann-Whitney U-test to be more appropriate when the samples being compared have unequal variance. However, it tends to be excessively liberal when the samples are asymmetric. This is likely because the test statistic is assumed to have a standard normal distribution for sample sizes > 12. This work proposes an on-the-fly method to obtain the distribution of the test statistic from which the critical/p-value may be computed directly. The method of likelihood maximization is used to estimate the parameters of the parent distributions of the samples being compared. Using these estimated populations, the null distribution of the test statistic is obtained by the Monte-Carlo method. Simulations are performed to compare the proposed method with that of standard normal approximation of the test statistic. For small sample sizes (<= 20), the Monte-Carlo method outperforms the normal approximation method. This is especially true for low values of significance levels (< 5%). Additionally, when the smaller sample has the larger standard deviation, the Monte-Carlo method outperforms the normal approximation method even for large sample sizes (= 40/60). The two methods do not differ in power. Finally, a Monte-Carlo sample size of 10^4 is found to be sufficient to obtain the aforementioned relative improvements in performance. Thus, the results of this study pave the way for development of a toolbox to perform the robust rank-order test in a distribution-free manner.
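
The procedure outlined in this abstract (estimate the parent distributions by maximum likelihood, then simulate the null distribution of the statistic) might look roughly like the sketch below. Several things here are assumptions rather than details taken from the paper: the function names `fp_statistic` and `mc_p_value`, the commonly cited placement-based form of the Fligner-Policello statistic with ties counted as one half, the choice of a normal parent family, and the enforcement of the null by giving both simulated populations the same median (zero) while keeping each sample's estimated scale.

```python
import numpy as np


def fp_statistic(x, y):
    """Robust rank-order statistic in its usual placement-based form."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    less = y[None, :] < x[:, None]    # less[i, j] is True when y_j < x_i
    ties = y[None, :] == x[:, None]
    greater = ~less & ~ties           # True when x_i < y_j
    p = less.sum(axis=1) + 0.5 * ties.sum(axis=1)      # placement of each x_i
    q = greater.sum(axis=0) + 0.5 * ties.sum(axis=0)   # placement of each y_j
    v1 = ((p - p.mean()) ** 2).sum()
    v2 = ((q - q.mean()) ** 2).sum()
    with np.errstate(divide="ignore"):
        # Completely separated samples give an infinite statistic.
        return (q.sum() - p.sum()) / (2.0 * np.sqrt(v1 + v2 + p.mean() * q.mean()))


def mc_p_value(x, y, n_sim=10_000, seed=0):
    """Two-sided Monte Carlo p-value under an ASSUMED normal parent family."""
    rng = np.random.default_rng(seed)
    observed = fp_statistic(x, y)
    # np.std with its default ddof=0 is the normal maximum-likelihood scale.
    scale_x, scale_y = np.std(x), np.std(y)
    sims = np.array([
        fp_statistic(rng.normal(0.0, scale_x, len(x)),
                     rng.normal(0.0, scale_y, len(y)))
        for _ in range(n_sim)
    ])
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + np.sum(np.abs(sims) >= abs(observed))) / (n_sim + 1)
```

The default of 10^4 simulations follows the sample size the abstract reports as sufficient; a different parent family would only change the two `rng.normal` calls and the parameter estimates.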

Citations: 1

A nearest-neighbor based nonparametric test for viral remodeling in heterogeneous single-cell proteomic data

arXiv: Methodology Pub Date : 2020-03-05 DOI: 10.1214/20-aoas1362

Trambak Banerjee, B. Bhattacharya, Gourab Mukherjee

Abstract: An important problem in contemporary immunology studies based on single-cell protein expression data is to determine whether cellular expressions are remodeled post infection by a pathogen. One natural approach for detecting such changes is to use non-parametric two-sample statistical tests. However, in single-cell studies, direct application of these tests is often inadequate because single-cell level expression data from uninfected populations often contains attributes of several latent sub-populations with highly heterogeneous characteristics. As a result, viruses often infect these different sub-populations at different rates in which case the traditional nonparametric two-sample tests for checking similarity in distributions are no longer conservative. We propose a new nonparametric method for Testing Remodeling Under Heterogeneity (TRUH) that can accurately detect changes in the infected samples compared to possibly heterogeneous uninfected samples. Our testing framework is based on composite nulls and is designed to allow the null model to encompass the possibility that the infected samples, though unaltered by the virus, might be dominantly arising from under-represented sub-populations in the baseline data. The TRUH statistic, which uses nearest neighbor projections of the infected samples into the baseline uninfected population, is calibrated using a novel bootstrap algorithm. We demonstrate the non-asymptotic performance of the test via simulation experiments and derive the large sample limit of the test statistic, which provides theoretical support towards consistent asymptotic calibration of the test. We use the TRUH statistic for studying remodeling in tonsillar T cells under different types of HIV infection and find that unlike traditional tests, TRUH based statistical inference conforms to the biologically validated immunological theories on HIV infection.

Citations: 3

Finite space Kantorovich problem with an MCMC of table moves

arXiv: Methodology Pub Date : 2020-02-24 DOI: 10.1214/21-EJS1804

Giovanni Pistone, Fabio Rapallo, M. Rogantin

Citations: 3

Thou Shalt Not Reject the P-value

arXiv: Methodology Pub Date : 2020-02-17 DOI: 10.13140/RG.2.2.18014.59206/1

Oliver Y. Ch'en, Raúl G. Saraiva, G. Nagels, Huy P Phan, Tom Schwantje, H. Cao, Jiangtao Gou, Jenna M. Reinen, Bin Xiong, M. Vos

Abstract: Since its debut in the 18th century, the P-value has been an important part of hypothesis testing-based scientific discoveries. As the statistical engine accelerates, questions are beginning to be raised, asking to what extent scientific discoveries based on a P-value are reliable and reproducible, and the voice calling for adjusting the significance level or banning the P-value has been increasingly heard. Inspired by these questions and discussions, here we enquire into the useful roles and misuses of the P-value in scientific studies. For common misuses and misinterpretations, we provide modest recommendations for practitioners. Additionally, we compare statistical significance with clinical relevance. In parallel, we review the Bayesian alternatives for seeking evidence. Finally, we discuss the promises and risks of using meta-analysis to pool P-values from multiple studies to aggregate evidence. Taken together, the P-value underpins a useful probabilistic decision-making system and provides evidence at a continuous scale. But its interpretation must be contextual, considering the scientific question, experimental design (including model specification, sample size, and significance level), statistical power, effect size, and reproducibility.

Citations: 1

Computationally efficient univariate filtering for massive data.

arXiv: Methodology Pub Date : 2020-02-11 DOI: 10.1285/I20705948V13N2P390

M. Tsagris, A. Alenazi, S. Fafalios

Abstract: The vast availability of large scale, massive and big data has increased the computational cost of data analysis. One such case is the computational cost of the univariate filtering which typically involves fitting many univariate regression models and is essential for numerous variable selection algorithms to reduce the number of predictor variables. The paper shows how to dramatically reduce that computational cost by employing the score test or the simple Pearson correlation (or the t-test for binary responses). Extensive Monte Carlo simulation studies demonstrate their advantages and disadvantages compared to the likelihood ratio test, and examples with real data illustrate the performance of the score test and the log-likelihood ratio test under realistic scenarios. Depending on the regression model used, the score test is 30 to 60,000 times faster than the log-likelihood ratio test and produces nearly the same results. Hence this paper strongly recommends substituting the score test for the log-likelihood ratio test when coping with large scale data, massive data, big data, or even with data whose sample size is in the order of a few tens of thousands or higher.
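
For a continuous response, the "simple Pearson correlation" route mentioned in this abstract can be sketched as one vectorised pass over all predictors instead of one regression fit per column. The function name `correlation_filter` and the use of NumPy are assumptions made here, and this is not the authors' implementation. The t statistic uses the standard identity t = r * sqrt((n - 2) / (1 - r^2)), which is the slope t statistic of a simple linear regression.

```python
import numpy as np


def correlation_filter(X, y):
    """Pearson correlation and t statistic of every column of X against y.

    X has shape (n, p) and y has shape (n,). A column perfectly correlated
    with y yields an infinite t statistic.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]

    # Centre once, then obtain all p correlations from a single product.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())

    # Equivalent to the slope t statistic of each simple linear regression.
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return r, t
```

Predictors can then be ranked by |t|, or converted to p-values with a Student t distribution on n - 2 degrees of freedom, before any more expensive model is fitted to the survivors.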

Citations: 0