{"title":"The current landscape of academic statistical and data science collaboration units with examples","authors":"Julia Sharp, Emily H. Griffith, Bruce A. Craig, Alexandra Hanlon, Sarah Peskoe, Jennifer Van Mullekom","doi":"10.1002/sta4.718","DOIUrl":"https://doi.org/10.1002/sta4.718","url":null,"abstract":"The delivery of academic statistical collaboration resources can vary among types of institutions and across time. In particular, this variation might occur in the management of infrastructure and the business model, the staffing model and opportunities for staff development. In this manuscript, we present examples of these three themes in modern academic statistical collaboration units and describe key advantages and challenges.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"67 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141774141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New two‐sample test utilizing interpoint distance discrepancy","authors":"Dong Xu","doi":"10.1002/sta4.712","DOIUrl":"https://doi.org/10.1002/sta4.712","url":null,"abstract":"In this paper, we propose a novel two‐sample test for multivariate sample space. The test statistic calculates the mean of absolute difference of average interpoint distance. We utilize a permutation procedure to establish the critical value for the test. Through comprehensive simulation studies, we compare the performance of our proposed test with that of the K‐nearest neighbour test and the energy test. The results demonstrate that our proposed test exhibits advantages over the other two tests, particularly in high‐dimensional sample spaces. This superiority is further validated by its application to UCR time series datasets.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"50 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141774143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sign‐flip inference for spatial regression with differential regularisation","authors":"Michele Cavazzutti, Eleonora Arnone, Federico Ferraccioli, Cristina Galimberti, Livio Finos, Laura M. Sangalli","doi":"10.1002/sta4.711","DOIUrl":"https://doi.org/10.1002/sta4.711","url":null,"abstract":"We address the problem of performing inference on the linear and nonlinear terms of a semiparametric spatial regression model with differential regularisation. For the linear term, we propose a new resampling procedure, based on (partial) sign‐flipping of an appropriate transformation of the residuals of the model. The proposed resampling scheme can mitigate the bias effect induced by the differential regularisation. We prove that the proposed test is asymptotically exact. Moreover, we show by simulation studies that, unlike parametric alternatives, it enjoys very good control of the Type‐I error also in small‐sample scenarios. Additionally, we show that the proposed test has higher power than recently proposed nonparametric tests on the linear term of semiparametric regression models with differential regularisation. Concerning the nonlinear term, we develop three different inference approaches: a parametric one and two nonparametric alternatives. The nonparametric tests are based on a sign‐flip approach. One of these is proved to be asymptotically exact, while the other is proved to be exact also for finite samples. Simulation studies highlight the good control of the Type‐I error of the nonparametric approaches compared with the parametric test, while retaining high power.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"48 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141739153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
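The record above relies on sign‐flipping as a resampling device. A generic Monte Carlo sign‐flip test for a location parameter illustrates the basic idea. It is not the authors' procedure, which flips signs of a transformation of spatial‐regression residuals. The statistic |sum(x)|, the number of flips, and the function name `sign_flip_pvalue` are illustrative assumptions.

```python
import random

def sign_flip_pvalue(x, n_flips=999, seed=0):
    """Monte Carlo sign-flip test of H0: the x_i are symmetric about zero.

    Hypothetical helper. It uses T = |sum(x)|. Under H0, randomly flipping
    each sign leaves the distribution of T unchanged.
    """
    rng = random.Random(seed)
    t_obs = abs(sum(x))
    exceed = 0
    for _ in range(n_flips):
        t_star = abs(sum(xi if rng.random() < 0.5 else -xi for xi in x))
        if t_star >= t_obs:
            exceed += 1
    # Counting the observed statistic itself keeps the p-value valid (never zero).
    return (exceed + 1) / (n_flips + 1)

# Perfectly symmetric sample: T = 0, so every flipped statistic ties or exceeds it.
p_null = sign_flip_pvalue([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
# All-positive sample: only the all-plus or all-minus flips reproduce its |sum|.
p_shift = sign_flip_pvalue([1.0 + 0.1 * k for k in range(20)])
```

With 999 flips the smallest attainable p-value is 0.001. An exact test would enumerate all 2^n sign vectors instead, which is feasible only for small n.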
{"title":"A Markov‐switching spatio‐temporal ARCH model","authors":"Tzung Hsuen Khoo, Dharini Pathmanathan, Philipp Otto, Sophie Dabo‐Niang","doi":"10.1002/sta4.713","DOIUrl":"https://doi.org/10.1002/sta4.713","url":null,"abstract":"Stock market indices are volatile by nature, and sudden shocks are known to affect volatility patterns. The autoregressive conditional heteroskedasticity (ARCH) and generalized ARCH (GARCH) models neglect structural breaks triggered by sudden shocks, which may lead to an overestimation of persistence, causing an upward bias in the estimates. Regime‐switching models in which abrupt regime changes are governed by a Markov chain were developed to model volatility in financial time series data. Volatility modelling was also extended to spatially interconnected time series, resulting in spatial variants of ARCH models. This inspired us to propose a Markov‐switching framework for the spatio‐temporal log‐ARCH model. In this article, we discuss the Markov‐switching extension of the model, the estimation procedure and the smoothed inference of the regimes. Monte Carlo simulation studies show that the maximum likelihood estimation method for our proposed model has good finite‐sample properties. The proposed model was applied to data on 28 stock indices that were presumably affected by the 2015–2016 Chinese stock market crash. The results showed that our model fits better than its one‐regime counterpart. Furthermore, the smoothed inference of the data indicated the approximate periods in which structural breaks occurred. This model can capture structural breaks that occur simultaneously in nearby locations.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"33 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using sliced inverse mean difference for dimension reduction in multivariate time series","authors":"Hector Haffenden, Andreas Artemiou","doi":"10.1002/sta4.709","DOIUrl":"https://doi.org/10.1002/sta4.709","url":null,"abstract":"Following recent developments in dimension reduction algorithms for multivariate time series, we propose adapting the sliced inverse mean difference algorithm, previously proposed in the standard multiple regression setting, to develop an algorithm suitable for dimension reduction in multivariate time series. The resulting algorithm, called time series sliced inverse mean difference (TSIMD), is shown to be able to identify important directions and important lags using less significant pairs than previously proposed algorithms for dimension reduction in multivariate time series. We demonstrate the competitive performance of our algorithms through a number of experiments.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"327 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141615021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What is it that you say you do here? Advocating for the critical role of data scientists in research infrastructure","authors":"Chasz Griego, Nicky Agate, Ana‐Maria Iosif, Amy M. Crisp","doi":"10.1002/sta4.714","DOIUrl":"https://doi.org/10.1002/sta4.714","url":null,"abstract":"Clinical and academic research continues to become more complex as our knowledge and technology advance. A substantial and growing number of specialists in biostatistics, data science and library sciences are needed to support these research systems and promote high‐calibre research. However, that support is often marginalized as optional rather than a fundamental component of research infrastructure. By building research infrastructure, an institution harnesses access to tools and support/service centres that host skilled experts who approach research with best practices in mind and domain‐specific knowledge at hand. We outline the potential roles of data scientists and statisticians in research infrastructure and recommend guidelines for advocating for the institutional resources needed to support these roles in a sustainable and efficient manner for the long‐term success of the institution. We provide these guidelines in terms of resource efficiency, monetary efficiency and long‐term sustainability. We hope this work contributes to—and provides shared language for—a conversation on a broader framework beyond metrics that can be used to advocate for needed resources.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"59 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141613775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A sharper bound of the Hotelling–Solomons inequality","authors":"Yuzo Maruyama","doi":"10.1002/sta4.710","DOIUrl":"https://doi.org/10.1002/sta4.710","url":null,"abstract":"The original Hotelling–Solomons inequality states that an upper bound of the absolute difference between the mean and median, standardised by the standard deviation, is 1. However, in this paper, we introduce a new bound that depends on the sample size, which is strictly smaller than 1.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"20 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141569331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
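The classical Hotelling–Solomons inequality in the record above says that |mean − median| ≤ sd. The check below illustrates that classical bound numerically. It does not reproduce the paper's sharper, sample‐size‐dependent bound, whose form the abstract does not give. The sketch assumes the population standard deviation (divisor n). The sample standard deviation with divisor n − 1 is larger, so the bound still holds for it. The helper name is hypothetical.

```python
import math
import random
import statistics

def standardized_mean_median_gap(xs):
    """|mean - median| / sd, with sd the population standard deviation (divisor n)."""
    sd = statistics.pstdev(xs)
    if sd == 0:
        return 0.0  # constant sample: mean and median coincide
    return abs(statistics.fmean(xs) - statistics.median(xs)) / sd

# Skewed sample of size n >= 3 with n - 1 zeros and a single one:
# the gap equals 1 / sqrt(n - 1), here 1 / sqrt(3).
skewed = [0.0, 0.0, 0.0, 1.0]
gap = standardized_mean_median_gap(skewed)
expected = 1 / math.sqrt(len(skewed) - 1)

# Random exponential samples of many sizes never exceed the classical bound of 1.
rng = random.Random(42)
worst = max(
    standardized_mean_median_gap([rng.expovariate(1.0) for _ in range(n)])
    for n in range(2, 50)
    for _ in range(200)
)
```

The skewed example shows why a bound that depends on n can sit strictly below 1. For this family of samples the gap is 1/sqrt(n − 1), which stays below 1 for every n ≥ 3.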
{"title":"Tensor factor adjustment for image classification with pervasive noises","authors":"Xiaochuan Li, Bingnan Li, Wenzhan Song, Yuan Ke","doi":"10.1002/sta4.705","DOIUrl":"https://doi.org/10.1002/sta4.705","url":null,"abstract":"This paper studies a tensor factor model that augments samples from multiple classes. The nuisance common patterns shared across classes are characterised by pervasive noises, and the patterns that distinguish different classes are represented by class‐specific components. Additionally, the pervasive component is modelled by the product of a low‐rank tensor latent factor and several factor loading matrices. This augmented tensor factor model can be expanded to a series of matrix variate tensor factor models and estimated using principal component analysis. The ranks of latent factors are estimated using a modified eigen‐ratio method. The proposed estimators have fast convergence rates and enjoy the blessing of dimensionality. The proposed factor model is applied to address the challenge of overlapping issues in image classification through a factor adjustment procedure. The procedure is shown to be powerful through synthetic experiments and an application to COVID‐19 pneumonia diagnosis from frontal chest X‐ray images.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"16 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solving the missing at random problem in semi‐supervised learning: An inverse probability weighting method","authors":"Jin Su, Shuyi Zhang, Yong Zhou","doi":"10.1002/sta4.707","DOIUrl":"https://doi.org/10.1002/sta4.707","url":null,"abstract":"We propose an estimator for the population mean under the semi‐supervised learning setting with the Missing at Random (MAR) assumption. This setting assumes that the probability of observing , denoted by , depends on the total sample size and satisfies . To efficiently estimate , we introduce an adaptive estimator based on inverse probability weighting and cross‐fitting. Theoretical analysis reveals that our proposed estimator is consistent and efficient, with a convergence rate of , slower than the typical rate, due to the diminishing proportion of labelled data as the sample size increases in the semi‐supervised setting. We also prove the consistency of inverse probability weighting (IPW)–Nadaraya–Watson density function estimators. Extensive simulations and an application to the Los Angeles homeless data validate the effectiveness of our approach.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"29 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
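The record above builds on inverse probability weighting (IPW) under missing at random (MAR). The basic IPW idea can be sketched as (1/N) Σ R_i Y_i / π_i, where R_i indicates that Y_i is labelled and π_i = P(R_i = 1 | X_i). This toy version assumes the labelling probabilities are known and fixed. It therefore omits the paper's adaptive estimation, cross‐fitting and the regime where the labelled proportion shrinks with N. The simulated design (π = 0.05 + 0.2x, Y = 2 + 3x + noise) and all names are hypothetical.

```python
import random

def ipw_mean(y, labelled, pi):
    """Inverse probability weighted estimate of E[Y].

    y: outcomes (only labelled entries are used),
    labelled: booleans R_i saying whether y_i was observed,
    pi: labelling probabilities P(R_i = 1 | X_i), assumed known here.
    """
    return sum(yi / p for yi, r, p in zip(y, labelled, pi) if r) / len(y)

def simulate(n=100_000, seed=7):
    """Toy MAR design: labelling depends on x only, and y increases with x."""
    rng = random.Random(seed)
    y, labelled, pi = [], [], []
    for _ in range(n):
        x = rng.random()
        p = 0.05 + 0.2 * x  # large x is labelled far more often
        y.append(2.0 + 3.0 * x + rng.gauss(0.0, 1.0))
        labelled.append(rng.random() < p)
        pi.append(p)
    return y, labelled, pi

y, labelled, pi = simulate()
naive = sum(yi for yi, r in zip(y, labelled) if r) / sum(labelled)
ipw = ipw_mean(y, labelled, pi)
# True E[Y] = 2 + 3 * 0.5 = 3.5. The labelled-only mean converges to about 3.83
# (biased upward), while the IPW estimate targets 3.5.
```

Here the naive average over labelled units over‐represents large x, and weighting each labelled unit by 1/π_i corrects for that. With π_i as small as 0.05 the weights are large, which is one reason the convergence rate degrades when labelling probabilities shrink.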
{"title":"Graph spatial sampling","authors":"Li‐Chun Zhang","doi":"10.1002/sta4.708","DOIUrl":"https://doi.org/10.1002/sta4.708","url":null,"abstract":"We develop lagged Metropolis–Hastings walk for sampling from simple undirected graphs according to given stationary sampling probabilities. It is explained how the technique can be applied together with designed graphs for sampling of units‐in‐space. Compared with the existing spatial sampling methods, which chiefly focus on the sample spatial balance regardless of the associated outcomes of interest, the proposed graph spatial sampling method can considerably improve the efficiency because the graph can be designed to take into account the anticipated spatial distribution of the outcome of interest.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"24 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
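The graph spatial sampling record builds on a Metropolis–Hastings walk on a simple undirected graph with given stationary probabilities. Below is the standard, non‐lagged version of that walk, not the paper's lagged variant or its graph design. A uniformly chosen neighbour j of the current node i is accepted with probability min(1, π_j deg(i) / (π_i deg(j))). The 4‐node graph, target probabilities and helper names are hypothetical.

```python
import random

def mh_transition_matrix(adj, target):
    """Transition probabilities of a Metropolis-Hastings walk on a simple graph.

    adj: dict mapping each node to a list of its neighbours (no self-loops),
    target: dict mapping each node to its desired stationary probability.
    Proposal: a uniformly chosen neighbour; rejected moves stay put.
    """
    P = {i: {j: 0.0 for j in adj} for i in adj}
    for i in adj:
        for j in adj[i]:
            accept = min(1.0, (target[j] * len(adj[i])) / (target[i] * len(adj[j])))
            P[i][j] = accept / len(adj[i])
        P[i][i] = 1.0 - sum(P[i][j] for j in adj[i])  # probability of rejection
    return P

def mh_walk(adj, target, start, steps, seed=0):
    """Simulate the same walk and count visits to each node."""
    rng = random.Random(seed)
    counts = {v: 0 for v in adj}
    current = start
    for _ in range(steps):
        proposal = rng.choice(adj[current])
        ratio = (target[proposal] * len(adj[current])) / (target[current] * len(adj[proposal]))
        if rng.random() < ratio:  # accepts with probability min(1, ratio)
            current = proposal
        counts[current] += 1
    return counts

# Hypothetical 4-node graph: the cycle 0-1-2-3-0 plus the chord 0-2.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
target = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

P = mh_transition_matrix(adj, target)
# Stationarity check: sum_i target[i] * P[i][j] should reproduce target[j].
stationary = {j: sum(target[i] * P[i][j] for i in adj) for j in adj}
steps = 200_000
visits = mh_walk(adj, target, start=0, steps=steps)
freq = {v: c / steps for v, c in visits.items()}
```

The acceptance rule gives detailed balance, target[i] P[i][j] = min(target[i]/deg(i), target[j]/deg(j)), which is symmetric in i and j. Hence the target is stationary on any connected graph. The paper's point is that the graph itself can be designed around the anticipated spatial distribution of the outcome.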