{"title":"High‐dimensional feature screening for nonlinear associations with survival outcome using restricted mean survival time","authors":"Yaxian Chen, Kwok Fai Lam, Zhonghua Liu","doi":"10.1002/sta4.673","DOIUrl":"https://doi.org/10.1002/sta4.673","url":null,"abstract":"Feature screening is an important tool in analysing ultrahigh‐dimensional data, particularly in Omics and oncology studies. However, most attention has been focused on identifying features that have a linear or monotonic impact on the response variable. Detecting a sparse set of variables that have a nonlinear or nonmonotonic relationship with the response variable is still a challenging task. To fill the gap, this paper proposes a robust model‐free screening approach for right‐censored survival data, offering a new perspective that quantifies the covariate effect on the restricted mean survival time rather than on the routinely used hazard function. The proposed measure, based on the difference between the restricted mean survival time of covariate‐stratified and overall data, is able to identify a comprehensive range of associations, including linear, nonlinear, nonmonotone and even local dependencies such as change points. The sure screening property is established, and a more flexible iterative screening procedure is developed to increase the accuracy of variable screening. Simulation studies are carried out to demonstrate the superiority of the proposed method in selecting important features with a complex association with the response variable. The potential of applying the proposed method to interval‐censored failure time data has also been explored in simulations, with promising results. The method is applied to a breast cancer dataset to identify potential prognostic factors, which reveals potential associations between breast cancer and lymphoma.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"39 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140583941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Double verification for two‐sample covariance matrices test","authors":"Wenming Sun, Lingfeng Lyu, Xiao Guo","doi":"10.1002/sta4.670","DOIUrl":"https://doi.org/10.1002/sta4.670","url":null,"abstract":"This paper explores testing the equality of two covariance matrices in high‐dimensional settings. Existing test statistics are usually constructed from either the squared Frobenius norm or the elementwise maximum norm. However, the former may lose power against sparse alternatives, while the latter may perform poorly against dense alternatives. In this paper, within a novel framework, we introduce a double verification test statistic designed to be powerful against both dense and sparse alternatives. Additionally, we propose an adaptive weight test statistic to enhance power. Furthermore, we present an analysis of the asymptotic size and power of the proposed test. Simulation results demonstrate the satisfactory performance of our proposed method.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"4 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140583850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STAR: Spread of innovations on graph structures with the Susceptible‐Tattler‐Adopter‐Removed model","authors":"Riccardo Parviero, Kristoffer H. Hellton, Geoffrey Canright, Ida Scheel","doi":"10.1002/sta4.671","DOIUrl":"https://doi.org/10.1002/sta4.671","url":null,"abstract":"Adoptions of a new innovation such as a product, service or idea are typically driven both by peer‐to‐peer social interactions and by external influence. Social graphs are usually used to efficiently model the peer‐to‐peer interactions, where new adopters influence their peers to also adopt the innovation. However, the influence to adopt may also spread through individuals close to the adopters, known as tattlers, who only share information regarding the innovation. We extend an inhomogeneous Poisson process model accounting for both external and peer‐to‐peer influence to include an optional tattling stage, and we term the extension the Susceptible‐Tattler‐Adopter‐Removed (STAR) model. In an extensive simulation study, the proposed model is shown to be stable and identifiable and to accurately identify tattling when present. Further, using simulations, we show that both inference and prediction of the STAR model are quite robust against missing edges in the social graph, a common situation in real‐world data. Simulations and theoretical considerations demonstrate that, when edges are missing, the STAR model is able to accurately estimate the shares attributed to the external and internal sources of influence. Furthermore, the STAR model may be used to improve the inference of the external and viral parameters and subsequent predictions even when tattling is not part of the real data‐generating mechanism.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"31 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140583938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network alternating direction method of multipliers for ultrahigh‐dimensional decentralised federated learning","authors":"Wei Dong, Sanying Feng","doi":"10.1002/sta4.669","DOIUrl":"https://doi.org/10.1002/sta4.669","url":null,"abstract":"Ultrahigh‐dimensional data analysis has seen great advances in recent years. When the data are stored across multiple clients that can communicate with each other only through a network structure, ultrahigh‐dimensional analysis can be numerically challenging or even infeasible. In this work, we study decentralised federated learning for ultrahigh‐dimensional data analysis, where the parameters of interest are estimated across a large number of devices connected by a network structure, without data sharing. On the local machines, each machine runs gradient ascent in parallel to obtain estimators via sparsity‐restricted constrained methods. We then obtain a global model by aggregating each machine's information via an alternating direction method of multipliers (ADMM) with a concave pairwise fusion penalty between machines connected through the network. The proposed method can mitigate the privacy risks of traditional machine learning, recover the sparsity and provide estimates of all regression coefficients simultaneously. Under mild conditions, we show the convergence and estimation consistency of our method. The promising performance of the method is supported by both simulated and real data examples.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"9 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140583832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing the effectiveness of $$ k $$‐different treatments through the area under the ROC curve","authors":"Pablo Martínez‐Camblor, Sonia Pérez‐Fernández, Lucas L. Dwiel, Wilder T. Doucette","doi":"10.1002/sta4.672","DOIUrl":"https://doi.org/10.1002/sta4.672","url":null,"abstract":"The area under the receiver‐operating characteristic curve (AUC) has become a popular index not only for measuring the overall prediction capacity of a marker but also for measuring the strength of the association between continuous and binary variables. In the study considered here, the AUC was used to compare the association size of four different interventions involving impulsive decision making, studied through an animal model in which each animal provides several negative (pretreatment) and positive (posttreatment) measures. The problem of fully comparing the average AUCs therefore arises naturally. We construct an analysis of variance (ANOVA) type test for testing the equality of the impact of these treatments, measured through the respective AUCs, while accounting for the random effect represented by the animal. The use (and development) of a post hoc Tukey's HSD‐type test is also considered. We explore the finite‐sample behaviour of our proposal via Monte Carlo simulations, and analyse the data generated from the original problem. An R package implementing the procedures is provided in the supporting information.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"119 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140583847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the construction of nested orthogonal arrays with the adjacent numbers of levels","authors":"Shanqi Pang, Yan Zhu","doi":"10.1002/sta4.666","DOIUrl":"https://doi.org/10.1002/sta4.666","url":null,"abstract":"Nested orthogonal arrays (NOAs) provide an option for designing an experimental setup consisting of two experiments, with the expensive higher‐precision experiment nested within a larger and relatively inexpensive lower‐precision experiment. Constructing NOAs with adjacent numbers of levels is a challenging problem. In this paper, we present several methods for constructing such NOAs and obtain some new classes of symmetric NOAs in which the larger arrays have minimum run size. These methods are also extended to the construction of NOAs with more than two layers. Furthermore, by adding some columns to these symmetric NOAs, many new asymmetric NOAs can be constructed. Illustrative examples are given.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"381 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140302361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beta regression for double‐bounded response with correlated high‐dimensional covariates","authors":"Jianxuan Liu","doi":"10.1002/sta4.663","DOIUrl":"https://doi.org/10.1002/sta4.663","url":null,"abstract":"Continuous responses measured on a standard unit interval are ubiquitous in many scientific disciplines. Statistical models built upon a normal error structure generally do not work because they can produce biassed estimates or predictions outside either bound. In real‐life applications, data are often high‐dimensional, correlated and consist of a mixture of data types. Little literature addresses this unique data challenge. We propose a semiparametric approach to analyse the association between a double‐bounded response and high‐dimensional correlated covariates of mixed types. The proposed method makes full use of all available data through one or several linear combinations of the covariates without losing information from the data. The only assumption we make is that the response variable follows a Beta distribution; no additional assumption is required. The resulting estimators are consistent and efficient. We illustrate the proposed method in simulation studies and demonstrate it in a real‐life data application. The semiparametric approach contributes to the sufficient dimension reduction literature by investigating double‐bounded responses, which have been absent from that literature. This work also provides a new tool for data practitioners to analyse the association between a popular unit interval response and mixed types of high‐dimensional correlated covariates.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"1 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140127092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualisation and outlier detection for probability density function ensembles","authors":"Alexander C. Murph, Justin D. Strait, Kelly R. Moran, Jeffrey D. Hyman, Philip H. Stauffer","doi":"10.1002/sta4.662","DOIUrl":"https://doi.org/10.1002/sta4.662","url":null,"abstract":"Exploratory data analysis (EDA) for functional data (data objects where observations are entire functions) is a difficult problem that has received significant attention in recent literature. This surge in interest is motivated by the ubiquitous nature of functional data, which are prevalent in applications across fields such as meteorology, biology, medicine and engineering. Empirical probability density functions (PDFs) can be viewed as constrained functional data objects that must integrate to one and be nonnegative. They show up in contexts such as yearly income distributions, zooplankton size structure in oceanography and connectivity patterns in the brain, among others. While PDF data are certainly common in modern research, little attention has been given to EDA specifically for PDFs. In this paper, we extend several EDA methods for functional data to PDFs and compare them on simulated data that exhibit different types of variation, designed to mimic those seen in real-world applications. We then use our new methods to perform EDA on the breakthrough curves observed in gas transport simulations for underground fracture networks.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"15 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140116940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal designs for crossover model with partial interactions","authors":"Futao Zhang, Pierre Druilhet, Xiangshun Kong","doi":"10.1002/sta4.668","DOIUrl":"https://doi.org/10.1002/sta4.668","url":null,"abstract":"This paper studies the universally optimal designs for estimating total effects under crossover models with partial interactions. We provide necessary and sufficient conditions for a symmetric design to be universally optimal, based on which algorithms can be used to derive optimal symmetric designs under any form of the within-block covariance matrix. To cope with the computational complexity of algorithms when the experimental scale is too large, we provide the analytical form of optimal designs under the type-H covariance matrix. We find that for a fixed number of treatments, say $$ t $$, the number of distinct treatments appearing in the support sequences increases with the number of periods, $$ k $$, until $$ k\\ge t^2 $$","PeriodicalId":56159,"journal":{"name":"Stat","volume":"134 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140073012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-based mutually exciting point processes for modelling event times in docked bike-sharing systems","authors":"Francesco Sanna Passino, Yining Che, Carlos Cardoso Correia Perello","doi":"10.1002/sta4.660","DOIUrl":"https://doi.org/10.1002/sta4.660","url":null,"abstract":"This paper introduces graph-based mutually exciting processes (GB-MEP) to model event times in network point processes, focusing on an application to docked bike-sharing systems. GB-MEP incorporates known relationships between nodes in a graph within the intensity function of a node-based multivariate Hawkes process. This approach reduces the number of parameters to a quantity proportional to the number of nodes in the network, resulting in significant advantages for computational scalability when compared with traditional methods. The model is applied to event data observed on the Santander Cycles network in central London, demonstrating that exploiting network-wide information related to the geographical location of the stations improves the performance of node-based models for applications in bike-sharing systems. The proposed GB-MEP framework is more generally applicable to any network point process where a distance function between nodes is available.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"25 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140073017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}