The New England Journal of Statistics in Data Science: Latest Articles

Subdata Selection With a Large Number of Variables
The New England Journal of Statistics in Data Science Pub Date: 2023-01-01 DOI: 10.51387/23-nejsds36
Rakhi Singh, J. Stufken
Abstract: Subdata selection from big data is an active area of research that facilitates inferences based on big data with limited computational expense. For linear regression models, the optimal design-inspired Information-Based Optimal Subdata Selection (IBOSS) method is a computationally efficient method for selecting subdata that has excellent statistical properties. But the method can only be used if the subdata size, k, is at least twice the number of regression variables, p. In addition, even when $k \ge 2p$, under the assumption of effect sparsity, one can expect to obtain subdata with better statistical properties by trying to focus on active variables. Inspired by recent efforts to extend the IBOSS method to situations with a large number of variables p, we introduce a method called Combining Lasso And Subdata Selection (CLASS) that, as shown, improves on other proposed methods in terms of variable selection and building a predictive model based on subdata when the full data size n is very large and the number of variables p is large. In terms of computational expense, CLASS is more expensive than recent competitors for moderately large values of n, but the roles reverse under effect sparsity for extremely large values of n.
Citations: 2
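The requirement that k be at least 2p in the entry above is easier to see from how IBOSS picks rows. The following is a minimal Python sketch, assuming the commonly described variant of IBOSS that takes r = floor(k / (2p)) rows from each tail of each variable in turn. The function name `iboss_indices` and the simulated data are illustrative assumptions, not taken from the paper, and the CLASS method itself is not shown.

```python
import numpy as np

def iboss_indices(X, k):
    """Return row indices of an IBOSS-style subdata of size 2 * r * p.

    For each column in turn, take the r smallest and r largest values
    among the rows not yet selected, where r = k // (2 * p).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    r = k // (2 * p)  # rows taken from each tail of each variable
    if r < 1:
        raise ValueError("IBOSS needs k >= 2p so that r >= 1")
    if k > n:
        raise ValueError("subdata size k cannot exceed n")
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)               # rows still unselected
        order = idx[np.argsort(X[idx, j], kind="stable")]
        picked = np.concatenate([order[:r], order[-r:]])
        chosen.append(picked)
        available[picked] = False                     # never pick a row twice
    return np.concatenate(chosen)

# Illustrative use on simulated data (hypothetical sizes).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 5))
sub_idx = iboss_indices(X_demo, k=100)
```

With k < 2p the per-variable quota r is zero, so no rows can be taken for some variables; this is the limitation that the abstract says motivates extensions for large p.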
Simultaneous False-Decision Error Rates in Master Protocols with Shared Control: False Discovery Rate Perspective
The New England Journal of Statistics in Data Science Pub Date: 2023-01-01 DOI: 10.51387/23-nejsds28
Jingjing Ye, X. Li, Cheng Lu, William Wang
Abstract: A master protocol is a type of trial design in which multiple therapies and/or multiple disease populations can be investigated in the same trial. A shared control can be used for multiple therapies to gain operational efficiency and to make the trial more attractive to patients. To balance controlling the false positive rate against having adequate power to detect true signals, the impact of the False Discovery Rate (FDR) is evaluated when multiple investigational drugs are studied in the master protocol. With a shared control group, a “random high” or “random low” in the control group can potentially affect all hypothesis tests that compare each of the test regimens with the control group, in terms of the probability of having at least one positive hypothesis outcome, or multiple positive outcomes. When regulatory agencies decide to approve or decline one or more regimens based on the master protocol design, this introduces a different type of error: simultaneous false-decision error. In this manuscript, we examine in detail the derivations and properties of the simultaneous false-decision error in the master protocol with shared control under the framework of FDR. The simultaneous false-decision error consists of two parts: simultaneous false-discovery rate (SFDR) and simultaneous false non-discovery rate (SFNR). Based on our analytical evaluation and simulations, the magnitude of SFDR and SFNR inflation is small. Therefore, the multiple error rate controls are generally adequate; further adjustment to a pre-specified level of SFDR or SFNR, or a reduction of the alpha allocated to each individual treatment comparison with the shared control, is deemed unnecessary.
Citations: 0
Sparse Estimation in Finite Mixture of Accelerated Failure Time and Mixture of Regression Models with R Package fmrs
The New England Journal of Statistics in Data Science Pub Date: 2023-01-01 DOI: 10.51387/23-nejsds49
Farhad Shokoohi
Abstract: Variable selection in large-dimensional data has been extensively studied in different settings over the past decades. In a recent article, Shokoohi et al. [29, DOI:10.1214/18-AOAS1198] proposed a method for variable selection in finite mixture of accelerated failure time regression models for studies on time-to-event data to capture heterogeneity within the population and account for censoring. In this paper, we introduce the fmrs package, which implements the variable selection methodology for such models. Furthermore, as a byproduct, the fmrs package facilitates variable selection in finite mixture regression models. The package also incorporates a tuning parameter selection mechanism based on component-wise BIC. Commonly used penalties, such as Least Absolute Shrinkage and Selection Operator, and Smoothly Clipped Absolute Deviation, are integrated into fmrs. Additionally, the package offers an option for non-mixture regression models. The C language is chosen to boost the optimization speed. We provide an overview of the fmrs principles and the strategies employed for optimization. Hands-on illustrations are presented to help users get acquainted with fmrs. Finally, we apply fmrs to a lung cancer dataset and observe that a two-component mixture model reveals a subgroup with a more aggressive form of the disease, displaying a lower survival time.
Citations: 0
Construction of Supersaturated Designs with Small Coherence for Variable Selection
The New England Journal of Statistics in Data Science Pub Date: 2023-01-01 DOI: 10.51387/23-nejsds34
Youran Qi, Peter Chien
Abstract: The supersaturated design is often used to discover important factors in an experiment with a large number of factors and a small number of runs. We propose a method for constructing supersaturated designs with small coherence. Such designs are useful for variable selection methods such as the Lasso. Examples are provided to illustrate the proposed method.
Citations: 1
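To make the notion of coherence in the entry above concrete, here is a small Python sketch. It assumes the usual definition of coherence as the largest absolute normalized inner product between two distinct columns of the design matrix; the paper may use a variant (for example, after centering the columns). The function name `coherence` and the toy two-level design are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def coherence(X):
    """Largest absolute cosine between two distinct columns of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=0)
    gram = (X.T @ X) / np.outer(norms, norms)  # pairwise column cosines
    np.fill_diagonal(gram, 0.0)                # ignore each column with itself
    return float(np.abs(gram).max())

# Toy design: 4 runs, 4 two-level columns (hypothetical example).
# The first three columns are mutually orthogonal; the fourth is not.
design = np.array([
    [1,  1,  1,  1],
    [1, -1,  1,  1],
    [1,  1, -1,  1],
    [1, -1, -1, -1],
])
mu = coherence(design)
```

A smaller value means the columns are closer to orthogonal, which is the property that makes a design friendlier to Lasso-type variable selection.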
Nature-inspired Metaheuristics for finding Optimal Designs for the Continuation-Ratio Models
The New England Journal of Statistics in Data Science Pub Date: 2023-01-01 DOI: 10.51387/23-nejsds44
Jiaheng Qiu, W. Wong
Abstract: The continuation-ratio (CR) model is frequently used in dose response studies to model a three-category outcome as the dose levels vary. Design issues for a CR model defined on an unrestricted dose interval have been discussed for estimating model parameters or a selected function of the model parameters. This paper uses metaheuristics to address design issues for a CR model defined on any compact dose interval when there are one or more objectives in the study and some are more important than others. Specifically, we use an exemplary nature-inspired metaheuristic algorithm called particle swarm optimization (PSO) to find locally optimal designs for estimating a few interesting functions of the model parameters, such as the most effective dose (MED) and the maximum tolerated dose (MTD), and for estimating all parameters in a CR model. We demonstrate that PSO can efficiently find locally multiple-objective optimal designs for a CR model on various dose intervals, and a small simulation study shows it tends to outperform the popular deterministic cocktail algorithm (CA) and another competitive metaheuristic algorithm called differential evolution (DE). We also discuss hybrid algorithms and their flexible applications to design early Phase 2 trials or tackle biomedical problems, such as different strategies for handling the recent pandemic.
Citations: 0
Discussion of: Four Types of Frequentism and Their Interplay with Bayesianism, by J. Berger
The New England Journal of Statistics in Data Science Pub Date: 2023-01-01 DOI: 10.51387/23-nejsds4c
Judith Rousseau
Citations: 1
Indeterminate Data and Handling for Assessing Diagnostic Performance in Imaging Drug Developments
The New England Journal of Statistics in Data Science Pub Date: 2023-01-01 DOI: 10.51387/23-nejsds46
Sue-Jane Wang
Abstract: In diagnostic imaging drug developments, the imaging scan read data in controlled imaging drug clinical trials include test positive and test negative. Broadly speaking, the standard of reference data are either presence or absence of a disease or clinical condition. Together, these data are used to assess the diagnostic performance of an investigational imaging drug in a controlled imaging drug clinical trial. For those imaging scan read data that cannot be called positive/negative, the “indeterminate” category is commonly used to cover imaging results that may be considered intermediate, indeterminate, or uninterpretable. Similarly, for those standard of reference data that cannot be categorized into presence/absence, including uncollected or unavailable reference standard data, the “indeterminate” category may be used. Historically, little attention has been paid to the indeterminate imaging scan read data, as they are generally rare or considered irrelevant, though they are related to scanned subjects and can be informative. Subjects who lack the standard of reference are simply excluded, so the study reports only the analysis results in subjects with available standard of reference data, known as completer analysis, similar to evaluable subjects seen in controlled trials for drug developments. To improve diagnostic clinical trial planning, this paper introduces five attributes of an estimand in diagnostic imaging drug clinical trials. The paper then defines the indeterminate data mechanisms and gives examples for each indeterminate mechanism that is specific to the clinical context of a diagnostic imaging drug clinical trial. Several imputation approaches to handling indeterminate data are discussed. Depending on the clinical question of primary interest, indeterminate data may be intercurrent events. The paper ends with discussions on imputations of intercurrent events occurring in indeterminate imaging scan read data and those occurring in indeterminate standard of reference data when encountered in diagnostic imaging clinical trials, and provides points to consider on estimands for diagnostic imaging drug developments.
Citations: 0
General Additive Network Effect Models
The New England Journal of Statistics in Data Science Pub Date: 2023-01-01 DOI: 10.51387/23-nejsds29
Trang Bui, Stefan H. Steiner, Nathaniel T. Stevens
Abstract: In the interest of business innovation, social network companies often carry out experiments to test product changes and new ideas. In such experiments, users are typically assigned to one of two experimental conditions with some outcome of interest observed and compared. In this setting, the outcome of one user may be influenced by not only the condition to which they are assigned but also the conditions of other users via their network connections. This challenges classical experimental design and analysis methodologies and requires specialized methods. We introduce the general additive network effect (GANE) model, which encompasses many existing outcome models in the literature under a unified model-based framework. The model is both interpretable and flexible in modeling the treatment effect as well as the network influence. We show that (quasi) maximum likelihood estimators are consistent and asymptotically normal for a family of model specifications. Quantities of interest such as the global treatment effect are defined and expressed as functions of the GANE model parameters, and hence inference can be carried out using likelihood theory. We further propose the “power-degree” (POW-DEG) specification of the GANE model. The performance of POW-DEG and other specifications of the GANE model is investigated via simulations. Under model misspecification, the POW-DEG specification appears to work well. Finally, we study the characteristics of good experimental designs for the POW-DEG specification. We find that graph-cluster randomization and balanced designs are not necessarily optimal for precise estimation of the global treatment effect, indicating the need for alternative design strategies.
Citations: 1
Invited Discussion of J.O. Berger: Four Types of Frequentism and Their Interplay with Bayesianism
The New England Journal of Statistics in Data Science Pub Date: 2023-01-01 DOI: 10.51387/23-nejsds4b
L. Pericchi
Abstract: One of the merits of this far-reaching article is to show that not all “Frequentisms” are equal. Furthermore, there are frequentist approaches that are scientifically compelling, notably the “Empirical Frequentist” (EP), which can be paraphrased as “The proof of the pudding is in the eating”. Somewhat surprising to some (but anticipated in Wald’s admissibility theorems in Decision Theory) is the conclusion that the easiest and best way to achieve the EP property is through Bayesian reasoning, perhaps more exactly, through Objective Bayesian reasoning. (I am avoiding the expression Empirical Bayesian reasoning, which would be appropriate if it wasn’t associated with a very particular group of methods. It is argued below that a better name would be “Bayes Empirical”.) I concentrate on Hypothesis Testing since that is the most challenging area of deeper disagreement among schools. From this substantive classification of Frequentisms emerges the opportunity for a convergence, which is even more satisfying than a compromise, between schools. This may only be fully achieved if the prior probabilities are known, which is not usually the case. However, particularly in Hypothesis Testing, prior probabilities can and should be estimated and their uncertainty acknowledged in a Bayesian way. This may perhaps be termed Bayes Empirical: the systematic empirical study of Prior Possibilities based on relevant data, acknowledging its uncertainty.
Citations: 1
Gamma-Minimax Wavelet Shrinkage for Signals with Low SNR
The New England Journal of Statistics in Data Science Pub Date: 2023-01-01 DOI: 10.51387/23-nejsds43
Dixon Vimalajeewa, A. Dasgupta, F. Ruggeri, B. Vidakovic
Abstract: In this paper, we propose a method for wavelet denoising of signals contaminated with Gaussian noise when prior information about the $L^{2}$-energy of the signal is available. Assuming the independence model, according to which the wavelet coefficients are treated individually, we propose simple, level-dependent shrinkage rules that turn out to be Γ-minimax for a suitable class of priors. The proposed methodology is particularly well suited in denoising tasks when the signal-to-noise ratio is low, which is illustrated by simulations on a battery of some standard test functions. Comparison to some commonly used wavelet shrinkage methods is provided.
Citations: 1