{"title":"Comments on Xiao-Li Meng’s Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw Your Kidstogram","authors":"D. Lin","doi":"10.51387/23-nejsds6e","DOIUrl":"https://doi.org/10.51387/23-nejsds6e","url":null,"abstract":"","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"633 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78985207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Four Types of Frequentism and Their Interplay with Bayesianism","authors":"James O. Berger","doi":"10.51387/22-nejsds4","DOIUrl":"https://doi.org/10.51387/22-nejsds4","url":null,"abstract":"","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"20 6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83470093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw your Kidstogram","authors":"X. Meng","doi":"10.51387/22-nejsds6","DOIUrl":"https://doi.org/10.51387/22-nejsds6","url":null,"abstract":"This article expands upon my presentation to the panel on “The Radical Prescription for Change” at the 2017 ASA (American Statistical Association) symposium on A World Beyond $p<0.05$. It emphasizes that, to greatly enhance the reliability of, and hence public trust in, statistical and data scientific findings, we need to take a holistic approach. We need to lead by example, incentivize study quality, and inoculate future generations with profound appreciations for the world of uncertainty and the uncertainty world. The four “radical” proposals in the title, with all their inherent defects and trade-offs, are designed to provoke reactions and actions. First, research methodologies are trustworthy only if they deliver what they promise, even if this means that they have to be overly protective, a necessary trade-off for practicing quality-guaranteed statistics. This guiding principle may compel us to double the variance in some situations, a strategy that also coincides with the call to raise the bar from $p<0.05$ to $p<0.005$ [3]. Second, teaching principled practicality or corner-cutting is a promising strategy to enhance the scientific community’s as well as the general public’s ability to spot, and hence to deter, flawed arguments or findings. A remarkable quick-and-dirty Bayes formula for rare events, which simply divides the prevalence by the sum of the prevalence and the false positive rate (or the total error rate), as featured by the popular radio show Car Talk, illustrates the effectiveness of this strategy. Third, it should be a routine mental exercise to put ourselves in the shoes of those who would be affected by our research finding, in order to combat the tendency of rushing to conclusions or overstating confidence in our findings. 
A pufferfish/selfish test can serve as an effective reminder, and can help to institute the mantra “Thou shalt not sell what thou refuseth to buy” as the most basic professional decency. Considering personal stakes in our statistical endeavors also points to the concept of behavioral statistics, in the spirit of behavioral economics. Fourth, the current mathematical education paradigm that puts “deterministic first, stochastic second” is likely responsible for the general difficulties with reasoning under uncertainty, a situation that can be improved by introducing the concept of histogram, or rather kidstogram, as early as the concept of counting.","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84393771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comment on “Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw Your Kidstogram,” by Xiao-Li Meng","authors":"E. Kolaczyk","doi":"10.51387/22-nejsds6c","DOIUrl":"https://doi.org/10.51387/22-nejsds6c","url":null,"abstract":"","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78069420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comment on “Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw your Kidstogram” by Xiao-Li Meng","authors":"C. Franklin","doi":"10.51387/22-nejsds6d","DOIUrl":"https://doi.org/10.51387/22-nejsds6d","url":null,"abstract":"","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"213 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79265392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Radical and Not-So-Radical Principles and Practices: Discussion of Meng","authors":"R. Wasserstein, A. Schirm, N. Lazar","doi":"10.51387/22-nejsds6a","DOIUrl":"https://doi.org/10.51387/22-nejsds6a","url":null,"abstract":"We highlight points of agreement between Meng’s suggested principles and those proposed in our 2019 editorial in The American Statistician. We also discuss some questions that arise in the application of Meng’s principles in practice.","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82811017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Total i3+3 (Ti3+3) Design for Assessing Multiple Types and Grades of Toxicity in Phase I Trials","authors":"Meizi Liu, Yuan Ji, Ji Lin","doi":"10.51387/22-nejsds7","DOIUrl":"https://doi.org/10.51387/22-nejsds7","url":null,"abstract":"Phase I trials investigate the toxicity profile of a new treatment and identify the maximum tolerated dose for further evaluation. Most phase I trials use a binary dose-limiting toxicity endpoint to summarize the toxicity profile of a dose. In reality, reported toxicity information is much more abundant, including various types and grades of adverse events. Building upon the i3+3 design (Liu et al., 2020), we propose the Ti3+3 design, in which the letter “T” represents “total” toxicity. The proposed design takes into account multiple toxicity types and grades by computing the toxicity burden at each dose. The Ti3+3 design aims to achieve desirable operating characteristics using a simple statistical framework that utilizes a “toxicity burden interval” (TBI). Simulation results show that Ti3+3 demonstrates comparable performance with existing more complex designs.","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83280352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dietary Patterns and Cancer Risk: An Overview with Focus on Methods","authors":"V. Edefonti, R. De Vito, M. Parpinel, M. Ferraroni","doi":"10.51387/23-nejsds35","DOIUrl":"https://doi.org/10.51387/23-nejsds35","url":null,"abstract":"Traditionally, research in nutritional epidemiology has focused on specific foods/food groups or single nutrients in relation to disease outcomes, including cancer. Dietary pattern analyses have been introduced to examine potential cumulative and interactive effects of individual dietary components of the overall diet, in which foods are consumed in combination. Dietary patterns can be identified by using evidence-based investigator-defined approaches or by using data-driven approaches, which rely on either response-independent (also named “a posteriori” dietary patterns) or response-dependent (also named “mixed-type” dietary patterns) multivariate statistical methods. Within the open methodological challenges related to study design, dietary assessment, identification of dietary patterns, confounding phenomena, and cancer risk assessment, the current paper provides an updated landscape review of novel methodological developments in the statistical analysis of a posteriori/mixed-type dietary patterns and cancer risk. The review starts from standard a posteriori dietary patterns from principal component, factor, and cluster analyses, including mixture models, and examines mixed-type dietary patterns from reduced rank regression, partial least squares, classification and regression tree analysis, and least absolute shrinkage and selection operator. Novel statistical approaches reviewed include Bayesian factor analysis with modeling of sparsity through shrinkage and sparse priors and frequentist focused principal component analysis. 
Most novelties relate to the reproducibility of dietary patterns across studies, where the Bayesian approach to factor and cluster analysis shows the greatest potential.","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"25 3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89607622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Some Noteworthy Issues in Joint Species Distribution Modeling for Plant Data","authors":"A. Gelfand","doi":"10.51387/22-nejsds11","DOIUrl":"https://doi.org/10.51387/22-nejsds11","url":null,"abstract":"Joint species distribution modeling is attracting increasing attention in the literature these days, recognizing the fact that single species modeling fails to take into account expected dependence/interaction between species. This short paper offers discussion that attempts to illuminate five noteworthy technical issues associated with such modeling in the context of plant data. In this setting, the joint species distribution work in the literature considers several types of species data collection. For convenience of discussion, we focus on joint modeling of presence/absence data. For such data, the primary modeling strategy has been through introduction of latent multivariate normal random variables. These issues address the following: (i) how the observed presence/absence data is linked to the latent normal variables as well as the resulting implications with regard to modeling the data sites as independent or spatially dependent, (ii) the incompatibility of point referenced and areal referenced presence/absence data in spatial modeling of species distribution, (iii) the effect of modeling species independently/marginally rather than jointly within site, with regard to assessing species distribution, (iv) the interpretation of species dependence under the use of latent multivariate normal specification, and (v) the interpretation of clustering of species associated with specific joint species distribution modeling specifications. It is hoped that, by attempting to clarify these issues, ecological modelers and quantitative ecologists will be able to better appreciate some subtleties that are implicit in this growing collection of modeling ideas. 
In this regard, this paper can serve as a useful companion piece to the recent survey/comparison article by [33] in Methods in Ecology and Evolution.","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83766303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Optimal Two-Period Multiarm Platform Design with New Experimental Arms Added During the Trial","authors":"H. Pan, Xiaomeng Yuan, Jingjing Ye","doi":"10.51387/22-nejsds15","DOIUrl":"https://doi.org/10.51387/22-nejsds15","url":null,"abstract":"Platform trials are multiarm clinical studies that allow the addition of new experimental arms after the activation of the trial. Statistical issues concerning “adding new arms”, however, have not been thoroughly discussed. This work was motivated by a “two-period” pediatric osteosarcoma study, starting with two experimental arms and one control arm and later adding two more pre-planned experimental arms. The common control arm will be shared among experimental arms across the trial. In this paper, we provide a principled approach, including how to modify the critical boundaries to control the family-wise error rate as new arms are added, how to re-estimate the sample sizes and provide the optimal control-to-experimental arms allocation ratio, in terms of minimizing the total sample size to achieve a desirable marginal power level. We examined the influence of the timing of adding new arms on the design’s operating characteristics, which provides a practical guide for deciding the timing. Other various numerical evaluations have also been conducted. A method for controlling the pair-wise error rate (PWER) has also been developed. We have published an R package, PlatformDesign, on CRAN for practitioners to easily implement this platform trial approach. 
A detailed step-by-step tutorial is provided in Appendix A.2.","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"94 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83899326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}