{"title":"Model selection for generalized linear models with weak factors","authors":"Xin Zhou, Yan Dong, Qin Yu, Zemin Zheng","doi":"10.1002/sta4.697","DOIUrl":"https://doi.org/10.1002/sta4.697","url":null,"abstract":"The literature has witnessed an upsurge of interest in model selection in diverse fields and optimization applications. Despite the substantial progress, model selection remains a significant challenge when covariates are highly correlated, particularly within economic and financial datasets that exhibit cross‐sectional and serial dependency. In this paper, we introduce a novel methodology named factor augmented regularized model selection with weak factors (WeakFARM) for generalized linear models in the presence of correlated covariates with weak latent factor structure. By identifying weak latent factors and idiosyncratic components and employing them as predictors, WeakFARM converts the challenge from model selection with highly correlated covariates to that with weakly correlated ones. Furthermore, we develop a variable screening method based on the proposed WeakFARM method. Comprehensive theoretical guarantees including estimation consistency, model selection consistency and sure screening property are also provided. We demonstrate the effectiveness of our approach by extensive simulation studies and a real data application in economic forecasting.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"48 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141196624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mara Rojeski Blake, Emily Griffith, Steven J. Pierce, Rachel Levy, Micaela Parker, Marianne Huebner
{"title":"Tell your story: Metrics of success for academic data science collaboration and consulting programs","authors":"Mara Rojeski Blake, Emily Griffith, Steven J. Pierce, Rachel Levy, Micaela Parker, Marianne Huebner","doi":"10.1002/sta4.686","DOIUrl":"https://doi.org/10.1002/sta4.686","url":null,"abstract":"Measuring success plays a central role in justifying and advocating for a statistical or data science consulting or collaboration program (SDSP) within an academic institution. We present several specific metrics to report to targeted audiences to tell the story for success of a robust and sustainable program. While gathering such metrics includes challenges, we discuss potential data sources and possible practices for SDSPs to inform their own approaches. Emphasizing essential metrics for reporting, we also share the metric gathering and reporting practices of two programs in greater detail. New or existing SDSPs should evaluate their local environments and tailor their practice to gathering, analysing and reporting success metrics accordingly. This approach provides a strong foundation to use success metrics to tell compelling stories about the SDSP and enhance program sustainability. The area of success metrics provides ample opportunity for future research projects that leverage qualitative methods and consider mechanisms for adapting to the changing landscape of data science.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"42 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141196627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alberto Cabezas, Marco Battiston, Christopher Nemeth
{"title":"Robust Bayesian nonparametric variable selection for linear regression","authors":"Alberto Cabezas, Marco Battiston, Christopher Nemeth","doi":"10.1002/sta4.696","DOIUrl":"https://doi.org/10.1002/sta4.696","url":null,"abstract":"Spike‐and‐slab and horseshoe regressions are arguably the most popular Bayesian variable selection approaches for linear regression models. However, their performance can deteriorate if outliers and heteroskedasticity are present in the data, which are common features in many real‐world statistics and machine learning applications. This work proposes a Bayesian nonparametric approach to linear regression that performs variable selection while accounting for outliers and heteroskedasticity. Our proposed model is an instance of a Dirichlet process scale mixture model with the advantage that we can derive the full conditional distributions of all parameters in closed‐form, hence producing an efficient Gibbs sampler for posterior inference. Moreover, we present how to extend the model to account for heavy‐tailed response variables. The model's performance is tested against competing algorithms on synthetic and real‐world datasets.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"47 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141196648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ryan A. Peterson, Emily Slade, Gina‐Maria Pomann, Walter T. Ambrosius
{"title":"Working well with statisticians: Perceptions of practicing statisticians on their most successful collaborations","authors":"Ryan A. Peterson, Emily Slade, Gina‐Maria Pomann, Walter T. Ambrosius","doi":"10.1002/sta4.694","DOIUrl":"https://doi.org/10.1002/sta4.694","url":null,"abstract":"Statistical collaboration requires statisticians to work and communicate effectively with nonstatisticians, which can be challenging for many reasons. To identify common themes and lessons for working smoothly with nonstatistician collaborators, two focus groups of primarily academic collaborative statisticians were held. We identified qualities of collaborations that tend to yield fruitful relationships and those that tend to yield nothing (or worse, with one or both parties being dissatisfied). The initial goal was to share helpful knowledge and individual experiences that can facilitate more successful collaborative relationships for statisticians who work within academic statistical collaboration units. These findings were used to design a follow‐up survey to collect perspectives from a wider set of practicing statisticians on important qualities to consider when assessing potential collaborations. In this survey of practicing statisticians, we found widespread agreement on many good and bad qualities to promote and discourage, respectively. Interestingly, some negative and positive collaboration qualities were less agreed upon, suggesting that in such cases, a mix‐and‐match approach of domain experts to statisticians could alleviate friction and statistician burnout in team science settings. 
The perceived importance of some collaboration characteristics differed between faculty and staff, while others depended on experience.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"51 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141167300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christina Maimone, Julia L. Sharp, Ofira Schwartz‐Soicher, Jeffrey C. Oliver, Lencia Beltran
{"title":"Do good: Strategies for leading an inclusive data science or statistics consulting team","authors":"Christina Maimone, Julia L. Sharp, Ofira Schwartz‐Soicher, Jeffrey C. Oliver, Lencia Beltran","doi":"10.1002/sta4.687","DOIUrl":"https://doi.org/10.1002/sta4.687","url":null,"abstract":"Leading a data science or statistical consulting team in an academic environment can have many challenges, including institutional infrastructure, funding and technical expertise. Even in the most challenging environment, however, leading such a team with inclusive practices can be rewarding for the leader, the team members and collaborators. We describe nine leadership and management practices that are especially relevant to the dynamics of data science or statistics consulting teams and an academic environment: ensuring people get credit, making tacit knowledge explicit, establishing clear performance review processes, championing career development, empowering team members to work autonomously, learning from diverse experiences, supporting team members in navigating power dynamics, having difficult conversations and developing foundational management skills. Active engagement in these areas will help those who lead data science or statistics consulting groups – whether faculty or staff, regardless of title – create and support inclusive teams.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"24 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140936203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High‐dimensional differential networks with sparsity and reduced‐rank","authors":"Yao Wang, Cheng Wang, Binyan Jiang","doi":"10.1002/sta4.690","DOIUrl":"https://doi.org/10.1002/sta4.690","url":null,"abstract":"Differential network analysis plays a crucial role in capturing nuanced changes in conditional correlations between two samples. Under the high‐dimensional setting, the differential network, that is, the difference between the two precision matrices, is usually stylized with sparse signals and some low‐rank latent factors. Recognizing the distinctions inherent in the precision matrices of such networks, we introduce a novel approach, termed ‘SR‐Network’, for the estimation of sparse and reduced‐rank differential networks. This method directly assesses the differential network by formulating a convex empirical loss function with ‐norm and nuclear norm penalties. The study establishes finite‐sample error bounds for parameter estimation and highlights the superior performance of the proposed method through extensive simulations and real data studies. This research significantly contributes to the advancement of methodologies for accurate analysis of differential networks, particularly in the context of structures characterized by sparsity and low‐rank features.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"218 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140941729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variational inference for the latent shrinkage position model","authors":"Xian Yao Gwee, Isobel Claire Gormley, Michael Fop","doi":"10.1002/sta4.685","DOIUrl":"https://doi.org/10.1002/sta4.685","url":null,"abstract":"The latent position model (LPM) is a popular method used in network data analysis where nodes are assumed to be positioned in a ‐dimensional latent space. The latent shrinkage position model (LSPM) is an extension of the LPM which automatically determines the number of effective dimensions of the latent space via a Bayesian nonparametric shrinkage prior. However, the LSPM's reliance on Markov chain Monte Carlo for inference, while rigorous, is computationally expensive, making it challenging to scale to networks with large numbers of nodes. We introduce a variational inference approach for the LSPM, aiming to reduce computational demands while retaining the model's ability to intrinsically determine the number of effective latent dimensions. The performance of the variational LSPM is illustrated through simulation studies and its application to real‐world network data. To promote wider adoption and ease of implementation, we also provide open‐source code.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"5 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140936205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alyssa Platt, Tracy Truong, Mary Boulos, Nichole E. Carlson, Manisha Desai, Monica M. Elam, Emily Slade, Alexandra L. Hanlon, Jillian H. Hurst, Maren K. Olsen, Laila M. Poisson, Lacey Rende, Gina‐Maria Pomann
{"title":"A guide to successful management of collaborative partnerships in quantitative research: An illustration of the science of team science","authors":"Alyssa Platt, Tracy Truong, Mary Boulos, Nichole E. Carlson, Manisha Desai, Monica M. Elam, Emily Slade, Alexandra L. Hanlon, Jillian H. Hurst, Maren K. Olsen, Laila M. Poisson, Lacey Rende, Gina‐Maria Pomann","doi":"10.1002/sta4.674","DOIUrl":"https://doi.org/10.1002/sta4.674","url":null,"abstract":"Data‐intensive research continues to expand with the goal of improving healthcare delivery, clinical decision‐making, and patient outcomes. Quantitative scientists, such as biostatisticians, epidemiologists, and informaticists, are tasked with turning data into health knowledge. In academic health centres, quantitative scientists are critical to the missions of biomedical discovery and improvement of health. Many academic health centres have developed centralized Quantitative Science Units which foster dual goals of professional development of quantitative scientists and producing high quality, reproducible domain research. Such units then develop teams of quantitative scientists who can collaborate with researchers. However, existing literature does not provide guidance on how such teams are formed or how to manage and sustain them. Leaders of Quantitative Science Units across six institutions formed a working group to examine common practices and tools that can serve as best practices for Quantitative Science Units that wish to achieve these dual goals through building long‐term partnerships with researchers. The results of this working group are presented to provide tools and guidance for Quantitative Science Units challenged with developing, managing, and evaluating Quantitative Science Teams. 
This guidance aims to help Quantitative Science Units effectively participate in and enhance the research that is conducted throughout the academic health centre—shaping their resources to fit evolving research needs.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"24 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140936210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An optimal exact interval for the risk ratio in the 2×2 table with structural zero","authors":"Weizhen Wang, Xingyun Cao, Tianfa Xie","doi":"10.1002/sta4.681","DOIUrl":"https://doi.org/10.1002/sta4.681","url":null,"abstract":"The 2×2 table with a structural zero represents a common scenario in clinical trials and epidemiology, characterized by a specific empty cell. In such cases, the risk ratio serves as a vital parameter for statistical inference. However, existing confidence intervals, such as those constructed through the score test and Bayesian methods, fail to achieve the prescribed nominal level. Our focus is on numerically constructing exact confidence intervals for the risk ratio. We achieve this by optimally combining the modified inferential model method and the ‐function method. The resulting interval is then compared with intervals generated by four existing methods: the score method, the exact score method, the Bayesian tailed‐based method and the inferential model method. This comparison is conducted based on the infimum coverage probability, average interval length and non‐coverage probability criteria. Remarkably, our proposed interval outperforms other exact intervals, being notably shorter. To illustrate the effectiveness of our approach, we discuss two examples in detail.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"9 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140936202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On quantifying heterogeneous treatment effects with regression‐based individualized treatment rules: Loss function families and bounds on estimation error","authors":"Michael T. Gorczyca, Chaeryon Kang","doi":"10.1002/sta4.680","DOIUrl":"https://doi.org/10.1002/sta4.680","url":null,"abstract":"Heterogeneity in response to treatment is a pervasive problem in medicine. Many researchers have proposed individualized treatment rule methods for this problem, which personalize treatment recommendations based on an individual's recorded covariates. A challenge with using these methods in practice is that they determine a treatment rule, rather than quantify treatment benefit. This can be problematic, as a recommended treatment could be burdensome and have negligible improvements in outcome for some individuals. With the aim of helping practitioners make informed modelling choices, we identify two families of loss functions to use with individualized treatment rule methods. Under the assumption of correct model specification, estimation with a loss function from one family ensures that the model's treatment recommendations can be interpreted in terms of the risk difference, while the other family of loss functions ensures that the model's treatment recommendations can be interpreted in terms of the risk ratio. We also derive two upper bounds for a model's error in risk difference and risk ratio estimation. Each upper bound can be calculated using observed data and can provide insight to practitioners regarding model error in estimating treatment effects. 
We illustrate our contributions with simulation studies as well as with data from the ACTG‐175 AIDS study.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"357 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140942412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}