Joddy Marchesoni, Kate Freeman, Alp Tezbasaran, Shannon W. Ricci
{"title":"The student staffing advantage: Data Science Consulting Service at NC State University Libraries","authors":"Joddy Marchesoni, Kate Freeman, Alp Tezbasaran, Shannon W. Ricci","doi":"10.1002/sta4.702","DOIUrl":"https://doi.org/10.1002/sta4.702","url":null,"abstract":"The primarily peer‐to‐peer, graduate student‐staffed Data Science Consulting Service at NC State University Libraries, within the Data & Visualization Services (DVS) department and collaborating closely with the Data Science Academy (DSA), has established a sustainable service and staffing model focused on providing broad data science analytic support to researchers across the university community. The service addresses the needs of university researchers who possess domain knowledge in their fields of study but have a skills gap in the data science competencies required for research. The literature shows that it has been difficult for libraries to cover these needs with existing staffing models. Few universities follow the model practiced at NC State University, so a scan of the current landscape of data science consulting at universities across the country was performed to establish context. The support model and its advantages are described, including partnership with the DSA, student success, model sustainability and future directions for the service. Through a summary of the DVS assessment and needs evaluation process, the service's advantages in staying ahead of patron needs are illustrated. This scalable, sustainable, student‐focused model could be implemented by similar research institutions to expand the capacity of their technical research services.","PeriodicalId":56159,"journal":{"name":"Stat","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141400853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fabrice Moudjieu Leumbe, Frédéric Mortier, Patrice Soh Takam, Nicolas Picard, Allah‐Barem Félix, Baya Fidèle, M. Tadesse, Vivien Rossi
{"title":"Disentangling the impact of nested sources of variability on species growth processes: A mixture of multilevel mixed model approach","authors":"Fabrice Moudjieu Leumbe, Frédéric Mortier, Patrice Soh Takam, Nicolas Picard, Allah‐Barem Félix, Baya Fidèle, M. Tadesse, Vivien Rossi","doi":"10.1002/sta4.695","DOIUrl":"https://doi.org/10.1002/sta4.695","url":null,"abstract":"The understanding of tree growth processes is crucial for promoting sustainable forest management strategies. This is a challenging task in highly biodiverse ecosystems where many tree species are observed on very few individuals and the small sample sizes hinder a good fit of species‐specific models. We propose the use of finite mixture of random coefficient regression models with multilevel nested random effects to infer guild‐specific fixed and random effects while evaluating the relative importance of the nested sources of variability on goodness‐of‐fit. This approach extends finite mixtures of linear mixed models used in longitudinal or single‐group structured data contexts. A dedicated expectation–maximisation algorithm is introduced for parameter estimation. Simulations are performed to evaluate misspecification of nested‐grouping structures. This work has been motivated by data collected biennially in Central African rainforests from 1986 to 2010. We show the accuracy of the proposed approach in successfully reproducing individual growth processes and classifying tree species into well‐differentiated clusters with clear ecological interpretations. Moreover, results confirm that interindividual variability appears to be the most important factor explaining the variability of tropical tree species growth processes in Central African forests.","PeriodicalId":56159,"journal":{"name":"Stat","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141277820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variable selection for semivarying coefficient models via local averaging","authors":"Xinyi Qi, Mengjie Liu, Chuanlong Xie, Heng Peng","doi":"10.1002/sta4.703","DOIUrl":"https://doi.org/10.1002/sta4.703","url":null,"abstract":"This study aims to provide novel insights into variable selection in the semivarying coefficient model. We focus on the problem of variable selection and screening for the constant coefficient part. A common approach in the existing literature is to infer the constant coefficients by transforming the problem into a linear model scenario, utilizing a fine estimator of the varying coefficients. In this paper, we propose an approximation method for the varying coefficient functions using local averaging, which is characterized by its simplicity, rough and computational efficiency. Additionally, we introduce an adaptive lasso estimator and a forward regression algorithm specifically designed for semivarying coefficient models. Theoretical and experimental results highlight the effectiveness of the local averaging method in extending variable selection techniques from the linear model to the semivarying coefficient model. Our proposed approaches demonstrate a significant improvement in inference speed compared with baseline methods, with little loss of asymptotic efficiency.","PeriodicalId":56159,"journal":{"name":"Stat","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141403965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clemontina A. Davenport, Hui‐Jie Lee, Quentin Ruiz‐Esparza, Nicholas Janes, Megan L. Neely, Lacey Rende, Gregory P. Samsa, Kelsey Stilley, Jesse D. Troy, Tracy Truong, S. Grambow, Gina‐Maria Pomann
{"title":"Accelerating resident research within quantitative collaboration units in academic healthcare","authors":"Clemontina A. Davenport, Hui‐Jie Lee, Quentin Ruiz‐Esparza, Nicholas Janes, Megan L. Neely, Lacey Rende, Gregory P. Samsa, Kelsey Stilley, Jesse D. Troy, Tracy Truong, S. Grambow, Gina‐Maria Pomann","doi":"10.1002/sta4.689","DOIUrl":"https://doi.org/10.1002/sta4.689","url":null,"abstract":"With increased access to biomedical and electronic health record data and increasingly complex research questions, individuals in residency programmes who aim to conduct research require specialized educational programmes and biostatistics support. Biostatistics collaboration units in academic health centres often work with residents to conduct data‐intensive research. These units face numerous challenges related to providing training in statistical literacy and collaborating on resident‐led research within very restricted timelines. Since 2019, the Duke Biostatistics, Epidemiology, and Research Design (BERD) Methods Core has supported over 247 resident‐led projects by developing tools and resources to address these challenges. This manuscript presents novel processes and training materials that other institutions can use to help biostatistics collaboration units effectively support resident training programmes. We provide a framework to support the development of collaborative teams, along with specialized training materials for residents who collaborate with these teams.","PeriodicalId":56159,"journal":{"name":"Stat","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141276264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model selection for generalized linear models with weak factors","authors":"Xin Zhou, Yan Dong, Qin Yu, Zemin Zheng","doi":"10.1002/sta4.697","DOIUrl":"https://doi.org/10.1002/sta4.697","url":null,"abstract":"The literature has witnessed an upsurge of interest in model selection in diverse fields and optimization applications. Despite the substantial progress, model selection remains a significant challenge when covariates are highly correlated, particularly within economic and financial datasets that exhibit cross‐sectional and serial dependency. In this paper, we introduce a novel methodology named factor augmented regularized model selection with weak factors (WeakFARM) for generalized linear models in the presence of correlated covariates with weak latent factor structure. By identifying weak latent factors and idiosyncratic components and employing them as predictors, WeakFARM converts the challenge from model selection with highly correlated covariates to that with weakly correlated ones. Furthermore, we develop a variable screening method based on the proposed WeakFARM method. Comprehensive theoretical guarantees including estimation consistency, model selection consistency and sure screening property are also provided. We demonstrate the effectiveness of our approach by extensive simulation studies and a real data application in economic forecasting.","PeriodicalId":56159,"journal":{"name":"Stat","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141196624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mara Rojeski Blake, Emily Griffith, Steven J. Pierce, Rachel Levy, Micaela Parker, Marianne Huebner
{"title":"Tell your story: Metrics of success for academic data science collaboration and consulting programs","authors":"Mara Rojeski Blake, Emily Griffith, Steven J. Pierce, Rachel Levy, Micaela Parker, Marianne Huebner","doi":"10.1002/sta4.686","DOIUrl":"https://doi.org/10.1002/sta4.686","url":null,"abstract":"Measuring success plays a central role in justifying and advocating for a statistical or data science consulting or collaboration program (SDSP) within an academic institution. We present several specific metrics to report to targeted audiences to tell the story for success of a robust and sustainable program. While gathering such metrics includes challenges, we discuss potential data sources and possible practices for SDSPs to inform their own approaches. Emphasizing essential metrics for reporting, we also share the metric gathering and reporting practices of two programs in greater detail. New or existing SDSPs should evaluate their local environments and tailor their practice to gathering, analysing and reporting success metrics accordingly. This approach provides a strong foundation to use success metrics to tell compelling stories about the SDSP and enhance program sustainability. The area of success metrics provides ample opportunity for future research projects that leverage qualitative methods and consider mechanisms for adapting to the changing landscape of data science.","PeriodicalId":56159,"journal":{"name":"Stat","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141196627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alberto Cabezas, Marco Battiston, Christopher Nemeth
{"title":"Robust Bayesian nonparametric variable selection for linear regression","authors":"Alberto Cabezas, Marco Battiston, Christopher Nemeth","doi":"10.1002/sta4.696","DOIUrl":"https://doi.org/10.1002/sta4.696","url":null,"abstract":"Spike‐and‐slab and horseshoe regressions are arguably the most popular Bayesian variable selection approaches for linear regression models. However, their performance can deteriorate if outliers and heteroskedasticity are present in the data, which are common features in many real‐world statistics and machine learning applications. This work proposes a Bayesian nonparametric approach to linear regression that performs variable selection while accounting for outliers and heteroskedasticity. Our proposed model is an instance of a Dirichlet process scale mixture model with the advantage that we can derive the full conditional distributions of all parameters in closed‐form, hence producing an efficient Gibbs sampler for posterior inference. Moreover, we present how to extend the model to account for heavy‐tailed response variables. The model's performance is tested against competing algorithms on synthetic and real‐world datasets.","PeriodicalId":56159,"journal":{"name":"Stat","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141196648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ryan A. Peterson, Emily Slade, Gina‐Maria Pomann, Walter T. Ambrosius
{"title":"Working well with statisticians: Perceptions of practicing statisticians on their most successful collaborations","authors":"Ryan A. Peterson, Emily Slade, Gina‐Maria Pomann, Walter T. Ambrosius","doi":"10.1002/sta4.694","DOIUrl":"https://doi.org/10.1002/sta4.694","url":null,"abstract":"Statistical collaboration requires statisticians to work and communicate effectively with nonstatisticians, which can be challenging for many reasons. To identify common themes and lessons for working smoothly with nonstatistician collaborators, two focus groups of primarily academic collaborative statisticians were held. We identified qualities of collaborations that tend to yield fruitful relationships and those that tend to yield nothing (or worse, with one or both parties being dissatisfied). The initial goal was to share helpful knowledge and individual experiences that can facilitate more successful collaborative relationships for statisticians who work within academic statistical collaboration units. These findings were used to design a follow‐up survey to collect perspectives from a wider set of practicing statisticians on important qualities to consider when assessing potential collaborations. In this survey of practicing statisticians, we found widespread agreement on many good and bad qualities to promote and discourage, respectively. Interestingly, some negative and positive collaboration qualities were less agreed upon, suggesting that in such cases, a mix‐and‐match approach of domain experts to statisticians could alleviate friction and statistician burnout in team science settings. The perceived importance of some collaboration characteristics differed between faculty and staff, while others depended on experience.","PeriodicalId":56159,"journal":{"name":"Stat","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141167300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sreeram Anantharaman, N. Ravishanker, Sumanta Basu
{"title":"Hierarchical modeling of irregularly spaced financial returns","authors":"Sreeram Anantharaman, N. Ravishanker, Sumanta Basu","doi":"10.1002/sta4.692","DOIUrl":"https://doi.org/10.1002/sta4.692","url":null,"abstract":"Volatility modeling is crucial in finance, especially when dealing with intraday transaction‐level asset returns. The irregular and high‐frequency nature of the data presents unique challenges. While stochastic volatility (SV) models are widely used for understanding patterns in volatility of daily stock returns, which constitute regularly spaced time series, new classes of models must be introduced for analyzing volatility in irregularly spaced intraday data. Specifically, these models must accommodate the random gaps between successive transactional events. By modeling the gaps using autoregressive conditional duration (ACD) models, we describe a hierarchical irregular SV autoregressive conditional duration (IR‐SV‐ACD) model for estimating and forecasting intertransaction gaps and the volatility of log‐returns. We carry out the analysis in the Bayesian framework via the Hamiltonian Monte Carlo (HMC) algorithm with the No‐U‐Turn Sampler (NUTS) in R using the cmdstanr package. The fits and forecasts are obtained using Monte Carlo averages based on the posterior samples. We illustrate this approach using simulation studies and real data analysis for intraday prices, available at the microsecond level, of health stocks traded on the New York Stock Exchange (NYSE). The log‐returns and gaps are calculated for the stocks and are used for modeling.","PeriodicalId":56159,"journal":{"name":"Stat","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141114647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biostatistics deserts and scholarly productivity among US medical schools","authors":"Terrie Vasilopoulos, Dmitry Tumin","doi":"10.1002/sta4.693","DOIUrl":"https://doi.org/10.1002/sta4.693","url":null,"abstract":"The teaching and practice of biostatistics are essential to medical education, but access to biostatistical expertise is limited at many medical schools. Medical schools in rural and historically underserved areas may be doubly disadvantaged in accessing biostatistics expertise by their lack of financial resources and a lack of local programmes training future generations of biostatisticians. Using public data on US medical schools and biostatistics PhD programmes, we identified medical schools operating in ‘biostatistics deserts’ (institutions without an affiliated or colocated PhD programme in biostatistics) and correlated each medical school's location in a biostatistics desert with scholarly productivity, operationalized as the annual number of scholarly publications. Among 126 MD‐granting medical schools in our analysis, 46% were located in a biostatistics desert and had a median of 590 publications/year, compared to 993 for medical schools with a colocated PhD programme and 1,369 with an affiliated programme. On multivariable analysis, the presence of a Biostatistics, Epidemiology, and Research Design (BERD) programme, but not affiliation or colocation with a biostatistics PhD programme, was associated with higher scholarly productivity. Structured biostatistics services, such as BERD programmes, may represent the best opportunity for medical schools to leverage the local biostatistics workforce to support scholarly publication.","PeriodicalId":56159,"journal":{"name":"Stat","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2024-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141125019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}