{"title":"Statistical Network Analysis: Past, Present, and Future","authors":"Srijan Sengupta","doi":"arxiv-2311.00122","DOIUrl":"https://doi.org/arxiv-2311.00122","url":null,"abstract":"This article provides a brief overview of statistical network analysis, a rapidly evolving field of statistics, which encompasses statistical models, algorithms, and inferential methods for analyzing data in the form of networks. Particular emphasis is given to connecting the historical developments in network science to today's statistical network analysis, and outlining important new areas for future research. This invited article is intended as a book chapter for the volume \"Frontiers of Statistics and Data Science\" edited by Subhashis Ghoshal and Anindya Roy for the International Indian Statistical Association Series on Statistics and Data Science, published by Springer. This review article covers the material from the short course titled \"Statistical Network Analysis: Past, Present, and Future\" taught by the author at the Annual Conference of the International Indian Statistical Association, June 6-10, 2023, at Golden, Colorado.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"102 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Aggregation and Evaluation of NBA Mock Drafts","authors":"Jared D. Fisher, Colin Montague","doi":"arxiv-2310.16813","DOIUrl":"https://doi.org/arxiv-2310.16813","url":null,"abstract":"Many enthusiasts and experts publish forecasts of the order players are drafted into professional sports leagues, known as mock drafts. Using a novel dataset of mock drafts for the National Basketball Association (NBA), we analyze authors' mock draft accuracy over time and ask how we can reasonably use information from multiple authors. To measure how accurate mock drafts are, we assume that both mock drafts and the actual draft are ranked lists, and we propose that rank-biased distance (RBD) of Webber et al. (2010) is the appropriate error metric for mock draft accuracy. This is because RBD allows mock drafts to have a different length than the actual draft, accounts for players not appearing in both lists, and weights errors early in the draft more than errors later on. We validate that mock drafts, as expected, improve in accuracy over the course of a season, and that accuracy of the mock drafts produced right before their drafts is fairly stable across seasons. To be able to combine information from multiple mock drafts into a single consensus mock draft, we also propose a ranked-list combination method based on the ideas of ranked-choice voting. We show that our method provides improved forecasts over the standard Borda count combination method used for most similar analyses in sports, and that either combination method provides a more accurate forecast over time than any single author.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unraveling the Skillsets of Data Scientists: Text Mining Analysis of Dutch University Master Programs in Data Science and Artificial Intelligence","authors":"Mathijs J. Mol, Barbara Belfi, Zsuzsa Bakk","doi":"arxiv-2310.14726","DOIUrl":"https://doi.org/arxiv-2310.14726","url":null,"abstract":"The growing demand for data scientists in the global labor market and the Netherlands has led to a rise in data science and artificial intelligence (AI) master programs offered by universities. However, there is still a lack of clarity regarding the specific skillsets of data scientists. This study aims to address this issue by employing Correlated Topic Modeling (CTM) to analyse the content of 41 master programs offered by seven Dutch universities. We assess the differences and similarities in the core skills taught by these programs, determine the subject-specific and general nature of the skills, and provide a comparison between the different types of universities offering these programs. Our findings reveal that research, data processing, statistics and ethics are the predominant skills taught in Dutch data science and AI master programs, with general universities emphasizing research skills and technical universities focusing more on IT and electronic skills. This study contributes to a better understanding of the diverse skillsets of data scientists, which is essential for employers, universities, and prospective students.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The evolving of Data Science and the Saudi Arabia case. How much have we changed in 13 years?","authors":"Igor Barahona","doi":"arxiv-2310.14808","DOIUrl":"https://doi.org/arxiv-2310.14808","url":null,"abstract":"This work conducts a comprehensive examination of data science vocabulary usage over the past 13 years. The investigation commences with a dataset comprising 16,018 abstracts that feature the term \"data science\" in either the title, abstract, or keywords. The study involves the identification of documents that introduce novel vocabulary and subsequently explores how this vocabulary has been incorporated into scientific literature. To achieve these objectives, I employ techniques such as Exploratory Data Analysis, Latent Semantic Analysis, Latent Dirichlet Analysis, and N-grams Analysis. A comparison of scientific publications between overall results and those specific to Saudi Arabia is presented. Based on how the vocabulary is utilized, representative articles are identified.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"27 10","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A p-value for Process Tracing and other N=1 Studies","authors":"Matias Lopez, Jake Bowers","doi":"arxiv-2310.13826","DOIUrl":"https://doi.org/arxiv-2310.13826","url":null,"abstract":"The paper introduces a p-value that summarizes the evidence against a rival causal theory that explains an observed outcome in a single case. We show how to represent the probability distribution characterizing a theorized rival hypothesis (the null) in the absence of randomization of treatment and when counting on qualitative data, for instance when conducting process tracing. As in Fisher's (1935) original design, our p-value indicates how frequently one would find the same observations or even more favorable observations under a theory that is compatible with our observations but antagonistic to the working hypothesis. We also present an extension that allows researchers to assess the sensitivity of their results to confirmation bias. Finally, we illustrate the application of our hypothesis test using the study by Snow (1855) about the cause of Cholera in Soho, a classic in Process Tracing, Epidemiology, and Microbiology. Our framework suits any type of case studies and evidence, such as data from interviews, archives, or participant observation.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing Uncertainty in Sea-level Rise Prediction: A Spatial-variability-aware Approach","authors":"Subhankar Ghosh, Shuai An, Arun Sharma, Jayant Gupta, Shashi Shekhar, Aneesh Subramanian","doi":"arxiv-2310.15179","DOIUrl":"https://doi.org/arxiv-2310.15179","url":null,"abstract":"Given multi-model ensemble climate projections, the goal is to accurately and reliably predict future sea-level rise while lowering the uncertainty. This problem is important because sea-level rise affects millions of people in coastal communities and beyond due to climate change's impacts on polar ice sheets and the ocean. This problem is challenging due to spatial variability and unknowns such as possible tipping points (e.g., collapse of Greenland or West Antarctic ice-shelf), climate feedback loops (e.g., clouds, permafrost thawing), future policy decisions, and human actions. Most existing climate modeling approaches use the same set of weights globally, during either regression or deep learning to combine different climate projections. Such approaches are inadequate when different regions require different weighting schemes for accurate and reliable sea-level rise predictions. This paper proposes a zonal regression model which addresses spatial variability and model inter-dependency. Experimental results show more reliable predictions using the weights learned via this approach on a regional scale.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Group sequential two-stage preference designs","authors":"Ruyi Liu, Fan Li, Denise Esserman, Mary M. Ryan","doi":"arxiv-2310.11603","DOIUrl":"https://doi.org/arxiv-2310.11603","url":null,"abstract":"The two-stage preference design (TSPD) enables the inference for treatment efficacy while allowing for incorporation of patient preference to treatment. It can provide unbiased estimates for selection and preference effects, where a selection effect occurs when patients who prefer one treatment respond differently than those who prefer another, and a preference effect is the difference in response caused by an interaction between the patient's preference and the actual treatment they receive. One potential barrier to adopting TSPD in practice, however, is the relatively large sample size required to estimate selection and preference effects with sufficient power. To address this concern, we propose a group sequential two-stage preference design (GS-TSPD), which combines TSPD with sequential monitoring for early stopping. In the GS-TSPD, pre-planned sequential monitoring allows investigators to conduct repeated hypothesis tests on accumulated data prior to full enrollment to assess study eligibility for early trial termination without inflating type I error rates. Thus, the procedure allows investigators to terminate the study when there is sufficient evidence of treatment, selection, or preference effects during an interim analysis, thereby reducing the design resource in expectation. To formalize such a procedure, we verify the independent increments assumption for testing the selection and preference effects and apply group sequential stopping boundaries from the approximate sequential density functions. Simulations are then conducted to investigate the operating characteristics of our proposed GS-TSPD compared to the traditional TSPD. We demonstrate the applicability of the design using a study of Hepatitis C treatment modality.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"27 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interview with Adrian Raftery","authors":"Leontine Alkema, Thomas Brendan Murphy, Adrian E. Raftery","doi":"arxiv-2310.11095","DOIUrl":"https://doi.org/arxiv-2310.11095","url":null,"abstract":"Professor Adrian E. Raftery is the Boeing International Professor of Statistics and Sociology, and an adjunct professor of Atmospheric Sciences, at the University of Washington in Seattle. He was born in Dublin, Ireland, and obtained a B.A. in Mathematics and an M.Sc. in Statistics and Operations Research at Trinity College Dublin. He obtained a doctorate in mathematical statistics from the Université Pierre et Marie Curie under the supervision of Paul Deheuvels. He was a lecturer in statistics at Trinity College Dublin, and then an associate and full professor of statistics and sociology at the University of Washington. He was the founding Director of the Center for Statistics and Social Sciences. Professor Raftery has published over 200 articles in peer-reviewed statistical, sociological and other journals. His research focuses on Bayesian model selection and Bayesian model averaging, model-based clustering, inference for deterministic models, and the development of new statistical methods for demography, sociology, and the environmental and health sciences. He is a member of the United States National Academy of Sciences, a Fellow of the American Academy of Arts and Sciences, an Honorary Member of the Royal Irish Academy, a member of the Washington State Academy of Sciences, a Fellow of the American Statistical Association, a Fellow of the Institute of Mathematical Statistics, and an elected Member of the Sociological Research Association. He has won multiple awards for his research. He was Coordinating and Applications Editor of the Journal of the American Statistical Association and Editor of Sociological Methodology. He was identified as the world's most cited researcher in mathematics for the period 1995-2005. Thirty-three students have obtained Ph.D.'s working under Raftery's supervision, of whom 21 hold or have held tenure-track university faculty positions.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"59 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fair coins tend to land on the same side they started: Evidence from 350,757 Flips","authors":"František Bartoš, Alexandra Sarafoglou, Henrik R. Godmann, Amir Sahrani, David Klein Leunk, Pierre Y. Gui, David Voss, Kaleem Ullah, Malte J. Zoubek, Franziska Nippold, Frederik Aust, Felipe F. Vieira, Chris-Gabriel Islam, Anton J. Zoubek, Sara Shabani, Jonas Petter, Ingeborg B. Roos, Adam Finnemann, Aaron B. Lob, Madlen F. Hoffstadt, Jason Nak, Jill de Ron, Koen Derks, Karoline Huth, Sjoerd Terpstra, Thomas Bastelica, Magda Matetovici, Vincent L. Ott, Andreea S. Zetea, Katharina Karnbach, Michelle C. Donzallaz, Arne John, Roy M. Moore, Franziska Assion, Riet van Bork, Theresa E. Leidinger, Xiaochang Zhao, Adrian Karami Motaghi, Ting Pang, Hannah Armstrong, Tianqi Peng, Mara Bialas, Joyce Y. -C. Pang, Bohan Fu, Shujun Yang, Xiaoyi Lin, Dana Sleiffer, Miklos Bognar, Balazs Aczel, Eric-Jan Wagenmakers","doi":"arxiv-2310.04153","DOIUrl":"https://doi.org/arxiv-2310.04153","url":null,"abstract":"Many people have flipped coins but few have stopped to ponder the statistical and physical intricacies of the process. In a preregistered study we collected 350,757 coin flips to test the counterintuitive prediction from a physics model of human coin tossing developed by Persi Diaconis. The model asserts that when people flip an ordinary coin, it tends to land on the same side it started -- Diaconis estimated the probability of a same-side outcome to be about 51%. Our data lend strong support to this precise prediction: the coins landed on the same side more often than not, $\\text{Pr}(\\text{same side}) = 0.508$, 95% credible interval (CI) [$0.506$, $0.509$], $\\text{BF}_{\\text{same-side bias}} = 2364$. Furthermore, the data revealed considerable between-people variation in the degree of this same-side bias. Our data also confirmed the generic prediction that when people flip an ordinary coin -- with the initial side-up randomly determined -- it is equally likely to land heads or tails: $\\text{Pr}(\\text{heads}) = 0.500$, 95% CI [$0.498$, $0.502$], $\\text{BF}_{\\text{heads-tails bias}} = 0.183$. Furthermore, this lack of heads-tails bias does not appear to vary across coins. Our data therefore provide strong evidence that when some (but not all) people flip a fair coin, it tends to land on the same side it started. Our data provide compelling statistical support for Diaconis' physics model of coin tossing.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Fractional Spherically Restricted Hyperbolic Diffusion Random Field","authors":"Nikolai Leonenko, Andriy Olenko, Jayme Vaz","doi":"arxiv-2310.03933","DOIUrl":"https://doi.org/arxiv-2310.03933","url":null,"abstract":"The paper investigates solutions of the fractional hyperbolic diffusion equation in its most general form with two fractional derivatives of distinct orders. The solutions are given as spatial-temporal homogeneous and isotropic random fields and their spherical restrictions are studied. The spectral representations of these fields are derived and the associated angular spectrum is analysed. The obtained mathematical results are illustrated by numerical examples. In addition, the numerical investigations assess the dependence of the covariance structure and other properties of these fields on the orders of fractional derivatives.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"20 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}