Elena Moltchanova, Miguel Moyers-González, Geertrui Van de Voorde, José Felipe Voloch, Philipp Wacker
{"title":"How to survive the Squid Games using probability theory","authors":"Elena Moltchanova, Miguel Moyers-González, Geertrui Van de Voorde, José Felipe Voloch, Philipp Wacker","doi":"arxiv-2409.05263","DOIUrl":"https://doi.org/arxiv-2409.05263","url":null,"abstract":"In this paper, we consider how probability theory can be used to determine\u0000the survival strategy in two of the ``Squid Game\" and ``Squid Game: The\u0000Challenge\" challenges: the Hopscotch and the Warships. We show how Hopscotch\u0000can be easily tackled with the knowledge of the binomial distribution, taught\u0000in introductory statistics courses, while Warships is a much more complex\u0000problem, which can be tackled at different levels.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Censored Data Forecasting: Applying Tobit Exponential Smoothing with Time Aggregation","authors":"Diego J. Pedregal, Juan R. Trapero","doi":"arxiv-2409.05412","DOIUrl":"https://doi.org/arxiv-2409.05412","url":null,"abstract":"This study introduces a novel approach to forecasting by Tobit Exponential\u0000Smoothing with time aggregation constraints. This model, a particular case of\u0000the Tobit Innovations State Space system, handles censored observed time series\u0000effectively, such as sales data, with known and potentially variable censoring\u0000levels over time. The paper provides a comprehensive analysis of the model\u0000structure, including its representation in system equations and the optimal\u0000recursive estimation of states. It also explores the benefits of time\u0000aggregation in state space systems, particularly for inventory management and\u0000demand forecasting. Through a series of case studies, the paper demonstrates\u0000the effectiveness of the model across various scenarios, including hourly and\u0000daily censoring levels. The results highlight the model's ability to produce\u0000accurate forecasts and confidence bands comparable to those from uncensored\u0000models, even under severe censoring conditions. The study further discusses the\u0000implications for inventory policy, emphasizing the importance of avoiding\u0000spiral-down effects in demand estimation. The paper concludes by showcasing the\u0000superiority of the proposed model over standard methods, particularly in\u0000reducing lost sales and excess stock, thereby optimizing inventory costs. 
This\u0000research contributes to the field of forecasting by offering a robust model\u0000that effectively addresses the challenges of censored data and time\u0000aggregation.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"154 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bianca-Elena Mihăilă, Marian-Gabriel Hâncean, Matjaž Perc, Jürgen Lerner, Iulian Oană, Marius Geantă, José Luis Molina, Cosmina Cioroboiu
{"title":"Cross-sectional personal network analysis of adult smoking in rural areas","authors":"Bianca-Elena Mihăilă, Marian-Gabriel Hâncean, Matjaž Perc, Jürgen Lerner, Iulian Oană, Marius Geantă, José Luis Molina, Cosmina Cioroboiu","doi":"arxiv-2408.14832","DOIUrl":"https://doi.org/arxiv-2408.14832","url":null,"abstract":"While research on adolescent smoking is extensive, little attention has been\u0000given to smoking behaviors among rural middle-aged and older adults. This study\u0000examines the role of personal networks and sociodemographic factors in\u0000predicting smoking status in a rural Romanian community. Using a link-tracing\u0000sampling method, we gathered data from 76 participants out of 83 in Leresti,\u0000Arges County. Face-to-face interviews collected sociodemographic data and\u0000network information, including smoking status and relational dynamics. We\u0000applied multilevel logistic regression models to predict smoking behaviors\u0000(current smokers, former smokers, and non-smokers) based on individual\u0000characteristics and network influences. Results indicate that social networks\u0000significantly influence smoking behaviors. For current smokers, having a\u0000smoking family member greatly increased the odds of smoking (OR = 2.51, 95% CI:\u00001.62, 3.91, p < 0.001). 
Similarly, non-smoking family members increased the\u0000likelihood of being a non-smoker (OR = 1.64, 95% CI: 1.04, 2.61, p < 0.05).\u0000Women were less likely to smoke, highlighting sex differences in behavior.\u0000These findings emphasize the critical role of social networks in shaping\u0000smoking habits, advocating for targeted interventions in rural areas.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"183 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alina Dubovskaya, Caroline B. Pena, David J. P. O'Sullivan
{"title":"Modeling information spread across networks with communities using a multitype branching process framework","authors":"Alina Dubovskaya, Caroline B. Pena, David J. P. O'Sullivan","doi":"arxiv-2408.04456","DOIUrl":"https://doi.org/arxiv-2408.04456","url":null,"abstract":"The dynamics of information diffusion in complex networks is widely studied\u0000in an attempt to understand how individuals communicate and how information\u0000travels and reaches individuals through interactions. However, complex networks\u0000often present community structure, and tools to analyse information diffusion\u0000on networks with communities are needed. In this paper, we develop theoretical\u0000tools using multi-type branching processes to model and analyse simple\u0000contagion information spread on a broad class of networks with community\u0000structure. We show how, by using limited information about the network -- the\u0000degree distribution within and between communities -- we can calculate standard\u0000statistical characteristics of the dynamics of information diffusion, such as\u0000the extinction probability, hazard function, and cascade size distribution.\u0000These properties can be estimated not only for the entire network but also for\u0000each community separately. Furthermore, we estimate the probability of\u0000information spreading from one community to another where it is not currently\u0000spreading. We demonstrate the accuracy of our framework by applying it to two\u0000specific examples: the Stochastic Block Model and a log-normal network with\u0000community structure. 
We show how the initial seeding location affects the\u0000observed cascade size distribution on a heavy-tailed network and that our\u0000framework accurately captures this effect.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141939997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jose Antonio Roldan-Nofuentes, Saad bouh Sidaty-regad
{"title":"Asymptotic confidence intervals for the difference and the ratio of the weighted kappa coefficients of two diagnostic tests subject to a paired design","authors":"Jose Antonio Roldan-Nofuentes, Saad bouh Sidaty-regad","doi":"arxiv-2407.21387","DOIUrl":"https://doi.org/arxiv-2407.21387","url":null,"abstract":"The weighted kappa coefficient of a binary diagnostic test is a measure of\u0000the beyond-chance agreement between the diagnostic test and the gold standard,\u0000and depends on the sensitivity and specificity of the diagnostic test, on the\u0000disease prevalence and on the relative importance between the false positives\u0000and the false negatives. This article studies the comparison of the weighted\u0000kappa coefficients of two binary diagnostic tests subject to a paired design\u0000through confidence intervals. Three asymptotic confidence intervals are studied\u0000for the difference between the parameters and five other intervals for the\u0000ratio. Simulation experiments were carried out to study the coverage\u0000probabilities and the average lengths of the intervals, giving some general\u0000rules for application. A method is also proposed to calculate the sample size\u0000necessary to compare the two weighted kappa coefficients through a confidence\u0000interval. A program in R has been written to solve the problem studied and it\u0000is available as supplementary material. 
The results were applied to a real\u0000example of the diagnosis of malaria.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"74 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141868991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jose Antonio Roldan-Nofuentes, Saad Bouh Sidaty-Regad
{"title":"Comparison of the likelihood ratios of two diagnostic tests subject to a paired design: confidence intervals and sample size","authors":"Jose Antonio Roldan-Nofuentes, Saad Bouh Sidaty-Regad","doi":"arxiv-2407.21382","DOIUrl":"https://doi.org/arxiv-2407.21382","url":null,"abstract":"Positive and negative likelihood ratios are parameters which are used to\u0000assess and compare the effectiveness of binary diagnostic tests. Both\u0000parameters only depend on the sensitivity and specificity of the diagnostic\u0000test and are equivalent to a relative risk. This article studies the comparison\u0000of the likelihood ratios of two binary diagnostic tests subject to a paired\u0000design through confidence intervals. Six approximate confidence intervals are\u0000presented for the ratio of the likelihood ratios, and simulation experiments\u0000are carried out to study the coverage probabilities and the average lengths of\u0000the intervals considered, and some general rules of application are proposed. A\u0000method is also proposed to determine the sample size necessary to estimate the\u0000ratio between the likelihood ratios with a determined precision. The results\u0000were applied to the diagnosis of coronary artery disease.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141868994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational methods to simultaneously compare the predictive values of two diagnostic tests with missing data: EM-SEM algorithms and multiple imputation","authors":"Jose Antonio Roldan-Nofuentes","doi":"arxiv-2407.21190","DOIUrl":"https://doi.org/arxiv-2407.21190","url":null,"abstract":"Predictive values are measures of the clinical accuracy of a binary\u0000diagnostic test, and depend on the sensitivity and the specificity of the\u0000diagnostic test and on the disease prevalence among the population being\u0000studied. This article studies hypothesis tests to simultaneously compare the\u0000predictive values of two binary diagnostic tests in the presence of missing\u0000data. The hypothesis tests were solved applying two computational methods: the\u0000expectation maximization and the supplemented expectation maximization\u0000algorithms, and multiple imputation. Simulation experiments were carried out to\u0000study the sizes and the powers of the hypothesis tests, giving some general\u0000rules of application. Two R programmes were written to apply each method, and\u0000they are available as supplementary material for the manuscript. The results\u0000were applied to the diagnosis of Alzheimer's disease.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"212 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141868992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Books Tell a History of Statistics in Portugal: Works of Foreigners, Estrangeirados, and Others","authors":"Dinis Pestana, Rui Santos","doi":"arxiv-2407.19433","DOIUrl":"https://doi.org/arxiv-2407.19433","url":null,"abstract":"Foreigners and \"estrangeirados\", an expression meaning \"people going to a\u0000foreign country [\"estrangeiro\"] getting there further education\", had a leading\u0000role in the development of Mathematical Statistics in Portugal. In what\u0000concerns Statistics, \"estrangeirados\" in the nineteenth century were mainly\u0000liberal intellectuals exiled for political reasons. From 1930 onwards, the\u0000research funding authority sent university professors abroad, and hired foreign\u0000researchers to stay in Portuguese institutions, and some of them were\u0000instrumental in the importation of new concepts and methods of inferential\u0000statistics. After 1970, there was a huge program of sending young researchers\u0000abroad for doctoral studies. At the same time, many new universities and\u0000polytechnic institutes have been created in Portugal. After that, aside from\u0000foreigners who choose to have a research career in those institutions and the\u0000\"estrangeirados\" who had returned and created programs of doctoral studies,\u0000others, who hadn't the opportunity of studying abroad, began to play a decisive\u0000role in the development of Statistics in Portugal. 
The publication of handbooks\u0000on Probability and Statistics, thesis and core papers in Portuguese scientific\u0000journals, and also of works for the layman, reveals how Statistics progressed\u0000from descriptive to a mathematical discipline used for inference in all fields\u0000of knowledge, from natural sciences to methodology of scientific research.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141868993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Impact of Foreign Players in the English Premier League: A Mathematical Analys","authors":"Amit K Chattopadhyay, A. Abdul, Sudhir Jain","doi":"arxiv-2407.19285","DOIUrl":"https://doi.org/arxiv-2407.19285","url":null,"abstract":"We undertake extensive analysis of English Premier League data over the\u0000period 2009/10 to 2017/18 to identify and rank key factors affecting the\u0000economic and footballing performances of the teams. Alternative end-of-season\u0000league tables are generated by re-ranking the teams based on five different\u0000descriptors - total expenditure, total funds spent on players, total funds\u0000spent on foreign players, the ratio of foreign to British players and the\u0000overall profit. The unequal distribution of resources and expenditure between\u0000the clubs is analyzed through Lorenz curves. A comparative analysis of the\u0000differences between the alternative tables and the conventional end-of-season\u0000league table establishes the most likely factors to influence the performances\u0000of the teams that we also rank using Principal Component Analysis. We find that\u0000the top teams in the league are also those that tend to have the highest\u0000expenditure overall, for all players, including foreign players; they also have\u0000the highest ratios of foreign to British players. 
Our statistical and machine\u0000learning study also indicates that successful performance on the field may not\u0000guarantee healthy profits at the end of the season.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141868996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Lucas Makinen, Tom Charnock, Natalia Porqueres, Axel Lapel, Alan Heavens, Benjamin D. Wandelt
{"title":"Hybrid summary statistics: neural weak lensing inference beyond the power spectrum","authors":"T. Lucas Makinen, Tom Charnock, Natalia Porqueres, Axel Lapel, Alan Heavens, Benjamin D. Wandelt","doi":"arxiv-2407.18909","DOIUrl":"https://doi.org/arxiv-2407.18909","url":null,"abstract":"In inference problems, we often have domain knowledge which allows us to\u0000define summary statistics that capture most of the information content in a\u0000dataset. In this paper, we present a hybrid approach, where such physics-based\u0000summaries are augmented by a set of compressed neural summary statistics that\u0000are optimised to extract the extra information that is not captured by the\u0000predefined summaries. The resulting statistics are very powerful inputs to\u0000simulation-based or implicit inference of model parameters. We apply this\u0000generalisation of Information Maximising Neural Networks (IMNNs) to parameter\u0000constraints from tomographic weak gravitational lensing convergence maps to\u0000find summary statistics that are explicitly optimised to complement angular\u0000power spectrum estimates. We study several dark matter simulation resolutions\u0000in low- and high-noise regimes. 
We show that i) the information-update\u0000formalism extracts at least $3\\times$ and up to $8\\times$ as much information\u0000as the angular power spectrum in all noise regimes, ii) the network summaries\u0000are highly complementary to existing 2-point summaries, and iii) our formalism\u0000allows for networks with smaller, physically-informed architectures to match\u0000much larger regression networks with far fewer simulations needed to obtain\u0000asymptotically optimal inference.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141873119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}