{"title":"Who's the GOAT? Sports Rankings and Data-Driven Random Walks on the Symmetric Group","authors":"Gian-Gabriel P. Garcia, J. Carlos Martínez Mori","doi":"arxiv-2409.12107","DOIUrl":"https://doi.org/arxiv-2409.12107","url":null,"abstract":"Given a collection of historical sports rankings, can one tell which player is the greatest of all time (i.e., the GOAT)? In this work, we design a data-driven random walk on the symmetric group to obtain a stationary distribution over player rankings, spanning across different time periods in sports history. We combine this distribution with a notion of stochastic dominance to obtain a partial order over the players. We implement our methods using publicly available data from the Association of Tennis Professionals (ATP) and the Women's Tennis Association (WTA) to find the GOATs in the respective categories.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conformity assessment of processes and lots in the framework of JCGM 106:2012","authors":"Rainer Göb, Steffen Uhlig, Bernard Colson","doi":"arxiv-2409.11912","DOIUrl":"https://doi.org/arxiv-2409.11912","url":null,"abstract":"ISO/IEC 17000:2020 defines conformity assessment as an \"activity to determine whether specified requirements relating to a product, process, system, person or body are fulfilled\". JCGM (2012) establishes a framework for accounting for measurement uncertainty in conformity assessment. The focus of JCGM (2012) is on the conformity assessment of individual units of product based on measurements on a cardinal continuous scale. However, the scheme can also be applied to composite assessment targets like finite lots of product or manufacturing processes, and to the evaluation of characteristics in discrete cardinal or nominal scales. We consider the application of the JCGM scheme in the conformity assessment of finite lots or processes of discrete units subject to a dichotomous quality classification as conforming and nonconforming. A lot or process is classified as conforming if the actual proportion nonconforming does not exceed a prescribed upper tolerance limit; otherwise the lot or process is classified as nonconforming. The measurement on the lot or process is a statistical estimation of the proportion nonconforming based on attributes or variables sampling, and measurement uncertainty is sampling uncertainty. Following JCGM (2012), we analyse the effect of measurement uncertainty (sampling uncertainty) in attributes sampling, and we calculate key conformity assessment parameters, in particular the producer's and consumer's risk. We suggest integrating such parameters as a useful add-on into ISO acceptance sampling standards such as the ISO 2859 series.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian estimation of the number of significant principal components for cultural data","authors":"Joshua C. Macdonald, Javier Blanco-Portillo, Marcus W. Feldman, Yoav Ram","doi":"arxiv-2409.12129","DOIUrl":"https://doi.org/arxiv-2409.12129","url":null,"abstract":"Principal component analysis (PCA) is often used to analyze multivariate data together with cluster analysis, which depends on the number of principal components used. It is therefore important to determine the number of significant principal components (PCs) extracted from a data set. Here we use a variational Bayesian version of classical PCA to develop a new method for estimating the number of significant PCs in contexts where the number of samples is similar to or greater than the number of features. This eliminates guesswork and potential bias in manually determining the number of principal components and avoids overestimation of variance by filtering noise. This framework can be applied to datasets of different shapes (number of rows and columns), different data types (binary, ordinal, categorical, continuous), and with noisy and missing data. Therefore, it is especially useful for data with arbitrary encodings and similar numbers of rows and columns, such as cultural, ecological, morphological, and behavioral datasets. We tested our method on both synthetic data and empirical datasets and found that it may underestimate but not overestimate the number of principal components for the synthetic data. A small number of components was found for each empirical dataset. These results suggest that it is broadly applicable across the life sciences.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Visual Search with Highly Heuristic Decision Rules","authors":"Anqi Zhang, Wilson S. Geisler","doi":"arxiv-2409.12124","DOIUrl":"https://doi.org/arxiv-2409.12124","url":null,"abstract":"Visual search is a fundamental natural task for humans and other animals. We investigated the decision processes humans use when searching briefly presented displays having well-separated potential target-object locations. Performance was compared with the Bayesian-optimal decision process under the assumption that the information from the different potential target locations is statistically independent. Surprisingly, humans performed slightly better than optimal, despite humans' substantial loss of sensitivity in the fovea and the implausibility of the human brain replicating the optimal computations. We show that three factors can quantitatively explain these seemingly paradoxical results. Most importantly, simple and fixed heuristic decision rules reach near-optimal search performance. Secondly, foveal neglect primarily affects only the central potential target location. Finally, spatially correlated neural noise causes search performance to exceed that predicted for independent noise. These findings have far-reaching implications for understanding visual search tasks and other identification tasks in humans and other animals.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Equity considerations in COVID-19 vaccine allocation modelling: a literature review","authors":"Eva Rumpler, Marc Lipsitch","doi":"arxiv-2409.11462","DOIUrl":"https://doi.org/arxiv-2409.11462","url":null,"abstract":"We conducted a literature review of COVID-19 vaccine allocation modelling papers, specifically looking for publications that considered equity. We found that most models did not take equity into account, with the vast majority of publications presenting aggregated results and no results by any subgroup (e.g., age, race, geography). We then give examples of how modelling can be useful to answer equity questions, and highlight some of the findings from the publications that did. Lastly, we describe seven considerations that seem important when including equity in future vaccine allocation models.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Testing for racial bias using inconsistent perceptions of race","authors":"Nora Gera, Emma Pierson","doi":"arxiv-2409.11269","DOIUrl":"https://doi.org/arxiv-2409.11269","url":null,"abstract":"Tests for racial bias commonly assess whether two people of different races are treated differently. A fundamental challenge is that, because two people may differ in many ways, factors besides race might explain differences in treatment. Here, we propose a test for bias which circumvents the difficulty of comparing two people by instead assessing whether the $\\textit{same person}$ is treated differently when their race is perceived differently. We apply our method to test for bias in police traffic stops, finding that the same driver is likelier to be searched or arrested by police when they are perceived as Hispanic than when they are perceived as white. Our test is broadly applicable to other datasets where race, gender, or other identity data are perceived rather than self-reported, and the same person is observed multiple times.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatio-Temporal-Network Point Processes for Modeling Crime Events with Landmarks","authors":"Zheng Dong, Jorge Mateu, Yao Xie","doi":"arxiv-2409.10882","DOIUrl":"https://doi.org/arxiv-2409.10882","url":null,"abstract":"Self-exciting point processes are widely used to model the contagious effects of crime events living within continuous geographic space, using their occurrence times and locations. However, in urban environments, most events are naturally constrained within the city's street network structure, and the contagious effects of crime are governed by such a network geography. Meanwhile, the complex distribution of urban infrastructures also plays an important role in shaping crime patterns across space. We introduce a novel spatio-temporal-network point process framework for crime modeling that integrates these urban environmental characteristics by incorporating self-attention graph neural networks. Our framework incorporates the street network structure as the underlying event space, where crime events can occur at random locations on the network edges. To realistically capture criminal movement patterns, distances between events are measured using street network distances. We then propose a new mark for a crime event by concatenating the event's crime category with the type of its nearby landmark, aiming to capture how the urban design influences the mixing structures of various crime types. A graph attention network architecture is adopted to learn the existence of mark-to-mark interactions. Extensive experiments on crime data from Valencia, Spain, demonstrate the effectiveness of our framework in understanding the crime landscape and forecasting crime risks across regions.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Connected Vehicle Data for Near-Crash Detection and Analysis in Urban Environments","authors":"Xinyu LiJason, DayongJason, Wu, Xinyue Ye, Quan Sun","doi":"arxiv-2409.11341","DOIUrl":"https://doi.org/arxiv-2409.11341","url":null,"abstract":"Urban traffic safety is a pressing concern in modern transportation systems, especially in rapidly growing metropolitan areas where increased traffic congestion, complex road networks, and diverse driving behaviors exacerbate the risk of traffic incidents. Traditional traffic crash data analysis offers valuable insights but often overlooks a broader range of road safety risks. Near-crash events, which occur more frequently and signal potential collisions, provide a more comprehensive perspective on traffic safety. However, city-scale analysis of near-crash events remains limited due to the significant challenges in large-scale real-world data collection, processing, and analysis. This study utilizes one month of connected vehicle data, comprising billions of records, to detect and analyze near-crash events across the road network in the City of San Antonio, Texas. We propose an efficient framework integrating spatial-temporal buffering and heading algorithms to accurately identify and map near-crash events. A binary logistic regression model is employed to assess the influence of road geometry, traffic volume, and vehicle types on near-crash risks. Additionally, we examine spatial and temporal patterns, including variations by time of day, day of the week, and road category. The findings of this study show that vehicles on more than half of the road segments were involved in at least one near-crash event. In addition, more than 50% of near-crash events involved vehicles traveling at speeds over 57.98 mph, and many occurred at short distances between vehicles. The analysis also found that wider roadbeds and multiple lanes reduced near-crash risks, while single-unit trucks slightly increased the likelihood of near-crash events. Finally, the spatial-temporal analysis revealed that near-crash risks were most prominent during weekday peak hours, especially in downtown areas.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A point process approach for the classification of noisy calcium imaging data","authors":"Arianna Burzacchi, Nicoletta D'Angelo, David Payares-Garcia, Jorge Mateu","doi":"arxiv-2409.10409","DOIUrl":"https://doi.org/arxiv-2409.10409","url":null,"abstract":"We study noisy calcium imaging data, with a focus on the classification of spike traces. As raw traces obscure the true temporal structure of neuronal activity, we performed a tuned filtering of the calcium concentration using two methods: a biophysical model and a kernel mapping. The former characterizes spike trains related to a particular triggering event, while the latter filters out the signal and refines the selection of the underlying neuronal response. Transitioning from traditional time series analysis to point process theory, the study explores spike-time distance metrics and point pattern prototypes to describe repeated observations. We assume that the analyzed neuron's firing events, i.e., spike occurrences, are temporal point process events. In particular, the study aims to categorize 47 point patterns by depth, assuming the similarity of spike occurrences within specific depth categories. The results highlight the pivotal roles of depth and stimuli in discerning diverse temporal structures of neuron firing events, confirming that the point process approach based on prototype analysis is largely useful in the classification of spike traces.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TCDformer-based Momentum Transfer Model for Long-term Sports Prediction","authors":"Hui Liu, Jiacheng Gu, Xiyuan Huang, Junjie Shi, Tongtong Feng, Ning He","doi":"arxiv-2409.10176","DOIUrl":"https://doi.org/arxiv-2409.10176","url":null,"abstract":"Accurate sports prediction is a crucial skill for professional coaches, which can assist in developing effective training strategies and scientific competition tactics. Traditional methods often use complex mathematical statistical techniques to boost predictability, but this approach is often limited by dataset scale and has difficulty handling long-term predictions with variable distributions, notably underperforming when predicting point-set-game multi-level matches. To deal with this challenge, this paper proposes TM2, a TCDformer-based Momentum Transfer Model for long-term sports prediction, which encompasses a momentum encoding module and a prediction module based on momentum transfer. TM2 initially encodes momentum in large-scale unstructured time series using the local linear scaling approximation (LLSA) module. Then it decomposes the reconstructed time series with momentum transfer into trend and seasonal components. The final prediction results are derived from the additive combination of a multilayer perceptron (MLP) for predicting trend components and wavelet attention mechanisms for seasonal components. Comprehensive experimental results show that on the 2023 Wimbledon men's tournament datasets, TM2 significantly surpasses existing sports prediction models in terms of performance, reducing MSE by 61.64% and MAE by 63.64%.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}