{"title":"ceylon: An R package for plotting the maps of Sri Lanka","authors":"Thiyanga S. Talagala","doi":"arxiv-2401.02467","DOIUrl":"https://doi.org/arxiv-2401.02467","url":null,"abstract":"The rapid evolution in the fields of computer science, data science, and artificial intelligence has significantly transformed the utilisation of data for decision-making. Data visualisation plays a critical role in any work that involves data. Visualising data on maps is frequently encountered in many fields. Visualising data on maps not only transforms raw data into visually comprehensible representations but also converts complex spatial information into simple, understandable form. Locating the data files necessary for map creation can be a challenging task. Establishing a centralised repository can alleviate the challenging task of finding shape files, allowing users to efficiently discover geographic data. The ceylon R package is designed to make simple feature data related to Sri Lanka's administrative boundaries and rivers and streams accessible for a diverse range of R users. With straightforward functionalities, this package allows users to quickly plot and explore administrative boundaries and rivers and streams in Sri Lanka.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139398202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facilitating the Integration of Ethical Reasoning into Quantitative Courses: Stakeholder Analysis, Ethical Practice Standards, and Case Studies","authors":"Rochelle E. Tractenberg, Suzanne Thorton","doi":"arxiv-2401.01973","DOIUrl":"https://doi.org/arxiv-2401.01973","url":null,"abstract":"Case studies are typically used to teach 'ethics', but in quantitative courses it can seem distracting, for both instructor and learner, to introduce a case analysis. Moreover, case analyses are typically focused on issues relating to people: obtaining consent, dealing with research team members, and/or potential institutional policy violations. While relevant to some research, not all students in quantitative courses plan to become researchers, and ethical practice is an essential topic for students of mathematics, statistics, data science, and computing regardless of whether or not the learner intends to do research. Ethical reasoning is a way of thinking that requires the individual to assess what they know about a potential ethical problem (their prerequisite knowledge), and in some cases, how behaviors they observe, are directed to perform, or have performed, diverge from what they know to be ethical behavior. Ethical reasoning is a learnable, improvable set of knowledge, skills, and abilities that enable learners to recognize what they do and do not know about what constitutes 'ethical practice' of a discipline, and in some cases, to contemplate alternative decisions about how to first recognize, and then proceed past, or respond to, such divergences. A stakeholder analysis is part of prerequisite knowledge, and can be used whether there is or is not an actual case or situation to react to. In courses with mainly quantitative content, a stakeholder analysis is a useful tool for instruction and assessment. It can be used to both integrate authentic ethical content and encourage careful quantitative thought. It is a mistake to treat 'training in ethical practice' and 'training in responsible conduct of research' as the same thing. This paper discusses how to introduce ethical reasoning, stakeholder analysis, and ethical practice standards authentically in quantitative courses.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139103985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting the effect of greediness on the efficacy of exchange algorithms for generating exact optimal experimental designs","authors":"William T. Gullion, Stephen J. Walsh","doi":"arxiv-2312.12645","DOIUrl":"https://doi.org/arxiv-2312.12645","url":null,"abstract":"Coordinate exchange (CEXCH) is a popular algorithm for generating exact optimal experimental designs. The authors of CEXCH advocated for a highly greedy implementation - one that exchanges and optimizes single element coordinates of the design matrix. We revisit the effect of greediness on CEXCH's efficacy for generating highly efficient designs. We implement the single-element CEXCH (most greedy), a design-row (medium greedy) optimization exchange, and particle swarm optimization (PSO; least greedy) on 21 exact response surface design scenarios, under the $D$- and $I$-criterion, which have well-known optimal designs that have been reproduced by several researchers. We found essentially no difference in performance of the most greedy CEXCH and the medium greedy CEXCH. PSO did exhibit better efficacy for generating $D$-optimal designs, and for most $I$-optimal designs than CEXCH, but not to a strong degree under our parametrization. This work suggests that further investigation of the greediness dimension and its effect on CEXCH efficacy on a wider suite of models and criteria is warranted.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"73 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138825071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applying Pre-Trained Deep-Learning Model on Wrist Angel Data -- An Analysis Plan","authors":"Harald Vilhelm Skat-Rørdam, Mia Hang Knudsen, Simon Nørby Knudsen, Nicole Nadine Lønfeldt, Sneha Das, Line Katrine Harder Clemmensen","doi":"arxiv-2312.09052","DOIUrl":"https://doi.org/arxiv-2312.09052","url":null,"abstract":"We aim to investigate if we can improve predictions of stress caused by OCD symptoms using pre-trained models, and present our statistical analysis plan in this paper. With the methods presented in this plan, we aim to avoid bias from data knowledge and thereby strengthen our hypotheses and findings. The Wrist Angel study, which this statistical analysis plan concerns, contains data from nine participants, between 8 and 17 years old, diagnosed with obsessive-compulsive disorder (OCD). The data was obtained by an Empatica E4 wristband, which the participants wore during waking hours for 8 weeks. The purpose of the study is to assess the feasibility of predicting the in-the-wild OCD events captured during this period. In our analysis, we aim to investigate if we can improve predictions of stress caused by OCD symptoms, and to do this we have created a pre-trained model, trained on four open-source datasets for stress prediction. We intend to apply this pre-trained model to the Wrist Angel data by fine-tuning, thereby utilizing transfer learning. The pre-trained model is a convolutional neural network that uses blood volume pulse, heart rate, electrodermal activity, and skin temperature as time series windows to predict OCD events. Furthermore, using accelerometer data, another model filters physical activity to further improve performance, given that physical activity is physiologically similar to stress. By evaluating various ways of applying our model (fine-tuned, non-fine-tuned, pre-trained, non-pre-trained, and with or without activity classification), we contextualize the problem such that it can be assessed if transfer learning is a viable strategy in this domain.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"104 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138685147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to probability and statistics: a computational framework of randomness","authors":"Lakshman Mahto","doi":"arxiv-2401.08622","DOIUrl":"https://doi.org/arxiv-2401.08622","url":null,"abstract":"This text presents a unified approach to probability and statistics in the pursuit of understanding and computing randomness in engineering, physical, or social systems, with prediction and generalizability. Starting from elementary probability and the theory of distributions, the material progresses towards conceptual advances in prediction and generalization in statistical models and large sample theory. We also pay special attention to a unified derivation approach and a one-shot proof of each probabilistic concept. Our presentation of the intuitive and computational framework of conditional distribution and probability is strongly influenced by unified patterns of linear models for regression and for classification. The text ends with a future note on the unified approximation of the linear models, the generalized linear models and the discovery models to neural networks and a summarized ML system.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139499337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Conversation with A. Philip Dawid","authors":"Vladimir Vovk, Glenn Shafer","doi":"arxiv-2312.00632","DOIUrl":"https://doi.org/arxiv-2312.00632","url":null,"abstract":"Beginning in the 1970s, Alexander Philip Dawid has been a leading contributor to the foundations of statistics and especially to the development and application of Bayesian statistics. He is also known for his work on causality, especially his notation for conditional independence and his critique of the overuse of counterfactuals, and for his contributions to forensic statistics. Dawid was born in Lancashire, England, on February 1, 1946. His family moved to London soon afterwards, and he attended the City of London School from 1956 to 1963. He studied mathematics at Cambridge, earning a BA (Bachelor of Arts) degree in 1966. After earning a Diploma in Mathematical Statistics in the academic year 1966-1967, he studied for a PhD at Imperial, then at UCL, where he became a Lecturer in Statistics in 1969. In 1978, he left UCL for a position as Professor of Statistics in the Department of Mathematics, The City University, London, where he served as Head of Statistics Section and Director of the Statistical Laboratory. He returned to the Department of Statistics at UCL in 1981, serving as Head of Department from 1983 to 1993. He moved to the University of Cambridge in 2007, becoming Professor of Statistics and Fellow of Darwin College. He has continued his work in mathematical statistics after retiring from Cambridge in 2013 and was elected Fellow of the Royal Society in 2018.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"edibble: An R package to encapsulate elements of experimental designs for better planning, management and workflow","authors":"Emi Tanaka","doi":"arxiv-2311.09705","DOIUrl":"https://doi.org/arxiv-2311.09705","url":null,"abstract":"I present an R package called edibble that facilitates the design of experiments by encapsulating elements of the experiment in a series of composable functions. This package is an interpretation of \"the grammar of experimental designs\" by Tanaka (2023) in the R programming language. The main features of the edibble package are demonstrated, illustrating how it can be used to create a wide array of experimental designs. The implemented system aims to encourage cognitive thinking for holistic planning and data management of experiments in a streamlined workflow. This workflow can increase the inherent value of experimental data by reducing potential errors or noise with careful preplanning, as well as ensuring fit-for-purpose analysis of experimental data.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Directional Gaussian spatial processes for South African wind data","authors":"Jacobus S. Blom, Priyanka Nagar, Andriette Bekker","doi":"arxiv-2311.05954","DOIUrl":"https://doi.org/arxiv-2311.05954","url":null,"abstract":"Accurate wind pattern modelling is crucial for various applications, including renewable energy, agriculture, and climate adaptation. In this paper, we introduce the wrapped Gaussian spatial process (WGSP), as well as the projected Gaussian spatial process (PGSP) custom-tailored for South Africa's intricate wind behaviour. Unlike conventional models struggling with the circular nature of wind direction, the WGSP and PGSP adeptly incorporate circular statistics to address this challenge. Leveraging historical data sourced from meteorological stations throughout South Africa, the WGSP and PGSP significantly increase predictive accuracy while capturing the nuanced spatial dependencies inherent to wind patterns. The superiority of the PGSP model in capturing the structural characteristics of the South African wind data is evident. As opposed to the PGSP, the WGSP model is computationally less demanding, allows for the use of less informative priors, and its parameters are more easily interpretable. The implications of this study are far-reaching, offering potential benefits ranging from the optimisation of renewable energy systems to the informed decision-making in agriculture and climate adaptation strategies. The WGSP and PGSP emerge as robust and invaluable tools, facilitating precise modelling of wind patterns within the dynamic context of South Africa.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"21 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gaussian Differential Privacy on Riemannian Manifolds","authors":"Yangdi Jiang, Xiaotian Chang, Yi Liu, Lei Ding, Linglong Kong, Bei Jiang","doi":"arxiv-2311.10101","DOIUrl":"https://doi.org/arxiv-2311.10101","url":null,"abstract":"We develop an advanced approach for extending Gaussian Differential Privacy (GDP) to general Riemannian manifolds. The concept of GDP stands out as a prominent privacy definition that strongly warrants extension to manifold settings, due to its central limit properties. By harnessing the power of the renowned Bishop-Gromov theorem in geometric analysis, we propose a Riemannian Gaussian distribution that integrates the Riemannian distance, allowing us to achieve GDP in Riemannian manifolds with bounded Ricci curvature. To the best of our knowledge, this work marks the first instance of extending the GDP framework to accommodate general Riemannian manifolds, encompassing curved spaces, and circumventing the reliance on tangent space summaries. We provide a simple algorithm to evaluate the privacy budget $\\mu$ on any one-dimensional manifold and introduce a versatile Markov Chain Monte Carlo (MCMC)-based algorithm to calculate $\\mu$ on any Riemannian manifold with constant curvature. Through simulations on one of the most prevalent manifolds in statistics, the unit sphere $S^d$, we demonstrate the superior utility of our Riemannian Gaussian mechanism in comparison to the previously proposed Riemannian Laplace mechanism for implementing GDP.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Broken Adaptive Ridge Method for Variable Selection in Generalized Partly Linear Models with Application to the Coronary Artery Disease Data","authors":"Christian Chan, Xiaotian Dai, Thierry Chekouo, Quan Long, Xuewen Lu","doi":"arxiv-2311.00210","DOIUrl":"https://doi.org/arxiv-2311.00210","url":null,"abstract":"Motivated by the CATHGEN data, we develop a new statistical learning method for simultaneous variable selection and parameter estimation under the context of generalized partly linear models for data with high-dimensional covariates. The method is referred to as the broken adaptive ridge (BAR) estimator, which is an approximation of the $L_0$-penalized regression by iteratively performing reweighted squared $L_2$-penalized regression. The generalized partly linear model extends the generalized linear model by including a non-parametric component to construct a flexible model for modeling various types of covariate effects. We employ the Bernstein polynomials as the sieve space to approximate the non-parametric functions so that our method can be implemented easily using the existing R packages. Extensive simulation studies suggest that the proposed method performs better than other commonly used penalty-based variable selection methods. We apply the method to the CATHGEN data with a binary response from a coronary artery disease study, which motivated our research, and obtained new findings in both high-dimensional genetic and low-dimensional non-genetic covariates.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"21 7","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}