arXiv - STAT - Other Statistics: Latest Papers (Page 7)

A framework for understanding data science

arXiv - STAT - Other Statistics Pub Date : 2024-02-14 DOI: arxiv-2403.00776

Michael L Brodie

Abstract: The objective of this research is to provide a framework with which the data science community can understand, define, and develop data science as a field of inquiry. The framework is based on the classical reference framework (axiology, ontology, epistemology, methodology) used for 200 years to define knowledge discovery paradigms and disciplines in the humanities, sciences, algorithms, and now data science. I augmented it for automated problem-solving with (methods, technology, community). The resulting data science reference framework is used to define the data science knowledge discovery paradigm in terms of the philosophy of data science addressed in previous papers and the data science problem-solving paradigm, i.e., the data science method, and the data science problem-solving workflow, both addressed in this paper. The framework is a much called for unifying framework for data science as it contains the components required to define data science. For insights to better understand data science, this paper uses the framework to define the emerging, often enigmatic, data science problem-solving paradigm and workflow, and to compare them with their well-understood scientific counterparts, scientific problem-solving paradigm and workflow.

Citations: 0

A scalable, synergy-first backbone decomposition of higher-order structures in complex systems

arXiv - STAT - Other Statistics Pub Date : 2024-02-13 DOI: arxiv-2402.08135

Thomas F. Varley

Abstract: Since its introduction in 2011, the partial information decomposition (PID) has triggered an explosion of interest in the field of multivariate information theory and the study of emergent, higher-order ("synergistic") interactions in complex systems. Despite its power, however, the PID has a number of limitations that restrict its general applicability: it scales poorly with system size and the standard approach to decomposition hinges on a definition of "redundancy", leaving synergy only vaguely defined as "that information not redundant." Other heuristic measures, such as the O-information, have been introduced, although these measures typically only provided a summary statistic of redundancy/synergy dominance, rather than direct insight into the synergy itself. To address this issue, we present an alternative decomposition that is synergy-first, scales much more gracefully than the PID, and has a straightforward interpretation. Our approach defines synergy as that information in a set that would be lost following the minimally invasive perturbation on any single element. By generalizing this idea to sets of elements, we construct a totally ordered "backbone" of partial synergy atoms that sweeps systems scales. Our approach starts with entropy, but can be generalized to the Kullback-Leibler divergence, and by extension, to the total correlation and the single-target mutual information. Finally, we show that this approach can be used to decompose higher-order interactions beyond just information theory: we demonstrate this by showing how synergistic combinations of pairwise edges in a complex network supports signal communicability and global integration. We conclude by discussing how this perspective on synergistic structure (information-based or otherwise) can deepen our understanding of part-whole relationships in complex systems.
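The abstract above mentions the O-information as a summary statistic of redundancy/synergy dominance. As a rough illustration of what such a statistic reports, the sketch below computes the standard O-information, Ω(X) = (n − 2)·H(X) + Σ_i [H(X_i) − H(X_−i)], for two toy systems. This is not the paper's backbone decomposition; the function names `entropy` and `o_information` and both toy datasets are assumptions made for this example.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def o_information(rows):
    """O-information of equally weighted joint states (one tuple per state)."""
    n = len(rows[0])
    omega = (n - 2) * entropy(rows)
    for i in range(n):
        marginal = entropy([row[i] for row in rows])
        # Entropy of all elements except element i.
        rest = entropy([row[:i] + row[i + 1:] for row in rows])
        omega += marginal - rest
    return omega


# Hypothetical toy systems, chosen only for illustration.
# XOR gate: the third bit is predictable only from the first two jointly.
xor_rows = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]
# Three copies of one fair bit: every element carries the same information.
copy_rows = [(a, a, a) for a in (0, 1)]

print(o_information(xor_rows))   # negative: synergy-dominated
print(o_information(copy_rows))  # positive: redundancy-dominated
```

The sign alone says which kind of dependence dominates, which is the limitation the abstract points to: a single number gives no direct view of the synergy itself.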

Citations: 0

Using Mathlink Cubes to Introduce Data Wrangling with Examples in R

arXiv - STAT - Other Statistics Pub Date : 2024-02-10 DOI: arxiv-2402.07029

Lucy D'Agostino McGowan

Citations: 0

Quantitative Analysis of AI-Generated Texts in Academic Research: A Study of AI Presence in Arxiv Submissions using AI Detection Tool

arXiv - STAT - Other Statistics Pub Date : 2024-02-09 DOI: arxiv-2403.13812

Arslan Akram

Citations: 0

Malaria incidence and prevalence: An ecological analysis through Six Sigma approach

arXiv - STAT - Other Statistics Pub Date : 2024-02-03 DOI: arxiv-2402.02233

Md. Al-Amin, Kesava Chandran Vijaya Bhaskar, Walaa Enab, Reza Kamali Miab, Jennifer Slavin, Nigar Sultana

Abstract: Malaria is the leading cause of death globally, especially in sub-Saharan African countries claiming over 400,000 deaths globally each year, underscoring the critical need for continued efforts to combat this preventable and treatable disease. The objective of this study is to provide statistical guidance on the optimal preventive and control measures against malaria. Data have been collected from reliable sources, such as World Health Organization, UNICEF, Our World in Data, and STATcompiler. Data were categorized according to the factors and sub-factors related to deaths caused by malaria. These factors and sub-factors were determined based on root cause analysis and data sources. Using JMP 16 Pro software, both linear and multiple linear regression were conducted to analyze the data. The analyses aimed to establish a linear relationship between the dependent variable (malaria deaths in the overall population) and independent variables, such as life expectancy, malaria prevalence in children, net usage, indoor residual spraying usage, literate population, and population with inadequate sanitation in each selected sample country. The statistical analysis revealed that using insecticide treated nets (ITNs) by children and individuals significantly decreased the death count, as 1,000 individuals sleeping under ITNs could reduce the death count by eight. Based on the statistical analysis, this study suggests more rigorous research on the usage of ITNs.
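To make the abstract's "eight fewer deaths per 1,000 ITN users" statement concrete, the sketch below shows how a simple linear regression slope translates into that kind of figure. The study itself used JMP 16 Pro; this plain-Python least-squares fit, the helper name `fit_line`, and the four data points are all assumptions for illustration. The numbers are synthetic and were picked so the slope matches the reported effect size; they are not the study's data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, illustrative numbers only (NOT taken from the study).
itn_users = [0, 1000, 2000, 3000]
deaths = [100, 92, 84, 76]

slope, intercept = fit_line(itn_users, deaths)
# The slope is the change in deaths per additional ITN user,
# so multiplying by 1,000 gives the change per 1,000 users.
print(f"change in deaths per 1,000 ITN users: {slope * 1000:.1f}")
```

A multiple regression, as used in the study, would estimate one such coefficient per independent variable while holding the others fixed.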

Citations: 0

Gerontologic Biostatistics 2.0: Developments over 10+ years in the age of data science

arXiv - STAT - Other Statistics Pub Date : 2024-02-02 DOI: arxiv-2402.01112

Chixiang Chen, Michelle Shardell, Jaime Lynn Speiser, Karen Bandeen-Roche, Heather Allore, Thomas G Travison, Michael Griswold, Terrence E. Murphy

Abstract: Background: Introduced in 2010, the sub-discipline of gerontologic biostatistics (GBS) was conceptualized to address the specific challenges in analyzing data from research studies involving older adults. However, the evolving technological landscape has catalyzed data science and statistical advancements since the original GBS publication, greatly expanding the scope of gerontologic research. There is a need to describe how these advancements enhance the analysis of multi-modal data and complex phenotypes that are hallmarks of gerontologic research. Methods: This paper introduces GBS 2.0, an updated and expanded set of analytical methods reflective of the practice of gerontologic biostatistics in contemporary and future research. Results: GBS 2.0 topics and relevant software resources include cutting-edge methods in experimental design; analytical techniques that include adaptations of machine learning, quantifying deep phenotypic measurements, high-dimensional -omics analysis; the integration of information from multiple studies, and strategies to foster reproducibility, replicability, and open science. Discussion: The methodological topics presented here seek to update and expand GBS. By facilitating the synthesis of biostatistics and data science in gerontology, we aim to foster the next generation of gerontologic researchers.

Citations: 0

A review of regularised estimation methods and cross-validation in spatiotemporal statistics

arXiv - STAT - Other Statistics Pub Date : 2024-01-31 DOI: arxiv-2402.00183

Philipp Otto, Alessandro Fassò, Paolo Maranzano

Citations: 0

Human-Centric and Integrative Lighting Asset Management in Public Libraries: Qualitative Insights and Challenges from a Swedish Field Study

arXiv - STAT - Other Statistics Pub Date : 2024-01-19 DOI: arxiv-2401.11000

Jing Lin, Per Olof Hedekvist, Nina Mylly, Math Bollen, Jingchun Shen, Jiawei Xiong, Christofer Silfvenius

Citations: 0

Radius selection using kernel density estimation for the computation of nonlinear measures

arXiv - STAT - Other Statistics Pub Date : 2024-01-08 DOI: arxiv-2401.03891

Johan Medrano, Abderrahmane Kheddar, Annick Lesne, Sofiane Ramdani

Abstract: When nonlinear measures are estimated from sampled temporal signals with finite-length, a radius parameter must be carefully selected to avoid a poor estimation. These measures are generally derived from the correlation integral which quantifies the probability of finding neighbors, i.e. pair of points spaced by less than the radius parameter. While each nonlinear measure comes with several specific empirical rules to select a radius value, we provide a systematic selection method. We show that the optimal radius for nonlinear measures can be approximated by the optimal bandwidth of a Kernel Density Estimator (KDE) related to the correlation sum. The KDE framework provides non-parametric tools to approximate a density function from finite samples (e.g. histograms) and optimal methods to select a smoothing parameter, the bandwidth (e.g. bin width in histograms). We use results from KDE to derive a closed-form expression for the optimal radius. The latter is used to compute the correlation dimension and to construct recurrence plots yielding an estimate of Kolmogorov-Sinai entropy. We assess our method through numerical experiments on signals generated by nonlinear systems and experimental electroencephalographic time series.
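The abstract above ties the radius of the correlation sum to a KDE bandwidth. The sketch below illustrates the two ingredients separately: a direct correlation sum (the fraction of point pairs spaced by less than the radius) and a generic bandwidth rule. The paper's closed-form expression for the optimal radius is not reproduced here. Silverman's rule of thumb is swapped in as a well-known stand-in, and applying it to the pairwise distances is an assumption made for this example, as are the function names and the toy signal.

```python
from statistics import quantiles, stdev


def correlation_sum(points, radius):
    """Fraction of distinct pairs closer than `radius` (scalar samples)."""
    n = len(points)
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if abs(points[i] - points[j]) < radius
    )
    return 2 * close / (n * (n - 1))


def silverman_bandwidth(samples):
    """Silverman's rule-of-thumb bandwidth for a Gaussian KDE.

    Stand-in only: NOT the closed-form radius derived in the paper.
    """
    q1, _, q3 = quantiles(samples, n=4)
    spread = min(stdev(samples), (q3 - q1) / 1.34)
    return 0.9 * spread * len(samples) ** (-1 / 5)


# Hypothetical toy signal, for illustration only.
signal = [0.0, 1.0, 2.0, 3.0]
distances = [
    abs(signal[i] - signal[j])
    for i in range(len(signal))
    for j in range(i + 1, len(signal))
]

# Assumption: use the bandwidth of the pairwise-distance sample as the radius.
radius = silverman_bandwidth(distances)
print(radius, correlation_sum(signal, radius))
```

For real signals the points would typically be delay-embedded vectors with a vector norm in place of `abs`; scalars are used here to keep the sketch short.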

Citations: 0

Quotient geometry of bounded or fixed rank correlation matrices

arXiv - STAT - Other Statistics Pub Date : 2024-01-06 DOI: arxiv-2401.03126

Hengchao Chen

Citations: 0