arXiv - STAT - Other Statistics: Latest Articles

A framework for understanding data science
arXiv - STAT - Other Statistics Pub Date : 2024-02-14 DOI: arxiv-2403.00776
Michael L Brodie
Abstract: The objective of this research is to provide a framework with which the data science community can understand, define, and develop data science as a field of inquiry. The framework is based on the classical reference framework (axiology, ontology, epistemology, methodology) used for 200 years to define knowledge discovery paradigms and disciplines in the humanities, sciences, algorithms, and now data science. I augmented it for automated problem-solving with (methods, technology, community). The resulting data science reference framework is used to define the data science knowledge discovery paradigm in terms of the philosophy of data science addressed in previous papers and the data science problem-solving paradigm, i.e., the data science method, and the data science problem-solving workflow, both addressed in this paper. The framework is a much called for unifying framework for data science as it contains the components required to define data science. For insights to better understand data science, this paper uses the framework to define the emerging, often enigmatic, data science problem-solving paradigm and workflow, and to compare them with their well-understood scientific counterparts, scientific problem-solving paradigm and workflow.
Citations: 0
A scalable, synergy-first backbone decomposition of higher-order structures in complex systems
arXiv - STAT - Other Statistics Pub Date : 2024-02-13 DOI: arxiv-2402.08135
Thomas F. Varley
Abstract: Since its introduction in 2011, the partial information decomposition (PID) has triggered an explosion of interest in the field of multivariate information theory and the study of emergent, higher-order ("synergistic") interactions in complex systems. Despite its power, however, the PID has a number of limitations that restrict its general applicability: it scales poorly with system size and the standard approach to decomposition hinges on a definition of "redundancy", leaving synergy only vaguely defined as "that information not redundant." Other heuristic measures, such as the O-information, have been introduced, although these measures typically only provided a summary statistic of redundancy/synergy dominance, rather than direct insight into the synergy itself. To address this issue, we present an alternative decomposition that is synergy-first, scales much more gracefully than the PID, and has a straightforward interpretation. Our approach defines synergy as that information in a set that would be lost following the minimally invasive perturbation on any single element. By generalizing this idea to sets of elements, we construct a totally ordered "backbone" of partial synergy atoms that sweeps systems scales. Our approach starts with entropy, but can be generalized to the Kullback-Leibler divergence, and by extension, to the total correlation and the single-target mutual information. Finally, we show that this approach can be used to decompose higher-order interactions beyond just information theory: we demonstrate this by showing how synergistic combinations of pairwise edges in a complex network supports signal communicability and global integration. We conclude by discussing how this perspective on synergistic structure (information-based or otherwise) can deepen our understanding of part-whole relationships in complex systems.
Citations: 0
Using Mathlink Cubes to Introduce Data Wrangling with Examples in R
arXiv - STAT - Other Statistics Pub Date : 2024-02-10 DOI: arxiv-2402.07029
Lucy D'Agostino McGowan
Abstract: This paper explores an innovative approach to teaching data wrangling skills to students through hands-on activities before transitioning to coding. Data wrangling, a critical aspect of data analysis, involves cleaning, transforming, and restructuring data. We introduce the use of a physical tool, mathlink cubes, to facilitate a tangible understanding of data sets. This approach helps students grasp the concepts of data wrangling before implementing them in coding languages such as R. We detail a classroom activity that includes hands-on tasks paralleling common data wrangling processes such as filtering, selecting, and mutating, followed by their coding equivalents using R's `dplyr` package.
Citations: 0
Quantitative Analysis of AI-Generated Texts in Academic Research: A Study of AI Presence in Arxiv Submissions using AI Detection Tool
arXiv - STAT - Other Statistics Pub Date : 2024-02-09 DOI: arxiv-2403.13812
Arslan Akram
Abstract: Many people are interested in ChatGPT since it has become a prominent AIGC model that provides high-quality responses in various contexts, such as software development and maintenance. Misuse of ChatGPT might cause significant issues, particularly in public safety and education, despite its immense potential. The majority of researchers choose to publish their work on Arxiv. The effectiveness and originality of future work depend on the ability to detect AI components in such contributions. To address this need, this study will analyze a method that can see purposely manufactured content that academic organizations use to post on Arxiv. For this study, a dataset was created using physics, mathematics, and computer science articles. Using the newly built dataset, the following step is to put originality.ai through its paces. The statistical analysis shows that Originality.ai is very accurate, with a rate of 98%.
Citations: 0
Malaria incidence and prevalence: An ecological analysis through Six Sigma approach
arXiv - STAT - Other Statistics Pub Date : 2024-02-03 DOI: arxiv-2402.02233
Md. Al-Amin, Kesava Chandran Vijaya Bhaskar, Walaa Enab, Reza Kamali Miab, Jennifer Slavin, Nigar Sultana
Abstract: Malaria is the leading cause of death globally, especially in sub-Saharan African countries claiming over 400,000 deaths globally each year, underscoring the critical need for continued efforts to combat this preventable and treatable disease. The objective of this study is to provide statistical guidance on the optimal preventive and control measures against malaria. Data have been collected from reliable sources, such as World Health Organization, UNICEF, Our World in Data, and STATcompiler. Data were categorized according to the factors and sub-factors related to deaths caused by malaria. These factors and sub-factors were determined based on root cause analysis and data sources. Using JMP 16 Pro software, both linear and multiple linear regression were conducted to analyze the data. The analyses aimed to establish a linear relationship between the dependent variable (malaria deaths in the overall population) and independent variables, such as life expectancy, malaria prevalence in children, net usage, indoor residual spraying usage, literate population, and population with inadequate sanitation in each selected sample country. The statistical analysis revealed that using insecticide treated nets (ITNs) by children and individuals significantly decreased the death count, as 1,000 individuals sleeping under ITNs could reduce the death count by eight. Based on the statistical analysis, this study suggests more rigorous research on the usage of ITNs.
Citations: 0
Gerontologic Biostatistics 2.0: Developments over 10+ years in the age of data science
arXiv - STAT - Other Statistics Pub Date : 2024-02-02 DOI: arxiv-2402.01112
Chixiang Chen, Michelle Shardell, Jaime Lynn Speiser, Karen Bandeen-Roche, Heather Allore, Thomas G Travison, Michael Griswold, Terrence E. Murphy
Abstract: Background: Introduced in 2010, the sub-discipline of gerontologic biostatistics (GBS) was conceptualized to address the specific challenges in analyzing data from research studies involving older adults. However, the evolving technological landscape has catalyzed data science and statistical advancements since the original GBS publication, greatly expanding the scope of gerontologic research. There is a need to describe how these advancements enhance the analysis of multi-modal data and complex phenotypes that are hallmarks of gerontologic research. Methods: This paper introduces GBS 2.0, an updated and expanded set of analytical methods reflective of the practice of gerontologic biostatistics in contemporary and future research. Results: GBS 2.0 topics and relevant software resources include cutting-edge methods in experimental design; analytical techniques that include adaptations of machine learning, quantifying deep phenotypic measurements, high-dimensional -omics analysis; the integration of information from multiple studies, and strategies to foster reproducibility, replicability, and open science. Discussion: The methodological topics presented here seek to update and expand GBS. By facilitating the synthesis of biostatistics and data science in gerontology, we aim to foster the next generation of gerontologic researchers.
Citations: 0
A review of regularised estimation methods and cross-validation in spatiotemporal statistics
arXiv - STAT - Other Statistics Pub Date : 2024-01-31 DOI: arxiv-2402.00183
Philipp Otto, Alessandro Fassò, Paolo Maranzano
Abstract: This review article focuses on regularised estimation procedures applicable to geostatistical and spatial econometric models. These methods are particularly relevant in the case of big geospatial data for dimensionality reduction or model selection. To structure the review, we initially consider the most general case of multivariate spatiotemporal processes (i.e., $g > 1$ dimensions of the spatial domain, a one-dimensional temporal domain, and $q \geq 1$ random variables). Then, the idea of regularised/penalised estimation procedures and different choices of shrinkage targets are discussed. Finally, guided by the elements of a mixed-effects model, which allows for a variety of spatiotemporal models, we show different regularisation procedures and how they can be used for the analysis of geo-referenced data, e.g. for selection of relevant regressors, dimensionality reduction of the covariance matrices, detection of conditionally independent locations, or the estimation of a full spatial interaction matrix.
Citations: 0
Human-Centric and Integrative Lighting Asset Management in Public Libraries: Qualitative Insights and Challenges from a Swedish Field Study
arXiv - STAT - Other Statistics Pub Date : 2024-01-19 DOI: arxiv-2401.11000
Jing Lin, Per Olof Hedekvist, Nina Mylly, Math Bollen, Jingchun Shen, Jiawei Xiong, Christofer Silfvenius
Abstract: Traditional lighting source reliability evaluations, often covering just half of a lamp's volume, can misrepresent real-world performance. To overcome these limitations, adopting advanced asset management strategies for a more holistic evaluation is crucial. This paper investigates human-centric and integrative lighting asset management in Swedish public libraries. Through field observations, interviews, and gap analysis, the study highlights a disparity between current lighting conditions and stakeholder expectations, with issues like eye strain suggesting significant improvement potential. We propose a shift towards more dynamic lighting asset management and reliability evaluations, emphasizing continuous enhancement and comprehensive training in human-centric and integrative lighting principles.
Citations: 0
Radius selection using kernel density estimation for the computation of nonlinear measures
arXiv - STAT - Other Statistics Pub Date : 2024-01-08 DOI: arxiv-2401.03891
Johan Medrano, Abderrahmane Kheddar, Annick Lesne, Sofiane Ramdani
Abstract: When nonlinear measures are estimated from sampled temporal signals with finite-length, a radius parameter must be carefully selected to avoid a poor estimation. These measures are generally derived from the correlation integral which quantifies the probability of finding neighbors, i.e. pair of points spaced by less than the radius parameter. While each nonlinear measure comes with several specific empirical rules to select a radius value, we provide a systematic selection method. We show that the optimal radius for nonlinear measures can be approximated by the optimal bandwidth of a Kernel Density Estimator (KDE) related to the correlation sum. The KDE framework provides non-parametric tools to approximate a density function from finite samples (e.g. histograms) and optimal methods to select a smoothing parameter, the bandwidth (e.g. bin width in histograms). We use results from KDE to derive a closed-form expression for the optimal radius. The latter is used to compute the correlation dimension and to construct recurrence plots yielding an estimate of Kolmogorov-Sinai entropy. We assess our method through numerical experiments on signals generated by nonlinear systems and experimental electroencephalographic time series.
Citations: 0
Quotient geometry of bounded or fixed rank correlation matrices
arXiv - STAT - Other Statistics Pub Date : 2024-01-06 DOI: arxiv-2401.03126
Hengchao Chen
Abstract: This paper studies the quotient geometry of bounded or fixed-rank correlation matrices. The set of bounded-rank correlation matrices is in bijection with a quotient set of a spherical product manifold by an orthogonal group. We show that it admits an orbit space structure and its stratification is determined by the rank of the matrices. Also, the principal stratum has a compatible Riemannian quotient manifold structure. We develop efficient Riemannian optimization algorithms for computing the distance and the weighted Frechet mean in the orbit space. We prove that any minimizing geodesic in the orbit space has constant rank on the interior of the segment. Moreover, we examine geometric properties of the quotient manifold, including horizontal and vertical spaces, Riemannian metric, injectivity radius, exponential and logarithmic map, gradient and Hessian.
Citations: 0