Alexander C. Murph, Justin D. Strait, Kelly R. Moran, Jeffrey D. Hyman, Philip H. Stauffer
{"title":"Visualisation and outlier detection for probability density function ensembles","authors":"Alexander C. Murph, Justin D. Strait, Kelly R. Moran, Jeffrey D. Hyman, Philip H. Stauffer","doi":"10.1002/sta4.662","DOIUrl":null,"url":null,"abstract":"Exploratory data analysis (EDA) for functional data—data objects where observations are entire functions—is a difficult problem that has seen significant attention in recent literature. This surge in interest is motivated by the ubiquitous nature of functional data, which are prevalent in applications across fields such as meteorology, biology, medicine and engineering. Empirical probability density functions (PDFs) can be viewed as constrained functional data objects that must integrate to one and be nonnegative. They show up in contexts such as yearly income distributions, zooplankton size structure in oceanography and in connectivity patterns in the brain, among others. While PDF data are certainly common in modern research, little attention has been given to EDA specifically for PDFs. In this paper, we extend several methods for EDA on functional data for PDFs and compare them on simulated data that exhibit different types of variation, designed to mimic that seen in real-world applications. We then use our new methods to perform EDA on the breakthrough curves observed in gas transport simulations for underground fracture networks.","PeriodicalId":0,"journal":{"name":"","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1002/sta4.662","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Exploratory data analysis (EDA) for functional data—data objects where observations are entire functions—is a difficult problem that has seen significant attention in recent literature. This surge in interest is motivated by the ubiquitous nature of functional data, which are prevalent in applications across fields such as meteorology, biology, medicine and engineering. Empirical probability density functions (PDFs) can be viewed as constrained functional data objects that must integrate to one and be nonnegative. They show up in contexts such as yearly income distributions, zooplankton size structure in oceanography and in connectivity patterns in the brain, among others. While PDF data are certainly common in modern research, little attention has been given to EDA specifically for PDFs. In this paper, we extend several methods for EDA on functional data for PDFs and compare them on simulated data that exhibit different types of variation, designed to mimic that seen in real-world applications. We then use our new methods to perform EDA on the breakthrough curves observed in gas transport simulations for underground fracture networks.
函数数据的探索性数据分析(EDA)--观测值是整个函数的数据对象--是一个难题,最近的文献对此给予了极大关注。函数数据无处不在,在气象学、生物学、医学和工程学等领域的应用中十分普遍,因此人们对函数数据的兴趣大增。经验概率密度函数(PDF)可视为受约束的函数数据对象,必须积分为一且为非负。它们出现在年收入分布、海洋学中浮游动物的大小结构和大脑的连接模式等方面。虽然 PDF 数据在现代研究中很常见,但很少有人关注专门针对 PDF 的 EDA。在本文中,我们扩展了几种针对 PDF 函数数据的 EDA 方法,并在模拟数据上对这些方法进行了比较,模拟数据表现出不同类型的变化,旨在模拟真实世界中的应用。然后,我们使用新方法对地下断裂网络气体输送模拟中观察到的突破曲线进行 EDA。