{"title":"Nested Fusion: A Method for Learning High Resolution Latent Structure of Multi-Scale Measurement Data on Mars","authors":"Austin P. Wright, Scott Davidoff, Duen Horng Chau","doi":"arxiv-2409.05874","DOIUrl":"https://doi.org/arxiv-2409.05874","url":null,"abstract":"The Mars Perseverance Rover represents a generational change in the scale of measurements that can be taken on Mars, however this increased resolution introduces new challenges for techniques in exploratory data analysis. The multiple different instruments on the rover each measures specific properties of interest to scientists, so analyzing how underlying phenomena affect multiple different instruments together is important to understand the full picture. However each instrument has a unique resolution, making the mapping between overlapping layers of data non-trivial. In this work, we introduce Nested Fusion, a method to combine arbitrarily layered datasets of different resolutions and produce a latent distribution at the highest possible resolution, encoding complex interrelationships between different measurements and scales. Our method is efficient for large datasets, can perform inference even on unseen data, and outperforms existing methods of dimensionality reduction and latent analysis on real-world Mars rover data. We have deployed our method Nested Fusion within a Mars science team at NASA Jet Propulsion Laboratory (JPL) and through multiple rounds of participatory design enabled greatly enhanced exploratory analysis workflows for real scientists. To ensure the reproducibility of our work we have open sourced our code on GitHub at https://github.com/pixlise/NestedFusion.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"184 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Topological Gaussian Mixture Model for Bone Marrow Morphology in Leukaemia","authors":"Qiquan Wang, Anna Song, Antoniana Batsivari, Dominique Bonnet, Anthea Monod","doi":"arxiv-2408.13685","DOIUrl":"https://doi.org/arxiv-2408.13685","url":null,"abstract":"Acute myeloid leukaemia (AML) is a type of blood and bone marrow cancer characterized by the proliferation of abnormal clonal haematopoietic cells in the bone marrow leading to bone marrow failure. Over the course of the disease, angiogenic factors released by leukaemic cells drastically alter the bone marrow vascular niches resulting in observable structural abnormalities. We use a technique from topological data analysis - persistent homology - to quantify the images and infer on the disease through the imaged morphological features. We find that persistent homology uncovers succinct dissimilarities between the control, early, and late stages of AML development. We then integrate persistent homology into stage-dependent Gaussian mixture models for the first time, proposing a new class of models which are applicable to persistent homology summaries and able to both infer patterns in morphological changes between different stages of progression as well as provide a basis for prediction.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"When is truncated stop loss optimal?","authors":"Erik Bølviken, Yinzhi Wang","doi":"arxiv-2408.12933","DOIUrl":"https://doi.org/arxiv-2408.12933","url":null,"abstract":"The paper examines how reinsurance can be used to strike a balance between expected profit and VaR/CVaR risk. Conditions making truncated stop loss contracts optimal are derived, and it is argued that those are usually satisfied in practice. One of the prerequisites is that reinsurance is not too cheap, and an argument resembling arbitrage suggests that it is not.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Broad versus narrow research questions in evidence synthesis: a parallel to (and plea for) estimands","authors":"Antonio Remiro-Azócar, Anders Gorst-Rasmussen","doi":"arxiv-2408.12932","DOIUrl":"https://doi.org/arxiv-2408.12932","url":null,"abstract":"There has been a transition from broad to more specific research questions in the practice of network meta-analysis (NMA). Such convergence is also taking place in the context of individual registrational trials, following the recent introduction of the estimand framework, which is impacting the design, data collection strategy, analysis and interpretation of clinical trials. The language of estimands has much to offer to NMA, particularly given the \"narrow\" perspective of treatments and target populations taken in health technology assessment.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variational inference of effective range parameters for ${}^3$He-${}^4$He scattering","authors":"Andrius Burnelis, Vojta Kejzlar, Daniel R. Phillips","doi":"arxiv-2408.13250","DOIUrl":"https://doi.org/arxiv-2408.13250","url":null,"abstract":"We use two different methods, Monte Carlo sampling and variational inference (VI), to perform a Bayesian calibration of the effective-range parameters in ${}^3$He-${}^4$He elastic scattering. The parameters are calibrated to data from a recent set of $^{3}$He-${}^4$He elastic scattering differential cross section measurements. Analysis of these data for $E_{\\rm lab} \\leq 4.3$ MeV yields a unimodal posterior for which both methods obtain the same structure. However, the effective-range expansion amplitude does not account for the $7/2^-$ state of ${}^7$Be so, even after calibration, the description of data at the upper end of this energy range is poor. The data up to $E_{\\rm lab}=2.6$ MeV can be well described, but calibration to this lower-energy subset of the data yields a bimodal posterior. After adapting VI to treat such a multi-modal posterior we find good agreement between the VI results and those obtained with parallel-tempered Monte Carlo sampling.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal Hierarchy in the Financial Market Network -- Uncovered by the Helmholtz-Hodge-Kodaira Decomposition","authors":"Tobias Wand, Oliver Kamps, Hiroshi Iyetomi","doi":"arxiv-2408.12839","DOIUrl":"https://doi.org/arxiv-2408.12839","url":null,"abstract":"Granger causality can uncover the cause and effect relationships in financial networks. However, such networks can be convoluted and difficult to interpret, but the Helmholtz-Hodge-Kodaira decomposition can split them into a rotational and gradient component which reveals the hierarchy of Granger causality flow. Using Kenneth French's business sector return time series, it is revealed that during the Covid crisis, precious metals and pharmaceutical products are causal drivers of the financial network. Moreover, the estimated Granger causality network shows a high connectivity during crisis which means that the research presented here can be especially useful to better understand crises in the market by revealing the dominant drivers of the crisis dynamics.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multivariate Space-Time Dynamic Model for Characterizing the Atmospheric Impacts Following the Mt Pinatubo Eruption","authors":"Robert Garrett, Lyndsay Shand, J. Gabriel Huerta","doi":"arxiv-2408.13392","DOIUrl":"https://doi.org/arxiv-2408.13392","url":null,"abstract":"The June 1991 Mt. Pinatubo eruption resulted in a massive increase of sulfate aerosols in the atmosphere, absorbing radiation and leading to global changes in surface and stratospheric temperatures. A volcanic eruption of this magnitude serves as a natural analog for stratospheric aerosol injection, a proposed solar radiation modification method to combat the warming climate. The impacts of such an event are multifaceted and region-specific. Our goal is to characterize the multivariate and dynamic nature of the climate impacts following the Mt. Pinatubo eruption. We developed a multivariate space-time dynamic linear model to understand the full extent of the spatially- and temporally-varying impacts. Specifically, spatial variation is modeled using a flexible set of basis functions for which the basis coefficients are allowed to vary in time through a vector autoregressive (VAR) structure. This novel model is cast in a Dynamic Linear Model (DLM) framework and estimated via a customized MCMC approach. We demonstrate how the model quantifies the relationships between key atmospheric parameters following the Mt. Pinatubo eruption with reanalysis data from MERRA-2 and highlight when such model is advantageous over univariate models.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Web-Based Solution for Federated Learning with LLM-Based Automation","authors":"Chamith Mawela, Chaouki Ben Issaid, Mehdi Bennis","doi":"arxiv-2408.13010","DOIUrl":"https://doi.org/arxiv-2408.13010","url":null,"abstract":"Federated Learning (FL) offers a promising approach for collaborative machine learning across distributed devices. However, its adoption is hindered by the complexity of building reliable communication architectures and the need for expertise in both machine learning and network programming. This paper presents a comprehensive solution that simplifies the orchestration of FL tasks while integrating intent-based automation. We develop a user-friendly web application supporting the federated averaging (FedAvg) algorithm, enabling users to configure parameters through an intuitive interface. The backend solution efficiently manages communication between the parameter server and edge nodes. We also implement model compression and scheduling algorithms to optimize FL performance. Furthermore, we explore intent-based automation in FL using a fine-tuned Language Model (LLM) trained on a tailored dataset, allowing users to conduct FL tasks using high-level prompts. We observe that the LLM-based automated solution achieves comparable test accuracy to the standard web-based solution while reducing transferred bytes by up to 64% and CPU time by up to 46% for FL tasks. Also, we leverage the neural architecture search (NAS) and hyperparameter optimization (HPO) using LLM to improve the performance. We observe that by using this approach test accuracy can be improved by 10-20% for the carried out FL tasks.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Does Spatial Information Improve Influenza Forecasting?","authors":"Gabrielle Thivierge, Aaron Rumack, F. William Townes","doi":"arxiv-2408.12722","DOIUrl":"https://doi.org/arxiv-2408.12722","url":null,"abstract":"Seasonal influenza forecasting is critical for public health and individual decision making. We investigate whether the inclusion of data about influenza activity in neighboring states can improve point predictions and distribution forecasting of influenza-like illness (ILI) in each US state using statistical regression models. Using CDC FluView ILI data from 2010-2019, we forecast weekly ILI in each US state with quantile, linear, and Poisson autoregressive models fit using different combinations of ILI data from the target state, neighboring states, and US weighted average. Scoring with root mean squared error and weighted interval score indicated that the variants including neighbors and/or the US average showed slightly higher accuracy than models fit only using lagged ILI in the target state, on average. Additionally, the improvement in performance when including neighbors was similar to the improvement when including the US average instead, suggesting the proximity of the neighboring states is not the driver of the slight increase in accuracy.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Four Methods for Detecting Differential Item Functioning in Large-Scale Assessments with More Than Two Groups","authors":"Dandan Chen Kaptur, Jinming Zhang","doi":"arxiv-2408.11922","DOIUrl":"https://doi.org/arxiv-2408.11922","url":null,"abstract":"This study evaluated four multi-group differential item functioning (DIF) methods (the root mean square deviation approach, Wald-1, generalized logistic regression procedure, and generalized Mantel-Haenszel method) via Monte Carlo simulation of controlled testing conditions. These conditions varied in the number of groups, the ability and sample size of the DIF-contaminated group, the parameter associated with DIF, and the proportion of DIF items. When comparing Type-I error rates and powers of the methods, we showed that the RMSD approach yielded the best Type-I error rates when it was used with model-predicted cutoff values. Also, this approach was found to be overly conservative when used with the commonly used cutoff value of 0.1. Implications for future research for educational researchers and practitioners were discussed.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}