{"title":"Interpret the estimand framework from a causal inference perspective","authors":"Jinghong Zeng","doi":"arxiv-2407.00292","DOIUrl":"https://doi.org/arxiv-2407.00292","url":null,"abstract":"The estimand framework proposed by ICH in 2017 has brought fundamental changes in the pharmaceutical industry. It clearly describes how a treatment effect in a clinical question should be precisely defined and estimated, through attributes including treatments, endpoints and intercurrent events. However, ideas around the estimand framework are commonly expressed in text, and different interpretations of this framework may exist. This article aims to interpret the estimand framework through its underlying theories, the causal inference framework based on potential outcomes. The statistical origin and formula of an estimand are given through the causal inference framework, with all attributes translated into statistical terms. How five strategies proposed by ICH to analyze intercurrent events are incorporated in the statistical formula of an estimand is described, and a new strategy to analyze intercurrent events is also suggested. The roles of target populations and analysis sets in the estimand framework are compared and discussed based on the statistical formula of an estimand. This article recommends continuing study of causal inference theories behind the estimand framework and improving the estimand framework with greater methodological comprehensibility and availability.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141507880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fractal dimension, and the problems traps of its estimation","authors":"Carlos Sevcik","doi":"arxiv-2406.19885","DOIUrl":"https://doi.org/arxiv-2406.19885","url":null,"abstract":"This chapter deals with error and uncertainty in data. It treats their measurement methods and meaning. It shows that uncertainty is a natural property of many data sets. Uncertainty is fundamental for the survival of living species. Uncertainty of the \"chaos\" type occurs in many systems and is fundamental to understanding these systems.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"66 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141530559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer Learning for High Dimensional Robust Regression","authors":"Xiaohui Yuan, Shujie Ren","doi":"arxiv-2406.17567","DOIUrl":"https://doi.org/arxiv-2406.17567","url":null,"abstract":"Transfer learning has become an essential technique for utilizing information from source datasets to improve the performance of the target task. However, in the context of high-dimensional data, heterogeneity arises due to heteroscedastic variance or inhomogeneous covariate effects. To solve this problem, this paper proposes a robust transfer learning method based on Huber regression, specifically designed for scenarios where the transferable source data set is known. This method effectively mitigates the impact of data heteroscedasticity, leading to improvements in estimation and prediction accuracy. Moreover, when the transferable source data set is unknown, the paper introduces an efficient detection algorithm to identify informative sources. The effectiveness of the proposed method is demonstrated through numerical simulation and empirical analysis using superconductor data.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active Learning for Fair and Stable Online Allocations","authors":"Riddhiman Bhattacharya, Thanh Nguyen, Will Wei Sun, Mohit Tawarmalani","doi":"arxiv-2406.14784","DOIUrl":"https://doi.org/arxiv-2406.14784","url":null,"abstract":"We explore an active learning approach for dynamic fair resource allocation problems. Unlike previous work that assumes full feedback from all agents on their allocations, we consider feedback from a select subset of agents at each epoch of the online resource allocation process. Despite this restriction, our proposed algorithms provide regret bounds that are sub-linear in the number of time periods for various measures that include fairness metrics commonly used in resource allocation problems and stability considerations in matching mechanisms. The key insight of our algorithms lies in adaptively identifying the most informative feedback using dueling upper and lower confidence bounds. With this strategy, we show that efficient decision-making does not require extensive feedback and produces efficient outcomes for a variety of problem classes.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141507881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stock Volume Forecasting with Advanced Information by Conditional Variational Auto-Encoder","authors":"Parley R Yang, Alexander Y Shestopaloff","doi":"arxiv-2406.19414","DOIUrl":"https://doi.org/arxiv-2406.19414","url":null,"abstract":"We demonstrate the use of Conditional Variational Auto-Encoder (CVAE) to improve the forecasts of daily stock volume time series in both short and long term forecasting tasks, with the use of advanced information of input variables such as rebalancing dates. CVAE generates non-linear time series as out-of-sample forecasts, which have better accuracy and closer fit of correlation to the actual data, compared to traditional linear models. These generative forecasts can also be used for scenario generation, which aids interpretation. We further discuss correlations in non-stationary time series and other potential extensions from the CVAE forecasts.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141530560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Producing treatment hierarchies in network meta-analysis using probabilistic models and treatment-choice criteria","authors":"Theodoros Evrenoglou, Adriani Nikolakopoulou, Guido Schwarzer, Gerta Rücker, Anna Chaimani","doi":"arxiv-2406.10612","DOIUrl":"https://doi.org/arxiv-2406.10612","url":null,"abstract":"A key output of network meta-analysis (NMA) is the relative ranking of the treatments; nevertheless, it has attracted a lot of criticism. This is mainly due to the fact that ranking is an influential output and prone to over-interpretation even when relative effects imply small differences between treatments. To date, common ranking methods rely on metrics that lack a straightforward interpretation, while it is still unclear how to measure their uncertainty. We introduce a novel framework for estimating treatment hierarchies in NMA. At first, we formulate a mathematical expression that defines a treatment choice criterion (TCC) based on clinically important values. This TCC is applied to the study treatment effects to generate paired data indicating treatment preferences or ties. Then, we synthesize the paired data across studies using an extension of the so-called \"Bradley-Terry\" model. We assign to each treatment a latent variable interpreted as the treatment \"ability\" and we estimate the ability parameters within a regression model. Higher ability estimates correspond to higher positions in the final ranking. We further extend our model to adjust for covariates that may affect treatment selection. We illustrate the proposed approach and compare it with alternatives in two datasets: a network comparing 18 antidepressants for major depression and a network comparing 6 antihypertensives for the incidence of diabetes. Our approach provides a robust and interpretable treatment hierarchy which accounts for clinically important values and is presented alongside uncertainty measures. Overall, the proposed framework offers a novel approach for ranking in NMA based on concrete criteria and guards against over-interpretation of unimportant differences between treatments.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141507882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advances in Machine Learning, Statistical Methods, and AI for Single-Cell RNA Annotation Using Raw Count Matrices in scRNA-seq Data","authors":"Megha Patel, Nimish Magre, Himanshi Motwani, Nik Bear Brown","doi":"arxiv-2406.05258","DOIUrl":"https://doi.org/arxiv-2406.05258","url":null,"abstract":"Single-cell RNA sequencing (scRNA-seq) has revolutionized our ability to analyze gene expression at the resolution of individual cells, providing unprecedented insights into cellular heterogeneity and complex biological systems. This paper reviews various advanced computational and machine learning techniques tailored for the analysis of scRNA-seq data, emphasizing their roles in different stages of the data processing pipeline.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"145 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Nonparametric Quasi Likelihood","authors":"Antonio R. Linero","doi":"arxiv-2405.20601","DOIUrl":"https://doi.org/arxiv-2405.20601","url":null,"abstract":"A recent trend in Bayesian research has been revisiting generalizations of the likelihood that enable Bayesian inference without requiring the specification of a model for the data generating mechanism. This paper focuses on a Bayesian nonparametric extension of Wedderburn's quasi-likelihood, using Bayesian additive regression trees to model the mean function. Here, the analyst posits only a structural relationship between the mean and variance of the outcome. We show that this approach provides a unified, computationally efficient framework for extending Bayesian decision tree ensembles to many new settings, including simplex-valued and heavily heteroskedastic data. We also introduce Bayesian strategies for inferring the dispersion parameter of the quasi-likelihood, a task which is complicated by the fact that the quasi-likelihood itself does not contain information about this parameter; despite these challenges, we are able to inject updates for the dispersion parameter into a Markov chain Monte Carlo inference scheme in a way that, in the parametric setting, leads to a Bernstein-von Mises result for the stationary distribution of the resulting Markov chain. We illustrate the utility of our approach on a variety of both synthetic and non-synthetic datasets.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141258299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differentially Private Boxplots","authors":"Kelly Ramsay, Jairo Diaz-Rodriguez","doi":"arxiv-2405.20415","DOIUrl":"https://doi.org/arxiv-2405.20415","url":null,"abstract":"Despite the potential of differentially private data visualization to harmonize data analysis and privacy, research in this area remains relatively underdeveloped. Boxplots are a widely popular visualization used for summarizing a dataset and for comparison of multiple datasets. Consequently, we introduce a differentially private boxplot. We evaluate its effectiveness for displaying location, scale, skewness and tails of a given empirical distribution. In our theoretical exposition, we show that the location and scale of the boxplot are estimated with optimal sample complexity, and the skewness and tails are estimated consistently. In simulations, we show that this boxplot performs similarly to a non-private boxplot, and it outperforms a boxplot naively constructed from existing differentially private quantile algorithms. Additionally, we conduct a real data analysis of Airbnb listings, which shows that comparable analysis can be achieved through differentially private boxplot visualization.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"56 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141258388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scaling up archival text analysis with the blockmodeling of n-gram networks -- A case study of Bulgaria's representation in the Osservatore Romano (January-May 1877)","authors":"Fabio Ashtar Telarico","doi":"arxiv-2405.20156","DOIUrl":"https://doi.org/arxiv-2405.20156","url":null,"abstract":"This paper seeks to bridge the gap between archival text analysis and network analysis by applying network clustering methods to analyze the coverage of Bulgaria in 123 issues of the newspaper Osservatore Romano published between January and May 1877. Utilizing optical character recognition and generalized homogeneity blockmodeling, the study constructs networks of relevant keywords. Those including the sets Bulgaria and Russia are rather isomorphic and they largely overlap with those for Germany, Britain, and War. In structural terms, the blockmodel of the two networks exhibits a clear core-semiperiphery-periphery structure that reflects relations between concepts in the newspaper's coverage. The newspaper's lexical choices effectively delegitimised the Bulgarian national revival, highlighting the influence of the Holy See on the newspaper's editorial line.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141196923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}