Statistical Analysis and Data Mining: The ASA Data Science Journal, latest articles (page 2)

Semiparametric detection of changepoints in location, scale, and copula

Statistical Analysis and Data Mining: The ASA Data Science Journal. Pub Date: 2023-04-29. DOI: 10.1002/sam.11622

Gaurav Agarwal, I. Eckley, P. Fearnhead

Citations: 0

Association rules and decision rules

Statistical Analysis and Data Mining: The ASA Data Science Journal. Pub Date: 2023-04-14. DOI: 10.1002/sam.11620

A. Mokkadem, M. Pelletier, Louis Raimbault

Citations: 0

Lq regularization for fair artificial intelligence robust to covariate shift

Statistical Analysis and Data Mining: The ASA Data Science Journal. Pub Date: 2023-02-22. DOI: 10.1002/sam.11616

Seonghyeon Kim, Sara Kim, Kunwoong Kim, Yongdai Kim

Abstract: It is well recognized that training data contain socially unacceptable historical biases against certain sensitive groups (e.g., non‐White people, women), and that trained artificial intelligence (AI) models inherit these unfair biases. Various learning algorithms have been proposed to remove or alleviate unfair biases in trained AI models. In this paper, we consider another type of bias in training data, so‐called covariate shift, from the viewpoint of fair AI. Here, covariate shift means that the training data do not represent the population of interest well. Covariate shift occurs when special sampling designs (e.g., stratified sampling) are used to collect training data, or when the population from which the training data are collected differs from the population of interest. When covariate shift exists, AI models that are fair on the training data may not be fair on test data. To ensure fairness on test data, we develop computationally efficient learning algorithms that are robust to covariate shift. In particular, we propose a robust fairness constraint based on the Lq norm, a generic approach that can be applied to various fair AI problems without much difficulty. By analyzing multiple benchmark datasets, we show that the proposed robust fair AI algorithm substantially improves on existing fair AI algorithms in terms of the fairness‐accuracy tradeoff under covariate shift, and that it has significant computational advantages over other robust fair AI algorithms.

Citations: 0

Buckley–James estimation of generalized additive accelerated lifetime model with ultrahigh‐dimensional data

Statistical Analysis and Data Mining: The ASA Data Science Journal. Pub Date: 2023-02-22. DOI: 10.1002/sam.11615

Zichang Li, Xuejing Zhao

Abstract: High‐dimensional covariates in lifetime data are a challenge in survival analysis, especially for gene expression profiles. The objective of this paper is to propose an efficient algorithm that extends the generalized additive model to survival data with high‐dimensional covariates. The algorithm combines the generalized additive model (GAM) with Buckley–James estimation, yielding a nonparametric extension of the nonlinear model: the GAM captures the nonlinear effects of the covariates, and Buckley–James estimation handles the regression model with a right‐censored response. In addition, we use maximal‐information‐coefficient (MIC)‐type variable screening and weighted p‐values to reduce dimension in high‐dimensional settings. The performance of the proposed algorithm is compared with three benchmark models (the Cox proportional hazards regression model, random survival forest, and BJ‐AFT) on a simulated dataset and two real survival datasets. The results, evaluated by the concordance index (C‐index) and a modified mean squared error (mMSE), illustrate the superiority of the proposed algorithm.

Citations: 0
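The Buckley–James entry above evaluates models by the concordance index (C‐index). As a rough illustration of that metric only, not of the authors' algorithm, the sketch below computes Harrell's C‐index from observed times, event indicators, and predicted survival times. It assumes that a longer predicted time means lower risk and treats pairs with tied observed times as non‐comparable; the function name `harrell_c_index` and the toy data are hypothetical.

```python
def harrell_c_index(times, events, predicted_times):
    """Harrell's concordance index for predicted survival times.

    A pair (i, j) is comparable when subject i has an observed event and a strictly
    shorter observed time than subject j. The pair is concordant when the model also
    predicts a shorter survival time for i; tied predictions count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: need an event before another observed time")
    return concordant / comparable


if __name__ == "__main__":
    # Made-up toy data: observed times, event indicators (1 = event, 0 = right-censored),
    # and a model's predicted survival times.
    times = [2.0, 5.0, 3.0, 8.0, 6.0]
    events = [1, 0, 1, 1, 0]
    predicted = [2.5, 4.0, 3.5, 7.0, 6.5]
    print(harrell_c_index(times, events, predicted))
```

A value of 1.0 means every comparable pair is ordered correctly and 0.5 is what random predictions give on average. Library implementations (for example in survival-analysis packages) may handle tied times differently.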

Estimation of disease progression for ischemic heart disease using latent Markov with covariates

Statistical Analysis and Data Mining: The ASA Data Science Journal. Pub Date: 2023-02-01. DOI: 10.1002/sam.11589

Zarina Oflaz, Ceylan Yozgatlıgil, A. S. Selcuk-Kestel

Abstract: Contemporaneous monitoring of disease progression, in addition to early diagnosis, is important for treating patients with chronic conditions. Chronic disease‐related factors are not easily tractable, and existing datasets do not clearly reflect them, which makes diagnosis difficult. The primary issue is that databases maintained by health care, insurance, or governmental organizations typically do not contain clinical information and instead focus on patient appointments and demographic profiles. Because thorough information on potential risk factors for a single patient is lacking, investigations into the nature of the disease are imprecise. We suggest using a latent Markov model with covariates in the latent process, because it enables panel analysis of many forms of data. The purpose of this study is to evaluate unobserved factors in ischemic heart disease (IHD) using longitudinal data from electronic health records. Based on the results, we designate the latent states as healthy, light, moderate, and severe to represent stages of disease progression. This study demonstrates that gender, patient age, and hospital visit frequency are all significant factors in the development of the disease. Females acquire IHD more rapidly than males, frequently developing from moderate and severe disease. In addition, the study shows that individuals under the age of 20 bypass the light state of IHD and proceed directly to the moderate state.

Citations: 2
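The latent Markov entry above models disease progression through unobserved states whose transitions depend on covariates. A common way to evaluate the likelihood of such a model is the forward recursion, sketched minimally below. The sketch assumes discrete observed symbols, a multinomial‐logit link from a single scalar covariate to the transition probabilities, and made‐up parameters. All function names are hypothetical, and the paper's actual specification and estimation procedure may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax of a list of real scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def covariate_transition(intercepts, slopes, x):
    """Hypothetical multinomial-logit transition matrix for a scalar covariate x.

    Row r holds P(next state = s | current state = r, x)
    = softmax over s of intercepts[r][s] + slopes[r][s] * x.
    """
    return [softmax([a + b * x for a, b in zip(row_a, row_b)])
            for row_a, row_b in zip(intercepts, slopes)]


def forward_likelihood(initial, transitions, emission, observations):
    """P(observations) for a latent Markov chain with time-varying transitions.

    transitions[t] is the matrix used to move from time t to time t + 1;
    emission[s][o] is P(observed symbol o | latent state s).
    """
    n_states = len(initial)
    # alpha[s] = P(observations so far, current latent state = s)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for t, obs in enumerate(observations[1:]):
        trans = transitions[t]
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n_states)) * emission[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


if __name__ == "__main__":
    # Hypothetical 2-state toy example; every number here is made up.
    initial = [0.9, 0.1]
    intercepts = [[0.0, -2.0], [-3.0, 0.0]]
    slopes = [[0.0, 0.5], [0.0, 0.0]]
    emission = [[0.8, 0.2], [0.3, 0.7]]  # rows: latent states, columns: observed symbols
    covariate_path = [0.1, 0.4, 0.9]     # covariate value at each transition
    transitions = [covariate_transition(intercepts, slopes, x) for x in covariate_path]
    print(forward_likelihood(initial, transitions, emission, [0, 0, 1, 1]))
```

The recursion costs O(T·K²) for T time points and K latent states, rather than the O(K^T) of summing over every latent path. With four states (healthy, light, moderate, severe) the same code applies to 4×4 matrices.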

Adaptive boosting for ordinal target variables using neural networks

Statistical Analysis and Data Mining: The ASA Data Science Journal. Pub Date: 2023-01-26. DOI: 10.1002/sam.11613

Insung Um, Geonseok Lee, K. Lee

Citations: 0

Bilateral‐Weighted Online Adaptive Isolation Forest for anomaly detection in streaming data

Statistical Analysis and Data Mining: The ASA Data Science Journal. Pub Date: 2023-01-14. DOI: 10.1002/sam.11612

Gabor Hannak, G. Horváth, Attila Kádár, Márk Dániel Szalai

Citations: 0

Model selection with bootstrap validation

Statistical Analysis and Data Mining: The ASA Data Science Journal. Pub Date: 2023-01-04. DOI: 10.1002/sam.11606

Rafael Savvides, Jarmo Mäkelä, K. Puolamäki

Citations: 1

Hierarchy‐assisted gene expression regulatory network analysis

Statistical Analysis and Data Mining: The ASA Data Science Journal. Pub Date: 2023-01-04. DOI: 10.1002/sam.11609

Han Yan, Sanguo Zhang, Shuangge Ma

Abstract: Gene expression has been extensively studied in biomedical research. For gene expression data, network analysis, which takes a system perspective and examines the interconnections among genes, has been established as highly important and meaningful. In the construction of gene expression networks, a commonly adopted technique is high‐dimensional regularized regression. Network construction can be unadjusted (focusing on gene expressions only) or adjusted (also incorporating regulators of gene expression); the two types of construction have different implications and can be equally important. In this article, we propose a variable selection hierarchy that connects the unadjusted regression‐based network construction with the adjusted construction that incorporates two or more types of regulators. This hierarchy is sensible and provides additional information for both constructions, and thus has the potential to improve variable selection and estimation. An effective computational algorithm is developed, and extensive simulations demonstrate the superiority of the proposed construction over multiple closely related alternatives. The analysis of TCGA data further demonstrates the practical utility of the proposed approach.

Citations: 0
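The hierarchy‐assisted entry above builds on unadjusted, regression‐based network construction with high‐dimensional regularized regression. One common instance of that baseline, not the authors' hierarchical method, is lasso neighborhood selection: regress each gene on all the others and connect genes whose coefficients are nonzero. The sketch below assumes a plain coordinate‐descent lasso without intercept on centered data, a fixed penalty `lam`, and an OR rule for combining the two regressions of each pair. The helper names and the toy simulator are hypothetical.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - X beta||^2 + lam * ||beta||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def neighborhood_edges(data, lam):
    """Regress each gene on all others; keep edge (j, k) if either regression selects it."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]  # no intercept needed
    edges = set()
    for j in range(p):
        others = [k for k in range(p) if k != j]
        X = [[row[k] for k in others] for row in centered]
        y = [row[j] for row in centered]
        for k, coef in zip(others, lasso_cd(X, y, lam)):
            if coef != 0.0:
                edges.add((min(j, k), max(j, k)))
    return edges


def simulate_genes(n, seed=0):
    """Hypothetical toy data: gene 2 is driven by gene 0; genes 1 and 3 are independent."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g0 = rng.gauss(0.0, 1.0)
        g1 = rng.gauss(0.0, 1.0)
        g2 = g0 + 0.1 * rng.gauss(0.0, 1.0)
        g3 = rng.gauss(0.0, 1.0)
        rows.append([g0, g1, g2, g3])
    return rows


if __name__ == "__main__":
    print(sorted(neighborhood_edges(simulate_genes(200), lam=0.3)))
```

An adjusted construction would add regulator measurements (for example copy number or methylation) as extra columns of `X` in each regression. The hierarchy proposed in the paper, which ties the two constructions together, is not reproduced here.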

Robust deep neural network surrogate models with uncertainty quantification via adversarial training

Statistical Analysis and Data Mining: The ASA Data Science Journal. Pub Date: 2023-01-04. DOI: 10.1002/sam.11610

Lixiang Zhang, Jia Li

Abstract: Surrogate models have been used to emulate mathematical simulators of physical or biological processes for computational efficiency. High‐speed simulation is crucial for uncertainty quantification (UQ) when the simulation must be repeated over many randomly sampled input points (the Monte Carlo method). A simulator can be so computationally intensive that UQ is feasible only with a surrogate model. Recently, deep neural network (DNN) surrogate models have gained popularity for their state‐of‐the‐art emulation accuracy. However, it is well known that DNNs are prone to severe errors when input data are perturbed in particular ways, the very phenomenon that has inspired great interest in adversarial training. In the case of surrogate models, the concern is less about a deliberate attack exploiting the vulnerability of a DNN and more about the high sensitivity of its accuracy to input directions, an issue largely ignored by researchers using emulation models. In this paper, we show the severity of this issue through empirical studies and hypothesis testing. Furthermore, we adopt methods from adversarial training to enhance the robustness of DNN surrogate models. Experiments demonstrate that our approaches significantly improve the robustness of the surrogate models without compromising emulation accuracy.

Citations: 0
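The surrogate‐model entry above adopts adversarial training to make DNN surrogates less sensitive to input perturbations. To show only the basic mechanism, the sketch below applies fast‐gradient‐sign (FGSM‐style) adversarial training to a linear surrogate with squared loss instead of a DNN, so that the input gradient can be written by hand. It assumes that each perturbed input keeps the clean target, as in standard adversarial training, and it uses a hypothetical toy simulator. The paper's networks, attack, and training schedule are not reproduced here.

```python
import random


def predict(w, b, x):
    """Linear surrogate: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_perturb(w, b, x, y, eps):
    """One fast-gradient-sign step on the input x under squared loss (w, b held fixed)."""
    residual = predict(w, b, x) - y
    # The gradient of residual**2 with respect to x is 2 * residual * w.
    return [xi + eps * sign(2.0 * residual * wi) for xi, wi in zip(x, w)]


def sgd_step(w, b, x, y, lr):
    """One stochastic-gradient step on the squared loss for a single example."""
    residual = predict(w, b, x) - y
    new_w = [wi - lr * 2.0 * residual * xi for wi, xi in zip(w, x)]
    new_b = b - lr * 2.0 * residual
    return new_w, new_b


def adversarial_train(data, dim, eps=0.05, lr=0.01, epochs=200, seed=0):
    """For every example, take a clean step and then a step on its FGSM perturbation."""
    rng = random.Random(seed)
    w, b = [0.0] * dim, 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x, y = data[idx]
            w, b = sgd_step(w, b, x, y, lr)
            x_adv = fgsm_perturb(w, b, x, y, eps)  # locally worst-case input for the current model
            w, b = sgd_step(w, b, x_adv, y, lr)    # keep the clean target y
    return w, b


def mse(w, b, data):
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)


def make_toy_data(n, seed=0):
    """Hypothetical cheap 'simulator' y = 3*x1 - 2*x2 + 1 sampled on [-1, 1]^2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        data.append((x, 3.0 * x[0] - 2.0 * x[1] + 1.0))
    return data


if __name__ == "__main__":
    data = make_toy_data(50)
    w, b = adversarial_train(data, dim=2)
    print("weights:", w, "bias:", b, "clean MSE:", mse(w, b, data))
```

For a linear model, the FGSM step raises the absolute residual by exactly `eps` times the L1 norm of `w`, so training on these points acts somewhat like an L1 penalty on the weights. With a DNN surrogate, the input gradient would come from automatic differentiation instead of the closed form used here.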