Statistical Analysis and Data Mining: Latest Articles

Sequential metamodel‐based approaches to level‐set estimation under heteroscedasticity
IF 1.3, Zone 4 (Mathematics)
Statistical Analysis and Data Mining Pub Date : 2024-05-29 DOI: 10.1002/sam.11697
Yutong Zhang, Xi Chen
Abstract: This paper proposes two sequential metamodel‐based methods for level‐set estimation (LSE) that leverage the uniform bound built on stochastic kriging: predictive variance reduction (PVR) and expected classification improvement (ECI). We show that PVR and ECI possess desirable theoretical performance guarantees and provide closed‐form expressions for their respective sequential sampling criteria to seek the next design point for performing simulation runs, allowing computationally efficient one‐iteration look‐ahead updates. To enhance understanding, we reveal the connection between PVR and ECI's sequential sampling criteria. Additionally, we propose integrating a budget allocation feature with PVR and ECI, which improves computational efficiency and potentially enhances robustness to the impacts of heteroscedasticity. Numerical studies demonstrate the superior performance of the proposed methods compared to state‐of‐the‐art benchmarking approaches when given a fixed simulation budget, highlighting their effectiveness in addressing LSE problems.
Citations: 0
Towards accelerating particle‐resolved direct numerical simulation with neural operators
IF 1.3, Zone 4 (Mathematics)
Statistical Analysis and Data Mining Pub Date : 2024-05-29 DOI: 10.1002/sam.11690
Mohammad Atif, Vanessa López‐Marrero, Tao Zhang, Abdullah Al Muti Sharfuddin, Kwangmin Yu, Jiaqi Yang, Fan Yang, Foluso Ladeinde, Yangang Liu, Meifeng Lin, Lingda Li
Abstract: We present our ongoing work aimed at accelerating a particle‐resolved direct numerical simulation model designed to study aerosol–cloud–turbulence interactions. The dynamical model consists of two main components: a set of fluid dynamics equations for air velocity, temperature, and humidity, coupled with a set of equations for particle (i.e., cloud droplet) tracing. Rather than attempting to replace the original numerical solution method in its entirety with a machine learning (ML) method, we consider developing a hybrid approach. We exploit the potential of neural operator learning to yield fast and accurate surrogate models and, in this study, develop such surrogates for the velocity and vorticity fields. We discuss results from numerical experiments designed to assess the performance of ML architectures under consideration as well as their suitability for capturing the behavior of relevant dynamical systems.
Citations: 0
Nonparametric mean and variance adaptive classification rule for high‐dimensional data with heteroscedastic variances
IF 1.3, Zone 4 (Mathematics)
Statistical Analysis and Data Mining Pub Date : 2024-05-20 DOI: 10.1002/sam.11689
Seungyeon Oh, Hoyoung Park
Abstract: In this study, we introduce an innovative methodology aimed at enhancing Fisher's Linear Discriminant Analysis (LDA) in the context of high‐dimensional data classification scenarios, specifically addressing situations where each feature exhibits distinct variances. Our approach leverages Nonparametric Maximum Likelihood Estimation (NPMLE) techniques to estimate both the mean and variance parameters. By accommodating varying variances among features, our proposed method leads to notable improvements in classification performance. In particular, unlike numerous prior studies that assume the distribution of heterogeneous variances follows a right‐skewed inverse gamma distribution, our proposed method demonstrates excellent performance even when the distribution of heterogeneous variances takes on left‐skewed, symmetric, or right‐skewed forms. We conducted a series of rigorous experiments to empirically validate the effectiveness of our approach. The results of these experiments demonstrate that our proposed methodology excels in accurately classifying high‐dimensional data characterized by heterogeneous variances.
Citations: 0
Semiparametric estimation of average treatment effects in observational studies
IF 1.3, Zone 4 (Mathematics)
Statistical Analysis and Data Mining Pub Date : 2024-05-18 DOI: 10.1002/sam.11688
Jun Wang, Yujiao Guo
Abstract: We propose a semiparametric method to estimate average treatment effects in observational studies based on the assumption of unconfoundedness. We assume that the propensity score model and the outcome model are general single index models, which are estimated by the kernel method, with the unknown index parameter estimated via the linearized maximum rank correlation method. The proposed estimator is computationally tractable, allows for large‐dimensional covariates, and does not involve the approximation of link functions. We show that the proposed estimator is consistent and asymptotically normally distributed. In general, the proposed estimator is superior to existing methods when the model is incorrectly specified. We also provide an empirical analysis of the average treatment effect and the average treatment effect on the treated of 401(k) eligibility on net financial assets.
Citations: 0
Prior effective sample size for exponential family distributions with multiple parameters
IF 1.3, Zone 4 (Mathematics)
Statistical Analysis and Data Mining Pub Date : 2024-05-09 DOI: 10.1002/sam.11685
Ryota Tamanoi
Abstract: The setting of priors is an important issue in Bayesian analysis. In particular, when external information is applied, a prior with too much information can dominate the posterior inferences. To prevent this effect, the effective sample size (ESS) can be used. Various ESSs have been proposed recently; however, all of them restrict the prior distributions to which they can be applied. For example, one ESS can only be used with a prior that can be approximated by a normal distribution, and another ESS cannot be applied when the parameters are multidimensional. We propose an ESS that can be applied to a wider range of prior distributions when the sampling model belongs to an exponential family (including the normal model and logistic regression models). This ESS is predictively consistent and can be used with multidimensional parameters. Using normally distributed data with Student's‐t priors, we confirm that this ESS behaves as well as an existing predictively consistent ESS for one‐parameter exponential families. As examples of multivariate parameters, ESSs for linear and logistic regression models are also discussed.
Citations: 0
Density estimation via measure transport: Outlook for applications in the biological sciences
IF 1.3, Zone 4 (Mathematics)
Statistical Analysis and Data Mining Pub Date : 2024-05-04 DOI: 10.1002/sam.11687
Vanessa López‐Marrero, Patrick R. Johnstone, Gilchan Park, Xihaier Luo
Abstract: One among several advantages of measure transport methods is that they allow for a unified framework for processing and analysis of data distributed according to a wide class of probability measures. Within this context, we present results from computational studies aimed at assessing the potential of measure transport techniques, specifically, the use of triangular transport maps, as part of a workflow intended to support research in the biological sciences. Scenarios characterized by the availability of a limited amount of sample data, which are common in domains such as radiation biology, are of particular interest. We find that when estimating a distribution density function given a limited amount of sample data, adaptive transport maps are advantageous. In particular, statistics gathered from computing a series of adaptive transport maps, trained on a series of randomly chosen subsets of the set of available data samples, lead to uncovering information hidden in the data. As a result, in the radiation biology application considered here, this approach provides a tool for generating hypotheses about gene relationships and their dynamics under radiation exposure.
Citations: 0
Individualized image region detection with total variation
IF 1.3, Zone 4 (Mathematics)
Statistical Analysis and Data Mining Pub Date : 2024-05-01 DOI: 10.1002/sam.11684
Sanyou Wu, Fuying Wang, Long Feng
Abstract: Medical image data have emerged as an indispensable component of modern medicine. Different from many general image problems that focus on outcome prediction or image recognition, medical image analysis pays more attention to model interpretation. For instance, given a list of medical images and corresponding labels of patients' health status, it is often of greater importance to identify the image regions that could differentiate the outcome status, compared to simply predicting labels of new images. Moreover, medical image data often demonstrate strong individual heterogeneity. In other words, the image regions associated with an outcome could be different across patients. As a consequence, the traditional one‐model‐fits‐all approach not only omits patient heterogeneity but also possibly leads to misleading or even wrong conclusions. In this article, we introduce a novel statistical framework to detect individualized regions that are associated with a binary outcome, that is, whether a patient has a certain disease or not. Moreover, we propose a total variation‐based penalization for individualized image region detection under a local label‐free scenario. Considering that local labeling is often difficult to obtain for medical image data, our approach may potentially have a wider range of applications in medical research. The effectiveness of our proposed approach is validated on two real histopathology databases: Colon Cancer and Camelyon16.
Citations: 0
The analysis of association rules: Latent class analysis
IF 1.3, Zone 4 (Mathematics)
Statistical Analysis and Data Mining Pub Date : 2024-05-01 DOI: 10.1002/sam.11686
Ron S. Kenett, Chris Gotwalt
Abstract: Association rules are used to extract information from transactional databases with a collection of items also called “tokens” or “words.” The aim of association rule analysis is to indicate which items go with which other items, and how, in a set of transactions called “documents.” This approach is used in the analysis of text records, of blogs in social media and of shopping baskets. We present here an approach to analyze documents using latent class analysis (LCA) clustering of document term matrices. A document term matrix (DTM) consists of rows referring to documents and columns corresponding to items. In binary weights, “1” indicates the presence of a term in a document and “0” otherwise. The clustering of similar documents provides stratified data sets used to enhance the interpretability of measures of interest such as lift, odds ratios and relative linkage disequilibrium. The article demonstrates the approach with two case studies. A first example consists of comments recorded in a survey aimed at pet owners. A second, much larger example, is based on online reviews of Crocs sandals. Association rules describe combinations of terms in the pet survey and Crocs reviews. In Section 3, we compute, for these case studies, association rule measures of interest defined in Section 2. We first introduce the case studies to motivate the methods proposed here. In Section 4, we provide a new approach with enhanced interpretation of measures such as lift by comparing them across clusters derived from an LCA of the DTM. A key result is the application of clustered data in analyzing observational data. This enhances generalizability and interpretability of findings from text analytics. The article concludes with a discussion in Section 5.
Citations: 0
Bayesian relative composite quantile regression approach of ordinal latent regression model with L1/2 regularization
IF 1.3, Zone 4 (Mathematics)
Statistical Analysis and Data Mining Pub Date : 2024-04-15 DOI: 10.1002/sam.11683
Tian Yu-Zhu, Wu Chun-Ho, Tai Ling-Nan, Mian Zhi-Bao, Tian Mao-Zai
Abstract: Ordinal data frequently occur in various fields such as knowledge level assessment, credit rating, clinical disease diagnosis, and psychological evaluation. Classic models such as cumulative logistic regression or probit regression are often used to model such ordinal data. But these modeling approaches depict the conditional mean of the response variable given a set of predictive variables, which often results in non-robust estimation results. As an alternative, the composite quantile regression (CQR) approach is usually employed to gain more robust and relatively efficient results. In this paper, we propose a Bayesian CQR modeling approach for the ordinal latent regression model. In order to overcome the recognizability problem of the considered model and obtain more robust estimation results, we advocate using the Bayesian relative CQR approach to estimate regression parameters. Additionally, in regression modeling, it is highly desirable to obtain a parsimonious model that retains only important covariates. We incorporate the Bayesian $L_{1/2}$ penalty into the ordinal latent CQR regression model to simultaneously conduct parameter estimation and variable selection. Finally, the proposed Bayesian relative CQR approach is illustrated by Monte Carlo simulations and a real data application. Simulation results and real data examples show that the suggested Bayesian relative CQR approach has good performance for the ordinal regression models.
Citations: 0
Transfer learning under the Cox model with interval‐censored data
IF 1.3, Zone 4 (Mathematics)
Statistical Analysis and Data Mining Pub Date : 2024-04-10 DOI: 10.1002/sam.11680
Mengqi Xie, Tao Hu, Jie Zhou
Abstract: Transfer learning, focusing on information borrowing to address limited sample size issues, has gained increasing attention in recent years. Our method aims to utilize data from other population groups as a complement to enhance risk factor discernment and failure time prediction among underrepresented subgroups. However, a literature gap exists in effective knowledge transfer from the source to the target for risk assessment with interval‐censored data while accommodating population incomparability and privacy constraints. Our objective is to bridge this gap by developing a transfer learning approach under the Cox proportional hazards model. We introduce the tuning‐free Trans‐Cox‐MIC algorithm, enabling adaptable information sharing in regression coefficients and baseline hazards, while ensuring computational efficiency. Our approach accommodates covariate distribution shifts, coefficient variations, and baseline hazard discrepancies. Extensive simulations showcase the method's accuracy, robustness, and efficiency. Application to the prostate cancer screening data demonstrates enhanced risk estimation precision and predictive performance in the African American population.
Citations: 0