Statistical Analysis and Data Mining: Latest Articles, Page 2

Semiparametric estimation of average treatment effects in observational studies

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-05-18 DOI: 10.1002/sam.11688

Jun Wang, Yujiao Guo

Citations: 0

Prior effective sample size for exponential family distributions with multiple parameters

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-05-09 DOI: 10.1002/sam.11685

Ryota Tamanoi

Citations: 0

Density estimation via measure transport: Outlook for applications in the biological sciences

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-05-04 DOI: 10.1002/sam.11687

Vanessa López‐Marrero, Patrick R. Johnstone, Gilchan Park, Xihaier Luo

Abstract: One among several advantages of measure transport methods is that they allow for a unified framework for processing and analysis of data distributed according to a wide class of probability measures. Within this context, we present results from computational studies aimed at assessing the potential of measure transport techniques, specifically, the use of triangular transport maps, as part of a workflow intended to support research in the biological sciences. Scenarios characterized by the availability of a limited amount of sample data, which are common in domains such as radiation biology, are of particular interest. We find that when estimating a distribution density function given a limited amount of sample data, adaptive transport maps are advantageous. In particular, statistics gathered from computing a series of adaptive transport maps, trained on a series of randomly chosen subsets of the set of available data samples, lead to uncovering information hidden in the data. As a result, in the radiation biology application considered here, this approach provides a tool for generating hypotheses about gene relationships and their dynamics under radiation exposure.

Citations: 0

Individualized image region detection with total variation

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-05-01 DOI: 10.1002/sam.11684

Sanyou Wu, Fuying Wang, Long Feng

Abstract: Medical image data have emerged as an indispensable component of modern medicine. Different from many general image problems that focus on outcome prediction or image recognition, medical image analysis pays more attention to model interpretation. For instance, given a list of medical images and corresponding labels of patients' health status, it is often of greater importance to identify the image regions that could differentiate the outcome status, compared to simply predicting labels of new images. Moreover, medical image data often demonstrate strong individual heterogeneity. In other words, the image regions associated with an outcome could be different across patients. As a consequence, the traditional one-model-fits-all approach not only omits patient heterogeneity but also possibly leads to misleading or even wrong conclusions. In this article, we introduce a novel statistical framework to detect individualized regions that are associated with a binary outcome, that is, whether a patient has a certain disease or not. Moreover, we propose a total variation-based penalization for individualized image region detection under a local label-free scenario. Considering that local labeling is often difficult to obtain for medical image data, our approach may potentially have a wider range of applications in medical research. The effectiveness of our proposed approach is validated by two real histopathology databases: Colon Cancer and Camelyon16.

Citations: 0

The analysis of association rules: Latent class analysis

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-05-01 DOI: 10.1002/sam.11686

Ron S. Kenett, Chris Gotwalt

Abstract: Association rules are used to extract information from transactional databases with a collection of items also called "tokens" or "words." The aim of association rule analysis is to indicate which items go with which other items, and how, in a set of transactions called "documents." This approach is used in the analysis of text records, of blogs in social media and of shopping baskets. We present here an approach to analyze documents using latent class analysis (LCA) clustering of document term matrices. A document term matrix (DTM) consists of rows referring to documents and columns corresponding to items. In binary weights, "1" indicates the presence of a term in a document and "0" otherwise. The clustering of similar documents provides stratified data sets used to enhance the interpretability of measures of interest such as lift, odds ratios and relative linkage disequilibrium. The article demonstrates the approach with two case studies. A first example consists of comments recorded in a survey aimed at pet owners. A second, much larger example is based on online reviews of Crocs sandals. Association rules describe combinations of terms in the pet survey and Crocs reviews. In Section 3, we compute, for these case studies, association rule measures of interest defined in Section 2. We first introduce the case studies to motivate the methods proposed here. In Section 4, we provide a new approach with an enhanced interpretation of measures such as lift by comparing them across clusters derived from an LCA of the DTM. A key result is the application of clustered data in analyzing observational data. This enhances generalizability and interpretability of findings from text analytics. The article concludes with a discussion in Section 5.
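
Illustration (not from the article): the abstract describes a binary document term matrix and the lift measure. A minimal sketch of how lift might be computed from such a matrix follows, using the standard definition lift(a, b) = P(a and b) / (P(a) P(b)). The function name `lift`, the toy matrix, and the term labels are hypothetical and chosen only for illustration; the LCA clustering step of the article is not reproduced here.

```python
def lift(dtm, a, b):
    """Lift between terms a and b in a binary document term matrix.

    dtm: list of rows (one per document), each a list of 0/1 flags.
    a, b: column indices of the two terms.
    """
    n = len(dtm)
    support_a = sum(row[a] for row in dtm) / n            # share of documents containing a
    support_b = sum(row[b] for row in dtm) / n            # share of documents containing b
    support_ab = sum(row[a] * row[b] for row in dtm) / n  # share containing both
    if support_a == 0 or support_b == 0:
        return float("nan")  # lift is undefined when a term never occurs
    return support_ab / (support_a * support_b)


# Hypothetical toy data: 4 documents, columns = ["dog", "food", "price"]
toy_dtm = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
dog_food = lift(toy_dtm, 0, 1)
```

A lift above 1 suggests the two terms co-occur more often than they would under independence. Computing the same quantity separately within each cluster of documents is one way to read the stratified comparison the abstract mentions.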

Citations: 0

Bayesian relative composite quantile regression approach of ordinal latent regression model with L1/2 regularization

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-04-15 DOI: 10.1002/sam.11683

Tian Yu-Zhu, Wu Chun-Ho, Tai Ling-Nan, Mian Zhi-Bao, Tian Mao-Zai

Abstract: Ordinal data frequently occur in various fields such as knowledge level assessment, credit rating, clinical disease diagnosis, and psychological evaluation. Classic models, including cumulative logistic regression and probit regression, are often used to model such ordinal data. But these modeling approaches conditionally depict the mean characteristic of the response variable on a cluster of predictive variables, which often results in non-robust estimation results. As a considerable alternative, the composite quantile regression (CQR) approach is usually employed to gain more robust and relatively efficient results. In this paper, we propose a Bayesian CQR modeling approach for the ordinal latent regression model. In order to overcome the recognizability problem of the considered model and obtain more robust estimation results, we advocate using the Bayesian relative CQR approach to estimate regression parameters. Additionally, in regression modeling, it is a highly desirable task to obtain a parsimonious model that retains only important covariates. We incorporate the Bayesian $L_{1/2}$ penalty into the ordinal latent CQR regression model to simultaneously conduct parameter estimation and variable selection. Finally, the proposed Bayesian relative CQR approach is illustrated by Monte Carlo simulations and a real data application. Simulation results and real data examples show that the suggested Bayesian relative CQR approach has good performance for the ordinal regression models.

Citations: 0

A treeless absolutely random forest with closed-form estimators of expected proximities

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-04-10 DOI: 10.1002/sam.11678

Eugene Laska, Ziqiang Lin, Carole Siegel, Charles Marmar

Abstract: We introduce a simple variant of a purely random forest, called an absolute random forest (ARF), used for clustering. At every node, splits of units are determined by a randomly chosen feature and a random threshold drawn from a uniform distribution whose support, the range of the selected feature in the root node, does not change. This enables closed-form estimators of parameters, such as pairwise proximities, to be obtained without having to grow a forest. The probabilistic structure corresponding to an ARF is called a treeless absolute random forest (TARF). With high probability, the algorithm will split units whose feature vectors are far apart and keep together units whose feature vectors are similar. Thus, the underlying structure of the data drives the growth of the tree. The expected value of pairwise proximities is obtained for three pathway functions. One, a completely common pathway function, is an indicator of whether a pair of units follow the same path from the root to the leaf node. The properties of TARF-based proximity estimators for clustering and classification are compared to other methods in eight real-world datasets and in simulations. Results show substantial performance and computing efficiencies of particular value for large datasets.
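
Illustration (not from the article): the split rule in the abstract, a randomly chosen feature with a uniform threshold over that feature's fixed root-node range, implies a simple closed form for the chance that a single split separates two units. The sketch below assumes the feature is chosen uniformly at random among all features. It is not the article's proximity estimator, and the function name and inputs are hypothetical.

```python
def single_split_separation_prob(x, y, lows, highs):
    """Probability that one random split puts units x and y on different sides.

    Assumes: a feature j is picked uniformly at random, then a threshold is
    drawn uniformly on [lows[j], highs[j]] (the root-node range of feature j).
    The split separates x and y exactly when the threshold falls between
    x[j] and y[j], which has probability |x[j] - y[j]| / (highs[j] - lows[j]).
    """
    total = 0.0
    for xj, yj, lo, hi in zip(x, y, lows, highs):
        if hi > lo:  # a constant feature can never separate two units
            total += abs(xj - yj) / (hi - lo)
    return total / len(x)  # average over the uniformly chosen feature


# Hypothetical example: two features, each with root-node range [0, 4]
p = single_split_separation_prob((0.0, 2.0), (1.0, 2.0), (0.0, 0.0), (4.0, 4.0))
```

Under these assumptions, units that are far apart relative to the root-node ranges are separated with higher probability, which matches the abstract's statement that distant units tend to be split while similar units stay together.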

Citations: 0

Transfer learning under the Cox model with interval-censored data

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-04-10 DOI: 10.1002/sam.11680

Mengqi Xie, Tao Hu, Jie Zhou

Abstract: Transfer learning, focusing on information borrowing to address limited sample size issues, has gained increasing attention in recent years. Our method aims to utilize data from other population groups as a complement to enhance risk factor discernment and failure time prediction among underrepresented subgroups. However, a literature gap exists in effective knowledge transfer from the source to the target for risk assessment with interval-censored data while accommodating population incomparability and privacy constraints. Our objective is to bridge this gap by developing a transfer learning approach under the Cox proportional hazards model. We introduce the tuning-free Trans-Cox-MIC algorithm, enabling adaptable information sharing in regression coefficients and baseline hazards, while ensuring computational efficiency. Our approach accommodates covariate distribution shifts, coefficient variations, and baseline hazard discrepancies. Extensive simulations showcase the method's accuracy, robustness, and efficiency. Application to the prostate cancer screening data demonstrates enhanced risk estimation precision and predictive performance in the African American population.

Citations: 0

Data-driven stochastic model for quantifying the interplay between amyloid-beta and calcium levels in Alzheimer's disease

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-04-09 DOI: 10.1002/sam.11679

Hina Shaheen, Roderick Melnik, Sundeep Singh

Citations: 0

Randomized multiarm bandits: An improved adaptive data collection method

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-04-06 DOI: 10.1002/sam.11681

Zhigen Zhao, Tong Wang, Bo Ji

Citations: 0