Journal of data science : JDS, latest articles, page 5

A Joint Analysis for Field Goal Attempts and Percentages of Professional Basketball Players: Bayesian Nonparametric Resource

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1062

Eliot Wong-Toi, Hou‐Cheng Yang, Weining Shen, Guanyu Hu

Citations: 0

Hierarchical Ridge Regression for Incorporating Prior Information in Genomic Studies.

Journal of data science : JDS Pub Date : 2022-01-01 Epub Date: 2021-12-13 DOI: 10.6339/21-jds1030

Eric S Kawaguchi, Sisi Li, Garrett M Weaver, Juan Pablo Lewinger

Abstract: There is a great deal of prior knowledge about gene function and regulation in the form of annotations or prior results that, if directly integrated into individual prognostic or diagnostic studies, could improve predictive performance. For example, in a study to develop a predictive model for cancer survival based on gene expression, effect sizes from previous studies or the grouping of genes based on pathways constitute such prior knowledge. However, this external information is typically only used post-analysis to aid in the interpretation of any findings. We propose a new hierarchical two-level ridge regression model that can integrate external information in the form of "meta features" to predict an outcome. We show that the model can be fit efficiently using cyclic coordinate descent by recasting the problem as a single-level regression model. In a simulation-based evaluation we show that the proposed method outperforms standard ridge regression and competing methods that integrate prior information, in terms of prediction performance when the meta features are informative on the mean of the features, and that there is no loss in performance when the meta features are uninformative. We demonstrate our approach with applications to the prediction of chronological age based on methylation features and breast cancer mortality based on gene expression features.

Volume 20, issue 1, pages 34-50. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9581069/pdf/
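
The recasting step mentioned in the abstract can be illustrated with a short derivation. This is only a sketch of one plausible formulation, not necessarily the authors' exact model: the symbols X (features), Z (meta features), beta, alpha, gamma, and the penalties lambda_1 and lambda_2 are all assumptions, since the abstract does not name them.

```latex
% Assumed two-level ridge objective: beta is shrunk toward Z*alpha, alpha toward 0.
\begin{aligned}
&\min_{\beta,\alpha}\;
  \tfrac{1}{2}\lVert y - X\beta \rVert_2^2
  + \tfrac{\lambda_1}{2}\lVert \beta - Z\alpha \rVert_2^2
  + \tfrac{\lambda_2}{2}\lVert \alpha \rVert_2^2 \\
% Change of variables: gamma = beta - Z*alpha, so X*beta = X*gamma + X*Z*alpha.
&\text{with } \gamma = \beta - Z\alpha: \\
&\min_{\gamma,\alpha}\;
  \tfrac{1}{2}\lVert y - X\gamma - XZ\alpha \rVert_2^2
  + \tfrac{\lambda_1}{2}\lVert \gamma \rVert_2^2
  + \tfrac{\lambda_2}{2}\lVert \alpha \rVert_2^2
\end{aligned}
```

Under these assumptions the second form is an ordinary ridge regression on the augmented design [X, XZ] with separate penalties on the two coefficient blocks, which is the kind of single-level problem that coordinate descent handles directly.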

Citations: 0

Accelerating Fixed-Point Algorithms in Statistics and Data Science: A State-of-Art Review

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1051

Bohao Tang, Nicholas C. Henderson, Ravi Varadhan

Abstract: Fixed-point algorithms are popular in statistics and data science due to their simplicity, guaranteed convergence, and applicability to high-dimensional problems. Well-known examples include the expectation-maximization (EM) algorithm, majorization-minimization (MM), and gradient-based algorithms like gradient descent (GD) and proximal gradient descent. A characteristic weakness of these algorithms is their slow convergence. We discuss several state-of-art techniques for accelerating their convergence. We demonstrate and evaluate these techniques in terms of their efficiency and robustness in six distinct applications. Among the acceleration schemes, SQUAREM shows robust acceleration with a mean 18-fold speedup. DAAREM and restarted-Nesterov schemes also demonstrate consistently impressive accelerations. Thus, it is possible to accelerate the original fixed-point algorithm by using one of SQUAREM, DAAREM, or restarted-Nesterov acceleration schemes. We describe implementation details and software packages to facilitate the application of the acceleration schemes. We also discuss strategies for selecting a particular acceleration scheme for a given problem.
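
To make the idea of accelerating a fixed-point iteration concrete, here is a minimal sketch of a squared-extrapolation step in the spirit of SQUAREM. It is a simplified illustration, not the authors' implementation or any package's API: it omits the monotonicity safeguards that published implementations include, and the function names `squarem` and `slow_linear_map` and the toy map itself are hypothetical.

```python
import math


def squarem(fixed_point_map, x0, tol=1e-10, max_iter=1000):
    """Simplified squared-extrapolation iteration (no monotonicity safeguard).

    fixed_point_map takes and returns a list of floats.
    Returns (estimate, number_of_acceleration_cycles_used).
    """
    x = list(x0)
    for cycle in range(max_iter):
        x1 = fixed_point_map(x)
        r = [a - b for a, b in zip(x1, x)]  # first difference F(x) - x
        norm_r = math.sqrt(sum(ri * ri for ri in r))
        if norm_r < tol:  # residual is small enough: stop
            return x1, cycle
        x2 = fixed_point_map(x1)
        # second difference (F(F(x)) - F(x)) - (F(x) - x)
        v = [(c - b) - ri for c, b, ri in zip(x2, x1, r)]
        norm_v = math.sqrt(sum(vi * vi for vi in v))
        if norm_v == 0.0:  # no curvature information: take the plain step
            x = x2
            continue
        # Step length; alpha = -1 reproduces the unaccelerated iterate x2.
        alpha = min(-norm_r / norm_v, -1.0)
        x_acc = [xi - 2.0 * alpha * ri + alpha * alpha * vi
                 for xi, ri, vi in zip(x, r, v)]
        x = fixed_point_map(x_acc)  # stabilizing map evaluation
    return x, max_iter


def slow_linear_map(x):
    # Hypothetical toy contraction with fixed point (10, 4); the 0.9 factor
    # makes the plain iteration converge slowly in the first coordinate.
    return [0.9 * x[0] + 1.0, 0.5 * x[1] + 2.0]
```

Calling `squarem(slow_linear_map, [0.0, 0.0])` returns an estimate close to the fixed point (10, 4). In a real application the map would be one EM or MM update of the parameter vector.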

Citations: 1

Editorial: Data Science Meets Social Sciences

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds203edi

E. Erosheva, Shahryar Minhas, Gongjun Xu, Ran Xu

Citations: 0

Propensity Score Modeling in Electronic Health Records with Time-to-Event Endpoints: Application to Kidney Transplantation

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1046

Jonathan W. Yu, D. Bandyopadhyay, Shu Yang, Le Kang, G. Gupta

Abstract: For large observational studies lacking a control group (unlike randomized controlled trials, RCT), propensity scores (PS) are often the method of choice to account for pre-treatment confounding in baseline characteristics, and thereby avoid substantial bias in treatment estimation. A vast majority of PS techniques focus on average treatment effect estimation, without any clear consensus on how to account for confounders, especially in a multiple treatment setting. Furthermore, for time-to-event outcomes, the analytical framework is further complicated in the presence of high censoring rates (sometimes due to non-susceptibility of study units to a disease), imbalance between treatment groups, and the clustered nature of the data (where survival outcomes appear in groups). Motivated by a right-censored kidney transplantation dataset derived from the United Network for Organ Sharing (UNOS), we investigate and compare two recent promising PS procedures, (a) the generalized boosted model (GBM), and (b) the covariate-balancing propensity score (CBPS), in an attempt to decouple the causal effects of treatments (here, study subgroups, such as hepatitis C virus (HCV) positive/negative donors, and positive/negative recipients) on time to death of kidney recipients due to kidney failure, post transplantation. For estimation, we employ a 2-step procedure which addresses various complexities observed in the UNOS database within a unified paradigm. First, to adjust for the large number of confounders on the multiple sub-groups, we fit multinomial PS models via procedures (a) and (b). In the next stage, the estimated PS is incorporated into the likelihood of a semi-parametric cure rate Cox proportional hazard frailty model via inverse probability of treatment weighting, adjusted for multi-center clustering and excess censoring. Our data analysis reveals a more informative and superior performance of the full model in terms of treatment effect estimation, over sub-models that relax the various features of the event time dataset.
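
The weighting step in the second stage can be sketched as follows. This is a minimal illustration of inverse probability of treatment weighting for several treatment groups, assuming the multinomial propensity scores have already been estimated by some model. The function names and the data layout (one dict of probabilities per subject) are hypothetical and are not taken from the paper.

```python
def iptw_weights(treatments, propensity):
    """Weight each subject by 1 / P(observed treatment | covariates).

    treatments: list of observed treatment labels, one per subject.
    propensity: list of dicts mapping every label to its estimated
                probability for that subject.
    """
    weights = []
    for label, probs in zip(treatments, propensity):
        p = probs[label]
        if not 0.0 < p <= 1.0:
            raise ValueError("propensity score must lie in (0, 1]")
        weights.append(1.0 / p)
    return weights


def weighted_group_sizes(treatments, weights):
    """Sum the weights within each treatment group (a simple balance check)."""
    sizes = {}
    for label, w in zip(treatments, weights):
        sizes[label] = sizes.get(label, 0.0) + w
    return sizes
```

Each subject's weight would then multiply that subject's contribution to the survival model's likelihood, so that subjects who were unlikely to receive their observed treatment count for more.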

Citations: 1

An Effective Tensor Regression with Latent Sparse Regularization

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1048

Ko-Shin Chen, Tingyang Xu, Guannan Liang, Qianqian Tong, Minghu Song, J. Bi

Abstract: As data acquisition technologies advance, longitudinal analysis is facing challenges of exploring complex feature patterns from high-dimensional data and modeling potential temporally lagged effects of features on a response. We propose a tensor-based model to analyze multidimensional data. It simultaneously discovers patterns in features and reveals whether features observed at past time points have an impact on current outcomes. The model coefficient, a k-mode tensor, is decomposed into a summation of k tensors of the same dimension. We introduce a so-called latent F-1 norm that can be applied to the coefficient tensor to perform structured selection of features. Specifically, features will be selected along each mode of the tensor. The proposed model takes into account within-subject correlations by employing a tensor-based quadratic inference function. An asymptotic analysis shows that our model can identify the true support when the sample size approaches infinity. To solve the corresponding optimization problem, we develop a linearized block coordinate descent algorithm and prove its convergence for a fixed sample size. Computational results on synthetic datasets and real-life fMRI and EEG datasets demonstrate the superior performance of the proposed approach over existing techniques.

Citations: 1

Does Aging Make Us Grittier? Disentangling the Age and Generation Effect on Passion and Perseverance

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1041

S. Sanders, Nuwan Indika Millagaha Gedara, Bhavneet Walia, C. Boudreaux, M. Silverstein

Abstract: Defined as perseverance and passion for long-term goals, grit represents an important psychological skill toward goal-attainment in academic and less-stylized settings. An outstanding issue of primary importance is whether age affects grit, ceteris paribus. The 12-item Grit-O Scale and the 8-item Grit-S Scale, from which grit scores are calculated, have not existed for a long period of time. Therefore, Duckworth (2016, p. 37) states in her book, Grit: The Power of Passion and Perseverance, that “we need a different kind of study” to distinguish between rival explanations that either generational cohort or age is more important in explaining variation in grit across individuals. Despite this clear data constraint, we obtain a glimpse into the future in the present study by using a within and between generational cohort age difference-in-difference approach. By specifying generation as a categorical variable and age-in-generation as a count variable in the same regression specifications, we are able to account for the effects of variation in age and generation simultaneously, while avoiding problems of multicollinearity that would hinder post-regression statistical inference. We find robust, significant evidence that the negative-parabolic shape of the grit-age profile is driven by generational variation and not by age variation. Our findings suggest that, absent a grit-mindset intervention, individual-level grit may be persistent over time.

Citations: 2

Do Americans Think the Digital Economy is Fair? Using Supervised Learning to Explore Evaluations of Predictive Automation

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1053

E. Lehoucq

Citations: 1

High-Dimensional Nonlinear Spatio-Temporal Filtering by Compressing Hierarchical Sparse Cholesky Factors

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1071

Anirban Chakraborty, M. Katzfuss

Citations: 1

Supervised Spatial Regionalization using the Karhunen-Loève Expansion and Minimum Spanning Trees

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1077

Ranadeep Daw, C. Wikle

Citations: 2