Journal of data science : JDS最新文献

筛选
英文 中文
An Assessment of Crop-Specific Land Cover Predictions Using High-Order Markov Chains and Deep Neural Networks 基于高阶马尔可夫链和深度神经网络的作物特定土地覆盖预测评估
Journal of data science : JDS Pub Date : 2023-01-01 DOI: 10.6339/23-jds1098
L. Sartore, C. Boryan, Andrew Dau, P. Willis
{"title":"An Assessment of Crop-Specific Land Cover Predictions Using High-Order Markov Chains and Deep Neural Networks","authors":"L. Sartore, C. Boryan, Andrew Dau, P. Willis","doi":"10.6339/23-jds1098","DOIUrl":"https://doi.org/10.6339/23-jds1098","url":null,"abstract":"High-Order Markov Chains (HOMC) are conventional models, based on transition probabilities, that are used by the United States Department of Agriculture (USDA) National Agricultural Statistics Service (NASS) to study crop-rotation patterns over time. However, HOMCs routinely suffer from sparsity and identifiability issues because the categorical data are represented as indicator (or dummy) variables. In fact, the dimension of the parametric space increases exponentially with the order of HOMCs required for analysis. While parsimonious representations reduce the number of parameters, as has been shown in the literature, they often result in less accurate predictions. Most parsimonious models are trained on big data structures, which can be compressed and efficiently processed using alternative algorithms. Consequently, a thorough evaluation and comparison of the prediction results obtain using a new HOMC algorithm and different types of Deep Neural Networks (DNN) across a range of agricultural conditions is warranted to determine which model is most appropriate for operational crop specific land cover prediction of United States (US) agriculture. In this paper, six neural network models are applied to crop rotation data between 2011 and 2021 from six agriculturally intensive counties, which reflect the range of major crops grown and a variety of crop rotation patterns in the Midwest and southern US. The six counties include: Renville, North Dakota; Perkins, Nebraska; Hale, Texas; Livingston, Illinois; McLean, Illinois; and Shelby, Ohio. Results show the DNN models achieve higher overall prediction accuracy for all counties in 2021. The proposed DNN models allow for the ingestion of long time series data, and robustly achieve higher accuracy values than a new HOMC algorithm considered for predicting crop specific land cover in the US.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71321012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Binary Classification of Malignant Mesothelioma: A Comparative Study 恶性间皮瘤二元分类的比较研究
Journal of data science : JDS Pub Date : 2023-01-01 DOI: 10.6339/23-jds1090
Ted Si Yuan Cheng, Xiyue Liao
{"title":"Binary Classification of Malignant Mesothelioma: A Comparative Study","authors":"Ted Si Yuan Cheng, Xiyue Liao","doi":"10.6339/23-jds1090","DOIUrl":"https://doi.org/10.6339/23-jds1090","url":null,"abstract":"Malignant mesotheliomas are aggressive cancers that occur in the thin layer of tissue that covers most commonly the linings of the chest or abdomen. Though the cancer itself is rare and deadly, early diagnosis will help with treatment and improve outcomes. Mesothelioma is usually diagnosed in the later stages. Symptoms are similar to other, more common conditions. As such, predicting and diagnosing mesothelioma early is essential to starting early treatment for a cancer that is often diagnosed too late. The goal of this comprehensive empirical comparison is to determine the best-performing model based on recall (sensitivity). We particularly wish to avoid false negatives, as it is costly to diagnose a patient as healthy when they actually have cancer. Model training will be conducted based on k-fold cross validation. Random forest is chosen as the optimal model. According to this model, age and duration of asbestos exposure are ranked as the most important features affecting diagnosis of mesothelioma.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71320731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Computing Pseudolikelihood Estimators for Exponential-Family Random Graph Models 计算指数族随机图模型的伪似然估计量
Journal of data science : JDS Pub Date : 2023-01-01 DOI: 10.6339/23-jds1094
Christian S. Schmid, David R. Hunter
{"title":"Computing Pseudolikelihood Estimators for Exponential-Family Random Graph Models","authors":"Christian S. Schmid, David R. Hunter","doi":"10.6339/23-jds1094","DOIUrl":"https://doi.org/10.6339/23-jds1094","url":null,"abstract":"The reputation of the maximum pseudolikelihood estimator (MPLE) for Exponential Random Graph Models (ERGM) has undergone a drastic change over the past 30 years. While first receiving broad support, mainly due to its computational feasibility and the lack of alternatives, general opinions started to change with the introduction of approximate maximum likelihood estimator (MLE) methods that became practicable due to increasing computing power and the introduction of MCMC methods. Previous comparison studies appear to yield contradicting results regarding the preference of these two point estimators; however, there is consensus that the prevailing method to obtain an MPLE’s standard error by the inverse Hessian matrix generally underestimates standard errors. We propose replacing the inverse Hessian matrix by an approximation of the Godambe matrix that results in confidence intervals with appropriate coverage rates and that, in addition, enables examining for model degeneracy. Our results also provide empirical evidence for the asymptotic normality of the MPLE under certain conditions.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71320816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Impact of Bias Correction of the Least Squares Estimation on Bootstrap Confidence Intervals for Bifurcating Autoregressive Models 最小二乘估计偏差校正对分岔自回归模型自举置信区间的影响
Journal of data science : JDS Pub Date : 2023-01-01 DOI: 10.6339/23-jds1092
T. Elbayoumi, S. Mostafa
{"title":"Impact of Bias Correction of the Least Squares Estimation on Bootstrap Confidence Intervals for Bifurcating Autoregressive Models","authors":"T. Elbayoumi, S. Mostafa","doi":"10.6339/23-jds1092","DOIUrl":"https://doi.org/10.6339/23-jds1092","url":null,"abstract":"The least squares (LS) estimator of the autoregressive coefficient in the bifurcating autoregressive (BAR) model was recently shown to suffer from substantial bias, especially for small to moderate samples. This study investigates the impact of the bias in the LS estimator on the behavior of various types of bootstrap confidence intervals for the autoregressive coefficient and introduces methods for constructing bias-corrected bootstrap confidence intervals. We first describe several bootstrap confidence interval procedures for the autoregressive coefficient of the BAR model and present their bias-corrected versions. The behavior of uncorrected and corrected confidence interval procedures is studied empirically through extensive Monte Carlo simulations and two real cell lineage data applications. The empirical results show that the bias in the LS estimator can have a significant negative impact on the behavior of bootstrap confidence intervals and that bias correction can significantly improve the performance of bootstrap confidence intervals in terms of coverage, width, and symmetry.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71320777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Covid-19 Vaccine Efficacy: Accuracy Assessment, Comparison, and Caveats Covid-19疫苗有效性:准确性评估、比较和注意事项
Journal of data science : JDS Pub Date : 2023-01-01 DOI: 10.6339/23-jds1089
Wenjiang J. Fu, Jieni Li, P. Scheet
{"title":"Covid-19 Vaccine Efficacy: Accuracy Assessment, Comparison, and Caveats","authors":"Wenjiang J. Fu, Jieni Li, P. Scheet","doi":"10.6339/23-jds1089","DOIUrl":"https://doi.org/10.6339/23-jds1089","url":null,"abstract":"Vaccine efficacy is a key index to evaluate vaccines in initial clinical trials during the development of vaccines. In particular, it plays a crucial role in authorizing Covid-19 vaccines. It has been reported that Covid-19 vaccine efficacy varies with a number of factors, including demographics of population, time after vaccine administration, and virus strains. By examining clinical trial data of three Covid-19 vaccine studies, we find that current approach to evaluating vaccines with an overall efficacy does not provide desired accuracy. It requires no time frame during which a candidate vaccine is evaluated, and is subject to misuse, resulting in potential misleading information and interpretation. In particular, we illustrate with clinical trial data that the variability of vaccine efficacy is underestimated. We demonstrate that a new method may help to address these caveats. It leads to accurate estimation of the variation of efficacy, provides useful information to define a reasonable time frame to evaluate vaccines, and avoids misuse of vaccine efficacy and misleading information.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71320716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
The Second Competition on Spatial Statistics for Large Datasets 第二届大型数据集空间统计竞赛
Journal of data science : JDS Pub Date : 2022-11-06 DOI: 10.6339/22-jds1076
Sameh Abdulah, Faten S. Alamri, Pratik Nag, Ying Sun, H. Ltaief, D. Keyes, M. Genton
{"title":"The Second Competition on Spatial Statistics for Large Datasets","authors":"Sameh Abdulah, Faten S. Alamri, Pratik Nag, Ying Sun, H. Ltaief, D. Keyes, M. Genton","doi":"10.6339/22-jds1076","DOIUrl":"https://doi.org/10.6339/22-jds1076","url":null,"abstract":"In the last few decades, the size of spatial and spatio-temporal datasets in many research areas has rapidly increased with the development of data collection technologies. As a result, classical statistical methods in spatial statistics are facing computational challenges. For example, the kriging predictor in geostatistics becomes prohibitive on traditional hardware architectures for large datasets as it requires high computing power and memory footprint when dealing with large dense matrix operations. Over the years, various approximation methods have been proposed to address such computational issues, however, the community lacks a holistic process to assess their approximation efficiency. To provide a fair assessment, in 2021, we organized the first competition on spatial statistics for large datasets, generated by our ExaGeoStat software, and asked participants to report the results of estimation and prediction. Thanks to its widely acknowledged success and at the request of many participants, we organized the second competition in 2022 focusing on predictions for more complex spatial and spatio-temporal processes, including univariate nonstationary spatial processes, univariate stationary space-time processes, and bivariate stationary spatial processes. In this paper, we describe in detail the data generation procedure and make the valuable datasets publicly available for a wider adoption. Then, we review the submitted methods from fourteen teams worldwide, analyze the competition outcomes, and assess the performance of each team.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42045278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Vecchia Approximations and Optimization for Multivariate Matérn Models 多元数学模型的Vecchia逼近与优化
Journal of data science : JDS Pub Date : 2022-10-17 DOI: 10.6339/22-jds1074
Youssef A. Fahmy, J. Guinness
{"title":"Vecchia Approximations and Optimization for Multivariate Matérn Models","authors":"Youssef A. Fahmy, J. Guinness","doi":"10.6339/22-jds1074","DOIUrl":"https://doi.org/10.6339/22-jds1074","url":null,"abstract":"We describe our implementation of the multivariate Matérn model for multivariate spatial datasets, using Vecchia’s approximation and a Fisher scoring optimization algorithm. We consider various pararameterizations for the multivariate Matérn that have been proposed in the literature for ensuring model validity, as well as an unconstrained model. A strength of our study is that the code is tested on many real-world multivariate spatial datasets. We use it to study the effect of ordering and conditioning in Vecchia’s approximation and the restrictions imposed by the various parameterizations. We also consider a model in which co-located nuggets are correlated across components and find that forcing this cross-component nugget correlation to be zero can have a serious impact on the other model parameters, so we suggest allowing cross-component correlation in co-located nugget terms.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44042544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Geostatistics for Large Datasets on Riemannian Manifolds: A Matrix-Free Approach 黎曼流形上大数据集的地质统计学:无矩阵方法
Journal of data science : JDS Pub Date : 2022-08-26 DOI: 10.6339/22-jds1075
M. Pereira, N. Desassis, D. Allard
{"title":"Geostatistics for Large Datasets on Riemannian Manifolds: A Matrix-Free Approach","authors":"M. Pereira, N. Desassis, D. Allard","doi":"10.6339/22-jds1075","DOIUrl":"https://doi.org/10.6339/22-jds1075","url":null,"abstract":"Large or very large spatial (and spatio-temporal) datasets have become common place in many environmental and climate studies. These data are often collected in non-Euclidean spaces (such as the planet Earth) and they often present nonstationary anisotropies. This paper proposes a generic approach to model Gaussian Random Fields (GRFs) on compact Riemannian manifolds that bridges the gap between existing works on nonstationary GRFs and random fields on manifolds. This approach can be applied to any smooth compact manifolds, and in particular to any compact surface. By defining a Riemannian metric that accounts for the preferential directions of correlation, our approach yields an interpretation of the nonstationary geometric anisotropies as resulting from local deformations of the domain. We provide scalable algorithms for the estimation of the parameters and for optimal prediction by kriging and simulation able to tackle very large grids. Stationary and nonstationary illustrations are provided.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46734367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
EVIboost for the Estimation of Extreme Value Index Under Heterogeneous Extremes 非均匀极值下极值指数估计的EVIboost
Journal of data science : JDS Pub Date : 2022-05-28 DOI: 10.6339/22-jds1067
Jiaxi Wang, Yanxi Hou, Xingchi Li, Tiandong Wang
{"title":"EVIboost for the Estimation of Extreme Value Index Under Heterogeneous Extremes","authors":"Jiaxi Wang, Yanxi Hou, Xingchi Li, Tiandong Wang","doi":"10.6339/22-jds1067","DOIUrl":"https://doi.org/10.6339/22-jds1067","url":null,"abstract":"Modeling heterogeneity on heavy-tailed distributions under a regression framework is challenging, yet classical statistical methodologies usually place conditions on the distribution models to facilitate the learning procedure. However, these conditions will likely overlook the complex dependence structure between the heaviness of tails and the covariates. Moreover, data sparsity on tail regions makes the inference method less stable, leading to biased estimates for extreme-related quantities. This paper proposes a gradient boosting algorithm to estimate a functional extreme value index with heterogeneous extremes. Our proposed algorithm is a data-driven procedure capturing complex and dynamic structures in tail distributions. We also conduct extensive simulation studies to show the prediction accuracy of the proposed algorithm. In addition, we apply our method to a real-world data set to illustrate the state-dependent and time-varying properties of heavy-tail phenomena in the financial industry.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43556826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Linear Algorithms for Robust and Scalable Nonparametric Multiclass Probability Estimation 鲁棒可扩展的非参数多类概率估计的线性算法
Journal of data science : JDS Pub Date : 2022-05-25 DOI: 10.6339/22-jds1069
Liyun Zeng, Hao Helen Zhang
{"title":"Linear Algorithms for Robust and Scalable Nonparametric Multiclass Probability Estimation","authors":"Liyun Zeng, Hao Helen Zhang","doi":"10.6339/22-jds1069","DOIUrl":"https://doi.org/10.6339/22-jds1069","url":null,"abstract":"Multiclass probability estimation is the problem of estimating conditional probabilities of a data point belonging to a class given its covariate information. It has broad applications in statistical analysis and data science. Recently a class of weighted Support Vector Machines (wSVMs) has been developed to estimate class probabilities through ensemble learning for K-class problems (Wu et al., 2010; Wang et al., 2019), where K is the number of classes. The estimators are robust and achieve high accuracy for probability estimation, but their learning is implemented through pairwise coupling, which demands polynomial time in K. In this paper, we propose two new learning schemes, the baseline learning and the One-vs-All (OVA) learning, to further improve wSVMs in terms of computational efficiency and estimation accuracy. In particular, the baseline learning has optimal computational complexity in the sense that it is linear in K. Though not the most efficient in computation, the OVA is found to have the best estimation accuracy among all the procedures under comparison. The resulting estimators are distribution-free and shown to be consistent. We further conduct extensive numerical experiments to demonstrate their finite sample performance.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44600781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信