Journal of data science : JDS最新文献_第3页

An Assessment of Crop-Specific Land Cover Predictions Using High-Order Markov Chains and Deep Neural Networks 基于高阶马尔可夫链和深度神经网络的作物特定土地覆盖预测评估

Journal of data science : JDS Pub Date : 2023-01-01 DOI: 10.6339/23-jds1098

L. Sartore, C. Boryan, Andrew Dau, P. Willis

{"title":"An Assessment of Crop-Specific Land Cover Predictions Using High-Order Markov Chains and Deep Neural Networks","authors":"L. Sartore, C. Boryan, Andrew Dau, P. Willis","doi":"10.6339/23-jds1098","DOIUrl":"https://doi.org/10.6339/23-jds1098","url":null,"abstract":"High-Order Markov Chains (HOMC) are conventional models, based on transition probabilities, that are used by the United States Department of Agriculture (USDA) National Agricultural Statistics Service (NASS) to study crop-rotation patterns over time. However, HOMCs routinely suffer from sparsity and identifiability issues because the categorical data are represented as indicator (or dummy) variables. In fact, the dimension of the parametric space increases exponentially with the order of HOMCs required for analysis. While parsimonious representations reduce the number of parameters, as has been shown in the literature, they often result in less accurate predictions. Most parsimonious models are trained on big data structures, which can be compressed and efficiently processed using alternative algorithms. Consequently, a thorough evaluation and comparison of the prediction results obtain using a new HOMC algorithm and different types of Deep Neural Networks (DNN) across a range of agricultural conditions is warranted to determine which model is most appropriate for operational crop specific land cover prediction of United States (US) agriculture. In this paper, six neural network models are applied to crop rotation data between 2011 and 2021 from six agriculturally intensive counties, which reflect the range of major crops grown and a variety of crop rotation patterns in the Midwest and southern US. The six counties include: Renville, North Dakota; Perkins, Nebraska; Hale, Texas; Livingston, Illinois; McLean, Illinois; and Shelby, Ohio. Results show the DNN models achieve higher overall prediction accuracy for all counties in 2021. The proposed DNN models allow for the ingestion of long time series data, and robustly achieve higher accuracy values than a new HOMC algorithm considered for predicting crop specific land cover in the US.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71321012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Binary Classification of Malignant Mesothelioma: A Comparative Study 恶性间皮瘤二元分类的比较研究

Journal of data science : JDS Pub Date : 2023-01-01 DOI: 10.6339/23-jds1090

Ted Si Yuan Cheng, Xiyue Liao

引用次数: 3

Computing Pseudolikelihood Estimators for Exponential-Family Random Graph Models 计算指数族随机图模型的伪似然估计量

Journal of data science : JDS Pub Date : 2023-01-01 DOI: 10.6339/23-jds1094

Christian S. Schmid, David R. Hunter

引用次数: 3

Impact of Bias Correction of the Least Squares Estimation on Bootstrap Confidence Intervals for Bifurcating Autoregressive Models 最小二乘估计偏差校正对分岔自回归模型自举置信区间的影响

Journal of data science : JDS Pub Date : 2023-01-01 DOI: 10.6339/23-jds1092

T. Elbayoumi, S. Mostafa

引用次数: 0

Covid-19 Vaccine Efficacy: Accuracy Assessment, Comparison, and Caveats Covid-19疫苗有效性:准确性评估、比较和注意事项

Journal of data science : JDS Pub Date : 2023-01-01 DOI: 10.6339/23-jds1089

Wenjiang J. Fu, Jieni Li, P. Scheet

引用次数: 0

The Second Competition on Spatial Statistics for Large Datasets 第二届大型数据集空间统计竞赛

Journal of data science : JDS Pub Date : 2022-11-06 DOI: 10.6339/22-jds1076

Sameh Abdulah, Faten S. Alamri, Pratik Nag, Ying Sun, H. Ltaief, D. Keyes, M. Genton

{"title":"The Second Competition on Spatial Statistics for Large Datasets","authors":"Sameh Abdulah, Faten S. Alamri, Pratik Nag, Ying Sun, H. Ltaief, D. Keyes, M. Genton","doi":"10.6339/22-jds1076","DOIUrl":"https://doi.org/10.6339/22-jds1076","url":null,"abstract":"In the last few decades, the size of spatial and spatio-temporal datasets in many research areas has rapidly increased with the development of data collection technologies. As a result, classical statistical methods in spatial statistics are facing computational challenges. For example, the kriging predictor in geostatistics becomes prohibitive on traditional hardware architectures for large datasets as it requires high computing power and memory footprint when dealing with large dense matrix operations. Over the years, various approximation methods have been proposed to address such computational issues, however, the community lacks a holistic process to assess their approximation efficiency. To provide a fair assessment, in 2021, we organized the first competition on spatial statistics for large datasets, generated by our ExaGeoStat software, and asked participants to report the results of estimation and prediction. Thanks to its widely acknowledged success and at the request of many participants, we organized the second competition in 2022 focusing on predictions for more complex spatial and spatio-temporal processes, including univariate nonstationary spatial processes, univariate stationary space-time processes, and bivariate stationary spatial processes. In this paper, we describe in detail the data generation procedure and make the valuable datasets publicly available for a wider adoption. Then, we review the submitted methods from fourteen teams worldwide, analyze the competition outcomes, and assess the performance of each team.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42045278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Vecchia Approximations and Optimization for Multivariate Matérn Models 多元数学模型的Vecchia逼近与优化

Journal of data science : JDS Pub Date : 2022-10-17 DOI: 10.6339/22-jds1074

Youssef A. Fahmy, J. Guinness

引用次数: 2

Geostatistics for Large Datasets on Riemannian Manifolds: A Matrix-Free Approach 黎曼流形上大数据集的地质统计学:无矩阵方法

Journal of data science : JDS Pub Date : 2022-08-26 DOI: 10.6339/22-jds1075

M. Pereira, N. Desassis, D. Allard

引用次数: 6

EVIboost for the Estimation of Extreme Value Index Under Heterogeneous Extremes 非均匀极值下极值指数估计的EVIboost

Journal of data science : JDS Pub Date : 2022-05-28 DOI: 10.6339/22-jds1067

Jiaxi Wang, Yanxi Hou, Xingchi Li, Tiandong Wang

引用次数: 0

Linear Algorithms for Robust and Scalable Nonparametric Multiclass Probability Estimation 鲁棒可扩展的非参数多类概率估计的线性算法

Journal of data science : JDS Pub Date : 2022-05-25 DOI: 10.6339/22-jds1069

Liyun Zeng, Hao Helen Zhang

{"title":"Linear Algorithms for Robust and Scalable Nonparametric Multiclass Probability Estimation","authors":"Liyun Zeng, Hao Helen Zhang","doi":"10.6339/22-jds1069","DOIUrl":"https://doi.org/10.6339/22-jds1069","url":null,"abstract":"Multiclass probability estimation is the problem of estimating conditional probabilities of a data point belonging to a class given its covariate information. It has broad applications in statistical analysis and data science. Recently a class of weighted Support Vector Machines (wSVMs) has been developed to estimate class probabilities through ensemble learning for K-class problems (Wu et al., 2010; Wang et al., 2019), where K is the number of classes. The estimators are robust and achieve high accuracy for probability estimation, but their learning is implemented through pairwise coupling, which demands polynomial time in K. In this paper, we propose two new learning schemes, the baseline learning and the One-vs-All (OVA) learning, to further improve wSVMs in terms of computational efficiency and estimation accuracy. In particular, the baseline learning has optimal computational complexity in the sense that it is linear in K. Though not the most efficient in computation, the OVA is found to have the best estimation accuracy among all the procedures under comparison. The resulting estimators are distribution-free and shown to be consistent. We further conduct extensive numerical experiments to demonstrate their finite sample performance.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44600781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0