Journal of data science : JDS: Latest Articles

On the Use of Deep Neural Networks for Large-Scale Spatial Prediction
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1070
Skyler Gray, Matthew J. Heaton, D. Bolintineanu, A. Olson
Abstract: For spatial kriging (prediction), the Gaussian process (GP) has been the go-to tool of spatial statisticians for decades. However, the GP is plagued by computational intractability, rendering it infeasible for use on large spatial data sets. Neural networks (NNs), on the other hand, have arisen as a flexible and computationally feasible approach for capturing nonlinear relationships. To date, however, NNs have only been scarcely used for problems in spatial statistics but their use is beginning to take root. In this work, we argue for equivalence between a NN and a GP and demonstrate how to implement NNs for kriging from large spatial data. We compare the computational efficacy and predictive power of NNs with that of GP approximations across a variety of big spatial Gaussian, non-Gaussian and binary data applications of up to size $n = 10^{6}$. Our results suggest that fully-connected NNs perform similarly to state-of-the-art, GP-approximated models for short-range predictions but can suffer for longer range predictions.
Citations: 2
Integration of Social Determinants of Health Data into the Largest, Not-for-Profit Health System in South Florida
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1063
Lourdes M. Rojas, Gregory L. Vincent, D. Parris
Abstract: Social determinants of health (SDOH) are the conditions in which people are born, grow, work, and live. Although evidence suggests that SDOH influence a range of health outcomes, health systems lack the infrastructure to access and act upon this information. The purpose of this manuscript is to explain the methodology that a health system used to: 1) identify and integrate publicly available SDOH data into the health systems’ Data Warehouse, 2) integrate a HIPAA compliant geocoding software (via DeGAUSS), and 3) visualize data to inform SDOH projects (via Tableau). First, authors engaged key stakeholders across the health system to convey the implications of SDOH data for our patient population and identify variables of interest. As a result, fourteen publicly available data sets, accounting for >30,800 variables representing national, state, county, and census tract information over 2016–2019, were cleaned and integrated into our Data Warehouse. To pilot the data visualization, we created county and census tract level maps for our service areas and plotted common SDOH metrics (e.g., income, education, insurance status, etc.). This practical, methodological integration of SDOH data at a large health system demonstrated feasibility. Ultimately, we will repeat this process system wide to further understand the risk burden in our patient population and improve our prediction models – allowing us to become better partners with our community.
Citations: 0
Active Data Science for Improving Clinical Risk Prediction
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1078
D. Ankerst, Matthias Neumair
Abstract: Clinical risk prediction models are commonly developed in a post-hoc and passive fashion, capitalizing on convenient data from completed clinical trials or retrospective cohorts. Impacts of the models often end at their publication rather than with the patients. The field of clinical risk prediction is rapidly improving in a progressively more transparent data science era. Based on collective experience over the past decade by the Prostate Biopsy Collaborative Group (PBCG), this paper proposes the following four data science-driven strategies for improving clinical risk prediction to the benefit of clinical practice and research. The first proposed strategy is to actively design prospective data collection, monitoring, analysis and validation of risk tools following the same standards as for clinical trials in order to elevate the quality of training data. The second suggestion is to make risk tools and model formulas available online. User-friendly risk tools will bring quantitative information to patients and their clinicians for improved knowledge-based decision-making. As past experience testifies, online tools expedite independent validation, providing helpful information as to whether the tools are generalizable to new populations. The third proposal is to dynamically update and localize risk tools to adapt to changing demographic and clinical landscapes. The fourth strategy is to accommodate systematic missing data patterns across cohorts in order to maximize the statistical power in model training, as well as to accommodate missing information on the end-user side too, in order to maximize utility for the public.
Citations: 2
‘You Draw It’: Implementation of Visually Fitted Trends with r2d3
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1083
Emily A. Robinson, Réka Howard, Susan Vanderplas
Abstract: How do statistical regression results compare to intuitive, visually fitted results? Fitting lines by eye through a set of points has been explored since the 20th century. Common methods of fitting trends by eye involve maneuvering a string, black thread, or ruler until the fit is suitable, then drawing the line through the set of points. In 2015, the New York Times introduced an interactive feature, called ‘You Draw It,’ where readers are asked to input their own assumptions about various metrics and compare how these assumptions relate to reality. This research is intended to implement ‘You Draw It’, adapted from the New York Times, as a way to measure the patterns we see in data. In this paper, we describe the adaptation of an old tool for graphical testing and evaluation, eye-fitting, for use in modern web-applications suitable for testing statistical graphics. We present an empirical evaluation of this testing method for linear regression, and briefly discuss an extension of this method to non-linear applications.
Citations: 2
Addressing the Impact of the COVID-19 Pandemic on Survival Outcomes in Randomized Phase III Oncology Trials
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1079
Jiabu Ye, Binbing Yu, H. Mann, A. Sabin, Z. Szíjgyártó, David Wright, P. Mukhopadhyay, C. Massacesi, S. Ghiorghiu, R. Iacona
Abstract: We assessed the impact of the coronavirus disease 2019 (COVID-19) pandemic on the statistical analysis of time-to-event outcomes in late-phase oncology trials. Using a simulated case study that mimics a Phase III ongoing trial during the pandemic, we evaluated the impact of COVID-19-related deaths, time off-treatment and missed clinical visits due to the pandemic, on overall survival and/or progression-free survival in terms of test size (also referred to as Type 1 error rate or alpha level), power, and hazard ratio (HR) estimates. We found that COVID-19-related deaths would impact both size and power, and lead to biased HR estimates; the impact would be more severe if there was an imbalance in COVID-19-related deaths between the study arms. Approaches censoring COVID-19-related deaths may mitigate the impact on power and HR estimation, especially if study data cut-off was extended to recover censoring-related event loss. The impact of COVID-19-related time off-treatment would be modest for power, and moderate for size and HR estimation. Different rules of censoring cancer progression times result in a slight difference in the power for the analysis of progression-free survival. The simulations provided valuable information for determining whether clinical-trial modifications should be required for ongoing trials during the COVID-19 pandemic.
Citations: 0
Identifying Prerequisite Courses in Undergraduate Biology Using Machine Learning
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1068
Youngjin Lee
Abstract: Many undergraduate students who matriculated in Science, Technology, Engineering and Mathematics (STEM) degree programs drop out or switch their major. Previous studies indicate that performance of students in prerequisite courses is important for attrition of students in STEM. This study analyzed demographic information, ACT/SAT score, and performance of students in freshman year courses to develop machine learning models predicting their success in earning a bachelor’s degree in biology. The predictive model based on Random Forest (RF) and Extreme Gradient Boosting (XGBoost) showed a better performance in terms of AUC (Area Under the Curve) with more balanced sensitivity and specificity than Logistic Regression (LR), K-Nearest Neighbor (KNN), and Neural Network (NN) models. An explainable machine learning approach called break-down was employed to identify important freshman year courses that could have a larger impact on student success at the biology degree program and student levels. More important courses identified at the program level can help program coordinators to prioritize their effort in addressing student attrition while more important courses identified at the student level can help academic advisors to provide more personalized, data-driven guidance to students.
Citations: 0
A Hybrid Monitoring Procedure for Detecting Abnormality with Application to Energy Consumption Data
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1039
Daeyoung Lim, Ming-Hui Chen, N. Ravishanker, Mark Bolduc, Brian McKeon, Stanley Nolan
Abstract: The complexity of energy infrastructure at large institutions increasingly calls for data-driven monitoring of energy usage. This article presents a hybrid monitoring algorithm for detecting consumption surges using statistical hypothesis testing, leveraging the posterior distribution and its information about uncertainty to introduce randomness in the parameter estimates, while retaining the frequentist testing framework. This hybrid approach is designed to be asymptotically equivalent to the Neyman-Pearson test. We show via extensive simulation studies that the hybrid approach enjoys control over type-1 error rate even with finite sample sizes whereas the naive plug-in method tends to exceed the specified level, resulting in overpowered tests. The proposed method is applied to the natural gas usage data at the University of Connecticut.
Citations: 1
Editorial: Large-Scale Spatial Data Science
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds204edi
Sameh Abdulah, S. Castruccio, M. Genton, Ying Sun
Abstract: This special issue features eight articles on “Large-Scale Spatial Data Science.” Data science for complex and large-scale spatial and spatio-temporal data has become essential in many research fields, such as climate science and environmental applications. Due to the ever-increasing amounts of data collected, traditional statistical approaches tend to break down and computationally efficient methods and scalable algorithms that are suitable for large-scale spatial data have become crucial to cope with many challenges associated with big data. This special issue aims at highlighting some of the latest developments in the area of large-scale spatial data science. The research papers presented showcase advanced statistical methods and machine learning approaches for solving complex and large-scale problems arising from modern data science applications. Abdulah et al. (2022) reported the results of the second competition on spatial statistics for large datasets organized by the King Abdullah University of Science and Technology (KAUST). Very large datasets (up to 1 million in size) were generated with the ExaGeoStat software to design the competition on large-scale predictions in challenging settings, including univariate nonstationary spatial processes, univariate stationary space-time processes, and bivariate stationary spatial processes. The authors described the data generation process in detail in each setting and made these valuable datasets publicly available. They reviewed the methods used by fourteen competing teams worldwide, analyzed the results of the competition, and assessed the performance of each team.
Citations: 0
Creating a Census County Assessment Tool for Visualizing Census Data
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1082
Izzy Youngs, R. Prevost, Christopher Dick
Abstract: The 2020 Census County Assessment Tool was developed to assist decennial census data users in identifying deviations between expected census counts and the released counts across population and housing indicators. The tool also offers contextual data for each county on factors which could have contributed to census collection issues, such as self-response rates and COVID-19 infection rates. The tool compiles this information into a downloadable report and includes additional local data sources relevant to the data collection process and experts to seek more assistance.
Citations: 2
Data Science Applications and Implications in Legal Studies: A Perspective Through Topic Modelling
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1058
Jinzhe Tan, Huan Wan, Ping Yan, Zheng Hua Zhu
Abstract: Law and legal studies has been an exciting new field for data science applications whereas the technological advancement also has profound implications for legal practice. For example, the legal industry has accumulated a rich body of high quality texts, images and other digitised formats, which are ready to be further processed and analysed by data scientists. On the other hand, the increasing popularity of data science has been a genuine challenge to legal practitioners, regulators and even general public and has motivated a long-lasting debate in the academia focusing on issues such as privacy protection and algorithmic discrimination. This paper collects 1236 journal articles involving both law and data science from the platform Web of Science to understand the patterns and trends of this interdisciplinary research field in terms of English journal publications. We find a clear trend of increasing publication volume over time and a strong presence of high-impact law and political science journals. We then use the Latent Dirichlet Allocation (LDA) as a topic modelling method to classify the abstracts into four topics based on the coherence measure. The four topics identified confirm that both challenges and opportunities have been investigated in this interdisciplinary field and help offer directions for future research.
Citations: 0