Journal of data science, statistics, and visualisation: latest articles

Casting multiple shadows: interactive data visualisation with tours and embeddings
Journal of data science, statistics, and visualisation Pub Date : 2022-05-30 DOI: 10.52933/jdssv.v2i3.21
Stuart Lee, U. Laa, D. Cook
Abstract: Non-linear dimensionality reduction (NLDR) methods such as t-distributed stochastic neighbour embedding (t-SNE) are ubiquitous in the natural sciences; however, the appropriate use of these methods is difficult because of their complex parameterisations, and analysts must make trade-offs in order to identify structure in the visualisation of an NLDR technique. We present visual diagnostics for the pragmatic usage of NLDR methods by combining them with a technique called the tour. A tour is a sequence of interpolated linear projections of multivariate data onto a lower dimensional space. The sequence is displayed as a dynamic visualisation, allowing a user to see the shadows the high-dimensional data casts in a lower dimensional view. By linking the tour to an NLDR view, we can preserve global structure and, through user interactions like linked brushing, observe where the NLDR view may be misleading. We display several case studies from both simulations and single cell transcriptomics that show our approach is useful for cluster orientation tasks. The implementation of our framework is available as an R package called liminal at https://github.com/sa-lee/liminal.
Citations: 0
INTEREST: INteractive Tool for Exploring REsults from Simulation sTudies.
Journal of data science, statistics, and visualisation Pub Date : 2021-12-31 DOI: 10.52933/jdssv.v1i4.9
Alessandro Gasparini, Tim P Morris, Michael J Crowther
Abstract: Simulation studies allow us to explore the properties of statistical methods. They provide a powerful tool with a multiplicity of aims; among others: evaluating and comparing new or existing statistical methods, assessing violations of modelling assumptions, helping with the understanding of statistical concepts, and supporting the design of clinical trials. The increased availability of powerful computational tools and usable software has contributed to the rise of simulation studies in the current literature. However, simulation studies involve increasingly complex designs, making it difficult to provide all relevant results clearly. Dissemination of results plays a focal role in simulation studies: it can drive applied analysts to use methods that have been shown to perform well in their settings, guide researchers to develop new methods in a promising direction, and provide insights into less established methods. It is crucial that we can digest relevant results of simulation studies. Therefore, we developed INTEREST: an INteractive Tool for Exploring REsults from Simulation sTudies. The tool has been developed using the Shiny framework in R and is available as a web app or as a standalone package. It requires uploading a tidy format dataset with the results of a simulation study in R, Stata, SAS, SPSS, or comma-separated format. A variety of performance measures are estimated automatically along with Monte Carlo standard errors; results and performance summaries are displayed both in tabular and graphical fashion, with a wide variety of available plots. Consequently, the reader can focus on simulation parameters and estimands of most interest. In conclusion, INTEREST can facilitate the investigation of results from simulation studies and supplement the reporting of results, allowing researchers to share detailed results from their simulations and readers to explore them freely.
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7612246/pdf/EMS140699.pdf
Citations: 3
On Generalization and Computation of Tukey's Depth: Part II
Journal of data science, statistics, and visualisation Pub Date : 2021-12-15 DOI: 10.52933/jdssv.v2i2.61
Yiyuan She, Shao Tang, Jingze Liu
Abstract: This paper studies how to generalize Tukey's depth to problems defined in a restricted space that may be curved or have boundaries, and to problems with a nondifferentiable objective. First, using a manifold approach, we propose a broad class of Riemannian depth for smooth problems defined on a Riemannian manifold, and showcase its applications in spherical data analysis, principal component analysis, and multivariate orthogonal regression. Moreover, for nonsmooth problems, we introduce additional slack variables and inequality constraints to define a novel slacked data depth, which can perform center-outward rankings of estimators arising from sparse learning and reduced rank regression. Real data examples illustrate the usefulness of some proposed data depths.
Citations: 2
On Generalization and Computation of Tukey's Depth: Part I
Journal of data science, statistics, and visualisation Pub Date : 2021-12-15 DOI: 10.52933/jdssv.v2i1.23
Yiyuan She, S. Tang, Jingze Liu
Abstract: Tukey's depth offers a powerful tool for nonparametric inference and estimation, but also encounters serious computational and methodological difficulties in modern statistical data analysis. This paper studies how to generalize and compute Tukey-type depths in multi-dimensions. A general framework of influence-driven polished subspace depth, which emphasizes the importance of the underlying influence space and discrepancy measure, is introduced. The new matrix formulation enables us to utilize state-of-the-art optimization techniques to develop scalable algorithms with implementation ease and guaranteed fast convergence. In particular, half-space depth as well as regression depth can now be computed much faster than previously possible, as supported by extensive experiments. A companion paper is also offered to the reader in the same issue of this journal.
Citations: 3
Editorial Founding Issue
Journal of data science, statistics, and visualisation Pub Date : 2021-09-30 DOI: 10.52933/jdssv.v1i1.52
S. Aelst, P. Groenen
Abstract: The Journal of Data Science, Statistics, and Visualisation (JDSSV) is an electronic journal which welcomes contributions to data science, statistics, and visualisation, and in particular, those aspects which link and integrate these subject areas. Articles can cover topics such as machine learning and statistical learning, the visualisation and verbalisation of data, visual analytics, big data infrastructures and analytics, interactive learning, and advanced computing. Articles that discuss two or more research areas of the journal are favoured. Scientific contributions should be of a high standard. Articles should be oriented towards a wide scientific audience of statisticians, data scientists, computer scientists, data analysts, etc. The journal welcomes original contributions that are not being considered for publication elsewhere and contain a high level of novelty. Articles with a thorough but concise review of a certain topic with the potential to provide new insights are also welcome. Manuscripts submitted to the journal generally are accompanied by supplementary material containing software code, data, technical derivations or detailed explanations, additional examples, etc. All submitted material will be reviewed by the assigned associate editor and reviewers of the manuscript.
Citations: 0
A Spatial SEIR Model for COVID-19 in South Africa
Journal of data science, statistics, and visualisation Pub Date : 2021-06-09 DOI: 10.20944/PREPRINTS202106.0262.V1
I. Fabris-Rotelli, Jenny P. Holloway, Zaid Kimmie, S. Archibald, P. Debba, Raeesa Manjoo-Docrat, A. Roux, Nontembeko Dudeni-Tlhone, Charl Janse van Rensburg, R. Thiede, N. Abdelatif, Sibusisiwe Makhanya, Arminn Potgieter
Abstract: The virus SARS-CoV-2 has resulted in numerous modelling approaches arising rapidly to understand the spread of the disease COVID-19 and to plan for future interventions. Herein, we present an SEIR model with a spatial spread component as well as four infectious compartments to account for the variety of symptom levels and transmission rate. The model takes into account the pattern of spatial vulnerability in South Africa through a vulnerability index that is based on socioeconomic and health susceptibility characteristics. Another spatially relevant factor in this context is level of mobility throughout. The thesis of this study is that without the contextual spatial spread modelling, the heterogeneity in COVID-19 prevalence in the South African setting would not be captured. The model is illustrated on South African COVID-19 case counts and hospitalisations.
Pages: 14-45
Citations: 3
A Review of Containerization for Interactive and Reproducible Analysis
Journal of data science, statistics, and visualisation Pub Date : 2021-03-30 DOI: 10.52933/jdssv.v3i1.53
Gregory J. Hunt, Johann A. Gagnon-Bartsch
Abstract: In recent decades the analysis of data has become increasingly computational. Correspondingly, this has changed how scientific and statistical work is shared. For example, it is now commonplace for underlying analysis code and data to be proffered alongside journal publications and conference talks. Unfortunately, sharing code faces several challenges. First, it is often difficult to take code from one computer and run it on another. Code configuration, version, and dependency issues often make this challenging. Secondly, even if the code runs, it is often hard to understand or interact with the analysis. This makes it difficult to assess the code and its findings, for example, in a peer review process. In this review we describe the combination of two computing technologies that help make analyses shareable, interactive, and completely reproducible. These technologies are (1) analysis containerization, which leverages virtualization to fully encapsulate analysis, data, code and dependencies into an interactive and shareable format, and (2) code notebooks, a literate programming format for interacting with analyses. The fusion of these two technologies offers significant advantages over using either individually. This review surveys how the combination enhances the accessibility and reproducibility of code, analyses, and ideas.
Citations: 0
Robust Model-Based Clustering
Journal of data science, statistics, and visualisation Pub Date : 2021-02-13 DOI: 10.1201/b18358-20
Juan D. González, R. Maronna, V. Yohai, R. Zamar
Abstract: We propose a class of Fisher-consistent robust estimators for mixture models. These estimators are then used to build a robust model-based clustering procedure. We study in detail the case of multivariate Gaussian mixtures and propose an algorithm, similar to the EM algorithm, to compute the proposed estimators and build the robust clusters. An extensive Monte Carlo simulation study shows that our proposal outperforms other robust and non-robust state-of-the-art model-based clustering procedures. We apply our proposal to a real data set and show that again it outperforms alternative procedures.
Citations: 0
Handling Cellwise Outliers by Sparse Regression and Robust Covariance
Journal of data science, statistics, and visualisation Pub Date : 2020-12-07 DOI: 10.52933/jdssv.v1i3.18
Jakob Raymaekers, P. Rousseeuw
Abstract: We propose a data-analytic method for detecting cellwise outliers. Given a robust covariance matrix, outlying cells (entries) in a row are found by the cellFlagger technique, which combines lasso regression with a stepwise application of constructed cutoff values. The penalty term of the lasso has a physical interpretation as the total distance that suspicious cells need to move in order to bring their row into the fold. For estimating a cellwise robust covariance matrix we construct a detection-imputation method which alternates between flagging outlying cells and updating the covariance matrix as in the EM algorithm. The proposed methods are illustrated by simulations and on real data about volatile organic compounds in children.
Citations: 8
Compressed sensing with a jackknife and a bootstrap
Journal of data science, statistics, and visualisation Pub Date : 2018-09-18 DOI: 10.52933/jdssv.v2i4.43
Aaron Defazio, M. Tygert, Rachel A. Ward, Jure Zbontar
Abstract: Compressed sensing proposes to reconstruct more degrees of freedom in a signal than the number of values actually measured (based on a potentially unjustified regularizer or prior distribution). Compressed sensing therefore risks introducing errors -- inserting spurious artifacts or masking the abnormalities that medical imaging seeks to discover. Estimating errors using the standard statistical tools of a jackknife and a bootstrap yields "error bars" in the form of full images that are remarkably qualitatively representative of the actual errors (at least when evaluated and validated on data sets for which the ground truth and hence the actual error is available). These images show the structure of possible errors -- without recourse to measuring the entire ground truth directly -- and build confidence in regions of the images where the estimated errors are small. Further visualizations and summary statistics can aid in the interpretation of such error estimates. Visualizations include suitable colorizations of the reconstruction, as well as the obvious "correction" of the reconstruction by subtracting off the error estimates. The canonical summary statistic would be the root-mean-square of the error estimates. Unfortunately, colorizations appear likely to be too distracting for actual clinical practice in medical imaging, and the root-mean-square gets swamped by background noise in the error estimates. Fortunately, straightforward displays of the error estimates and of the "corrected" reconstruction are illuminating, and the root-mean-square improves greatly after mild blurring of the error estimates; the blurring is barely perceptible to the human eye yet smooths away background noise that would otherwise overwhelm the root-mean-square.
Citations: 4