Foundations of data science (Springfield, Mo.): latest articles

A log-Gaussian Cox process with sequential Monte Carlo for line narrowing in spectroscopy
Foundations of data science (Springfield, Mo.) Pub Date: 2022-02-26 DOI: 10.3934/fods.2023008
T. Harkonen, Emma Hannula, M. Moores, E. Vartiainen, L. Roininen
Abstract: We propose a statistical model for narrowing line shapes in spectroscopy that are well approximated as linear combinations of Lorentzian or Voigt functions. We introduce a log-Gaussian Cox process to represent the peak locations, thereby providing uncertainty quantification for the line narrowing. The Bayesian formulation of the method allows for robust and explicit inclusion of prior information as probability distributions for the parameters of the model. Estimation of the signal and its parameters is performed using a sequential Monte Carlo algorithm followed by an optimization step to determine the peak locations. Our method is validated using a simulation study and applied to a mineralogical Raman spectrum.
Citations: 0
Data based quantification of synchronization
Foundations of data science (Springfield, Mo.) Pub Date: 2022-01-01 DOI: 10.3934/fods.2022020
Citations: 1
Addressing confirmation bias in middle school data science education
Foundations of data science (Springfield, Mo.) Pub Date: 2022-01-01 DOI: 10.3934/fods.2021035
S. Hedges, Kim Given
Abstract: More research is needed involving middle school students' engagement in the statistical problem-solving process, particularly the beginning process steps: formulate a question and make a plan to collect data/consider the data. Further, the increased availability of large-scale electronically accessible data sets is an untapped area of study. This interpretive study examined middle school students' understanding of statistical concepts involved in making a plan to collect data to answer a statistical question within a social issue context using data available on the internet. Student artifacts, researcher notes, and audio and video recordings from nine groups of 20 seventh-grade students in two gifted education pull-out classes at a suburban middle school were used to answer the study research questions. Data were analyzed using a priori codes from previously developed frameworks and by using an inductive approach to find themes. Three themes that emerged from the data related to confirmation bias. Some middle school students held preconceptions about the social issues they chose to study that biased their statistical questions. This in turn influenced the sources of data students used to answer their questions. Confirmation bias is a serious issue that is exacerbated by the endless sources of data available electronically. We argue that this type of bias should be addressed early in students' educational experiences. Based on the findings from this study, we offer recommendations for future research and implications for statistics and data science education.
Citations: 1
Statistical inference for persistent homology applied to simulated fMRI time series data
Foundations of data science (Springfield, Mo.) Pub Date: 2022-01-01 DOI: 10.3934/fods.2022014
H. Abdallah, Adam J. Regalski, Mohammad Behzad Kang, Maria Berishaj, N. Nnadi, Asadur Chowdury, V. Diwadkar, A. Salch
Abstract: Time-series data are among the most widely used in biomedical sciences, including domains such as functional Magnetic Resonance Imaging (fMRI). Structure within time-series data can be captured by the tools of topological data analysis (TDA). Persistent homology is the most commonly used data-analytic tool in TDA, and can effectively summarize complex high-dimensional data into an interpretable 2-dimensional representation called a persistence diagram. Existing methods for statistical inference for persistent homology of data depend on an independence assumption being satisfied. While persistent homology can be computed for each time index in a time series, time-series data often fail to satisfy the independence assumption. This paper develops a statistical test that obviates the independence assumption by implementing a multi-level block sampled Monte Carlo test with sets of persistence diagrams. Its efficacy for detecting task-dependent topological organization is then demonstrated on simulated fMRI data. This new statistical test is therefore suitable for analyzing persistent homology of fMRI data, and of non-independent data in general.
Citations: 3
Teaching data science to students in biology using R, RStudio and Learnr: Analysis of three years data
Foundations of data science (Springfield, Mo.) Pub Date: 2022-01-01 DOI: 10.3934/fods.2022022
G. Engels, P. Grosjean, Frédérique Artus
Abstract: We examine the impact of implementing active pedagogical methodologies in three successive data science courses for a biology curriculum at the University of Mons, Belgium. Blended learning and flipped classroom approaches were adopted, with an emphasis on project-based biological data analysis. Four successive types of exercises of increasing difficulty were proposed to the students. Tutorials written with the R package learnr were identified as a critical step in the transition between theory and the application of the concepts. The cognitive workload needed to complete the learnr tutorials was measured for the three courses, and it was lower only for the last course, suggesting that students needed a long time to get used to their software environment (R, RStudio and git). Data on students' activity, collected primarily from the ongoing assessment, were also used to establish student profiles according to their learning strategies. Several suboptimal strategies were observed and discussed. Finally, the timing of students' contributions and the intensity of teacher-learner interactions related to these contributions were analyzed before, during and after the mandatory distance learning due to the COVID-19 lockdown. A lag phase was visible at the beginning of the first lockdown, but the students' work was not markedly affected during the second lockdown period, which lasted much longer.
Citations: 0
Applying topological data analysis to local search problems
Foundations of data science (Springfield, Mo.) Pub Date: 2022-01-01 DOI: 10.3934/fods.2022006
Erik Carlsson, J. Carlsson, Shannon Sweitzer
Abstract: We present an application of topological data analysis (TDA) to discrete optimization problems, which we show can improve the performance of the 2-opt local search method for the traveling salesman problem by simply applying the standard Vietoris-Rips construction to a data set of trials. We then construct a simplicial complex which is specialized for this sort of simulated data set, determined by a stochastic matrix with a steady state vector $(P,\pi)$. When $P$ is induced from a random walk on a finite metric space, this complex exhibits similarities with standard constructions such as Vietoris-Rips on the data set, but is not sensitive to outliers, as sparsity is a natural feature of the construction. We interpret the persistent homology groups in several examples coming from random walks and discrete optimization, and illustrate how higher dimensional Betti numbers can be used to classify connected components, i.e., zero-dimensional features in higher dimensions.
Citations: 2
Multimodal correlations-based data clustering
Foundations of data science (Springfield, Mo.) Pub Date: 2022-01-01 DOI: 10.3934/fods.2022011
Jia Chen, I. Schizas
Abstract: This work proposes a novel technique for clustering multimodal data according to their information content. Statistical correlations present in data that contain similar information are exploited to perform the clustering task. Specifically, multiset canonical correlation analysis is equipped with norm-one regularization mechanisms to identify clusters within different types of data that share the same information content. A pertinent minimization formulation is put forth, while block coordinate descent is employed to derive a batch clustering algorithm which achieves better clustering performance than existing alternatives. Relying on subgradient descent, an online clustering approach is derived which substantially lowers computational complexity compared to the batch approach, without significantly compromising the clustering performance. It is established that, as the number of data increases, the novel regularized multiset framework is able to correctly cluster the multimodal data entries. Further, it is proved that the online clustering scheme converges with probability one to a stationary point of the ensemble regularized multiset correlations cost, which has the potential to recover the correct clusters. Extensive numerical tests demonstrate that the novel clustering scheme outperforms existing alternatives, while the online scheme achieves substantial computational savings.
Citations: 0
HOMOTOPY CONTINUATION FOR THE SPECTRA OF PERSISTENT LAPLACIANS.
Foundations of data science (Springfield, Mo.) Pub Date: 2021-12-01 DOI: 10.3934/fods.2021017
Xiaoqi Wei, Guo-Wei Wei
Abstract: The p-persistent q-combinatorial Laplacian defined for a pair of simplicial complexes is a generalization of the q-combinatorial Laplacian. Given a filtration, the spectra of persistent combinatorial Laplacians not only recover the persistent Betti numbers of persistent homology but also provide extra multiscale geometrical information about the data. Paired with machine learning algorithms, the persistent Laplacian has many potential applications in data science. Seeking different ways to find the spectrum of an operator is an active research topic, and it becomes especially interesting when the ideas originate from multiple fields. In this work, we explore an alternative approach to the spectrum of persistent Laplacians. As the eigenvalues of a persistent Laplacian matrix are the roots of its characteristic polynomial, one may attempt to find the roots of the characteristic polynomial by homotopy continuation, and thus resolve the spectrum of the corresponding persistent Laplacian. We consider a set of simple polytopes and small molecules to prove the principle that algebraic topology, combinatorial graphs, and algebraic geometry can be integrated to understand the shape of data.
Volume 3, issue 4, pages 677-700. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9273002/pdf/nihms-1768199.pdf
Citations: 2
Analysis of the feedback particle filter with diffusion map based approximation of the gain
Foundations of data science (Springfield, Mo.) Pub Date: 2021-09-06 DOI: 10.3934/fods.2021023
S. Pathiraja, W. Stannat
Abstract: Control-type particle filters have been receiving increasing attention over the last decade as a means of obtaining sample-based approximations to the sequential Bayesian filtering problem in the nonlinear setting. Here we analyse one such type, namely the feedback particle filter, and a recently proposed approximation of the associated gain function based on diffusion maps. The key purpose is to provide analytic insights on the form of the approximate gain, which are of interest in their own right. These are then used to establish a roadmap to obtaining well-posedness and convergence of the finite $N$ system to its mean field limit. A number of possible future research directions are also discussed.
Citations: 3
Fast computation of persistent homology representatives with involuted persistent homology
Foundations of data science (Springfield, Mo.) Pub Date: 2021-05-08 DOI: 10.3934/fods.2023006
Matija Čufar, Žiga Virk
Abstract: Persistent homology is typically computed through persistent cohomology. While this generally improves the running time significantly, it does not facilitate extraction of homology representatives. These representatives are geometric manifestations of the corresponding holes and often carry desirable information. We propose a new method for extracting persistent homology representatives using cohomology. In a nutshell, we first compute persistent cohomology and use the obtained information to significantly improve the running time of the direct persistent homology computations. Applied to Rips filtrations, this algorithm generally computes persistent homology representatives much faster than the standard methods.
Citations: 8