The Pandata Scalable Open-Source Analysis Stack

Proceedings of the Python in Science Conference. Pub Date: 1900-01-01. DOI: 10.25080/gerudo-f2bc6f59-00b

James Bednar, Martin Durant

Abstract

As the scale of scientific data analysis continues to grow, traditional domain-specific tools often struggle with data of increasing size and complexity. These tools also face sustainability challenges due to a relatively narrow user base, a limited pool of contributors, and constrained funding sources. We introduce the Pandata open-source software stack as a solution, emphasizing the use of domain-independent tools at critical stages of the data life cycle, without compromising the depth of domain-specific analyses. This set of interoperable and compositional tools, including Dask, Xarray, Numba, hvPlot, Panel, and Jupyter, provides a versatile and sustainable model for data analysis and scientific computation. Collectively, the Pandata stack covers the landscape of data access, distributed computation, and interactive visualization across any domain or scale. See github.com/panstacks/pandata to get started using this stack or to help contribute to it.
