Python and R for the Modern Data Scientist

IF 5.4 2区 计算机科学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
C. Lortie
{"title":"Python and R for the Modern Data Scientist","authors":"C. Lortie","doi":"10.18637/jss.v103.b02","DOIUrl":null,"url":null,"abstract":"Computation in many fields including those that use statistical software is increasingly driven by needs that can be addressed in many programming ecosystems. In projects that require statistical analyses, both R and Python comprise two frequent resources. In ecology, R is the most frequently used (Lai, Lortie, Muenchen, Yang, and Ma 2019). In bioinformatic gene set analyses, R is also more frequently used in peer-reviewed publications, but Python is still an important statistical resource depending on the specific project (Xie, Jauhari, and Mora 2021). Python outcompetes other languages in use for machine learning and some forms of factor analyses (Hao and Ho 2019; Persson and Khojasteh 2021; Raschka, Patterson, and Nolet 2020). However, the relative frequency that a tool is used for statistical analyses is only one metric of importance and not necessarily a proxy for its merit or its capacity to support innovation and efficient in analyses for practitioners (Zhao, Yan, and Li 2018). It is thus critical that we explore contrasts of at least these two common software languages that support statistics because data scientists can become isolated or polarized within their specific competencies, ideologies, and workflows. A high-level discussion of strengths and weaknesses specific to data endeavors with statistics is germane to both decisions on specific projects and on competency development as a scientist.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":null,"pages":null},"PeriodicalIF":5.4000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Statistical Software","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.18637/jss.v103.b02","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 1

Abstract

Computation in many fields including those that use statistical software is increasingly driven by needs that can be addressed in many programming ecosystems. In projects that require statistical analyses, both R and Python comprise two frequent resources. In ecology, R is the most frequently used (Lai, Lortie, Muenchen, Yang, and Ma 2019). In bioinformatic gene set analyses, R is also more frequently used in peer-reviewed publications, but Python is still an important statistical resource depending on the specific project (Xie, Jauhari, and Mora 2021). Python outcompetes other languages in use for machine learning and some forms of factor analyses (Hao and Ho 2019; Persson and Khojasteh 2021; Raschka, Patterson, and Nolet 2020). However, the relative frequency that a tool is used for statistical analyses is only one metric of importance and not necessarily a proxy for its merit or its capacity to support innovation and efficient in analyses for practitioners (Zhao, Yan, and Li 2018). It is thus critical that we explore contrasts of at least these two common software languages that support statistics because data scientists can become isolated or polarized within their specific competencies, ideologies, and workflows. A high-level discussion of strengths and weaknesses specific to data endeavors with statistics is germane to both decisions on specific projects and on competency development as a scientist.
面向现代数据科学家的Python和R
包括使用统计软件在内的许多领域的计算越来越多地受到许多编程生态系统中可以解决的需求的驱动。在需要统计分析的项目中,R和Python都包含两种常用资源。在生态学中,R是最常用的(Lai, Lortie, Muenchen, Yang, and Ma 2019)。在生物信息学基因集分析中,R也更频繁地用于同行评审的出版物,但根据具体项目,Python仍然是重要的统计资源(Xie, Jauhari, and Mora 2021)。Python在机器学习和某些形式的因素分析方面胜过其他语言(Hao and Ho 2019;Persson and Khojasteh 2021;Raschka, Patterson, and Nolet 2020)。然而,工具用于统计分析的相对频率只是一个重要的度量标准,并不一定代表其优点或支持从业者创新和高效分析的能力(Zhao, Yan, and Li, 2018)。因此,我们探索至少这两种支持统计的常见软件语言的对比是至关重要的,因为数据科学家可能在他们的特定能力、意识形态和工作流程中变得孤立或两极分化。对统计数据的优缺点进行高层次的讨论,对具体项目的决策和作为科学家的能力发展都是有密切关系的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Statistical Software
Journal of Statistical Software 工程技术-计算机:跨学科应用
CiteScore
10.70
自引率
1.70%
发文量
40
审稿时长
6-12 weeks
期刊介绍: The Journal of Statistical Software (JSS) publishes open-source software and corresponding reproducible articles discussing all aspects of the design, implementation, documentation, application, evaluation, comparison, maintainance and distribution of software dedicated to improvement of state-of-the-art in statistical computing in all areas of empirical research. Open-source code and articles are jointly reviewed and published in this journal and should be accessible to a broad community of practitioners, teachers, and researchers in the field of statistics.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信