The R Quest: from Users to Developers

The R Journal. Pub Date: 2021-01-01. DOI: 10.32614/rj-2021-111
Simon Urbanek

Abstract

R is not a programming language, and this produces the inherent dichotomy between analytics and software engineering. With the emergence of data science, the opportunity exists to bridge this gap, especially through teaching practices.

Genesis: How did we get here?

The article “Software Engineering and R Programming: A Call to Action” summarizes the dichotomy between analytics and software engineering in the R ecosystem, provides examples where this leads to problems, and proposes what we as R users can do to bridge the gap.

Data Analytic Language

The fundamental basis of the dichotomy is inherent in the evolution of S and R: they are not programming languages, but they ended up being mistaken for such. S was designed to be a data analytic language: to turn ideas into software quickly and faithfully, often used in “non-programming” style (Chambers, 1998). Its original goal was to enable statisticians to apply code written in programming languages (at the time mostly FORTRAN) to analyze data quickly and interactively, for some suitable definition of “interactive” at the time (Becker, 1994). The success of S and then R can be traced to the ability to perform data analysis by applying existing tools to data in creative ways.

A data analysis is a quest: at every step we learn more about the data, which informs our decisions about the next steps. Whether it is an exploratory data analysis leveraging graphics, computing statistics, or fitting models, the final goal is typically not known ahead of time; it is obtained by an iterative process of applying tools that we as analysts think may lead us further (Tukey, 1977). It is important to note that this is exactly the opposite of software engineering, where there is a well-defined goal: a specification or desired outcome, which simply needs to be expressed in a way understandable to the computer.