Automated Refactoring of Non-Idiomatic Python Code With Pythonic Idioms

IF 6.5; Tier 1 (Computer Science); Q1, COMPUTER SCIENCE, SOFTWARE ENGINEERING
Zejun Zhang, Zhenchang Xing, Dehai Zhao, Xiwei Xu, Liming Zhu, Qinghua Lu
IEEE Transactions on Software Engineering, vol. 50, no. 11, pp. 2827-2848. Journal article, published 2024-10-09. DOI: 10.1109/TSE.2024.3420886. https://ieeexplore.ieee.org/document/10711885/
Citations: 0

Abstract

Compared with other programming languages (e.g., Java), Python offers more idioms that make code concise and efficient. Although Pythonic idioms are well accepted in the Python community, Python programmers often face challenges in using them, for example, being unaware of certain idioms or not knowing how to use them properly. Based on an analysis of 7,577 Python repositories on GitHub, we find that non-idiomatic Python code that could be implemented with Pythonic idioms occurs frequently and widely. To help Python developers adopt Pythonic idioms, we design and implement RIdiom, an automatic tool that refactors non-idiomatic code with Pythonic idioms. We identify twelve Pythonic idioms by systematically contrasting the abstract syntax grammars of Python and Java. We then define, for each idiom, the syntactic patterns for detecting non-idiomatic code. Finally, we devise atomic AST-rewriting operations and refactoring steps that transform non-idiomatic code into idiomatic code. Evaluated on 1,814 code refactorings, our approach achieves a precision of 0.99 and a recall of 0.87, underscoring its effectiveness. We further evaluate the tool's usefulness in helping developers refactor code with Pythonic idioms. A user study with 14 students shows a 112.9% improvement in correctness and a 35.5% speedup when participants refer to the tool-generated code pairs. Additionally, of the 120 pull requests submitted to GitHub projects to refactor non-idiomatic code with Pythonic idioms, 79 received responses; of these, 49 accepted and praised the refactorings, and 42 merged them into their repositories.
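The abstract describes the approach only at a high level: syntactic patterns detect non-idiomatic code, and AST-rewriting operations turn it into idiomatic code. As a rough illustration of that general idea, and not RIdiom's actual implementation, the sketch below uses Python's standard `ast` module to detect one simple pattern (an empty-list assignment immediately followed by a loop whose only statement appends to that list) and rewrite it into a list comprehension. List comprehension is chosen here purely for illustration; the names `ListCompRewriter` and `refactor` are hypothetical. The matcher deliberately skips safety checks a real tool would need, such as whether the loop variable is used after the loop (comprehension variables do not leak) or whether the iterable refers to the list being built. It requires Python 3.9+ for `ast.unparse`.

```python
import ast


class ListCompRewriter(ast.NodeTransformer):
    """Hypothetical sketch: rewrite `xs = []` followed immediately by
    `for v in it: xs.append(e)` into `xs = [e for v in it]`.

    Only the simplest form is matched: the loop body must be a single
    `name.append(<one positional arg>)` call on the same name, with no `else`.
    """

    def _match(self, init, loop):
        # Pattern part 1: `name = []` (single Name target, empty list literal).
        if not (isinstance(init, ast.Assign) and len(init.targets) == 1
                and isinstance(init.targets[0], ast.Name)
                and isinstance(init.value, ast.List) and not init.value.elts):
            return None
        name = init.targets[0].id
        # Pattern part 2: a `for` loop with exactly one body statement and no `else`.
        if not (isinstance(loop, ast.For) and len(loop.body) == 1 and not loop.orelse):
            return None
        stmt = loop.body[0]
        # Pattern part 3: that statement is `name.append(expr)`.
        if not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)
                and isinstance(stmt.value.func, ast.Attribute)
                and stmt.value.func.attr == "append"
                and isinstance(stmt.value.func.value, ast.Name)
                and stmt.value.func.value.id == name
                and len(stmt.value.args) == 1 and not stmt.value.keywords):
            return None
        # Rewrite: reuse the original Assign node (keeps its line info) and
        # replace its value with the equivalent list comprehension.
        init.value = ast.ListComp(
            elt=stmt.value.args[0],
            generators=[ast.comprehension(target=loop.target, iter=loop.iter,
                                          ifs=[], is_async=0)],
        )
        return init

    def _rewrite_body(self, body):
        # Scan a statement list pairwise; a match consumes both statements.
        new_body, i = [], 0
        while i < len(body):
            nxt = body[i + 1] if i + 1 < len(body) else None
            rewritten = self._match(body[i], nxt)
            if rewritten is not None:
                new_body.append(rewritten)
                i += 2
            else:
                new_body.append(body[i])
                i += 1
        return new_body

    def generic_visit(self, node):
        # Visit children first, then rewrite this node's statement lists
        # (module, function, loop, if, try bodies, and so on).
        super().generic_visit(node)
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(node, field, None)
            if isinstance(stmts, list) and stmts and isinstance(stmts[0], ast.stmt):
                setattr(node, field, self._rewrite_body(stmts))
        return node


def refactor(source: str) -> str:
    """Parse, rewrite matching loops, and return the regenerated source."""
    tree = ListCompRewriter().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

Applied to a function that builds `result` by calling `result.append(i * i)` inside `for i in range(n)`, `refactor` returns the same function with the initialisation and loop replaced by `result = [i * i for i in range(n)]`; loops with any additional statement in their body are left untouched.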
Source journal: IEEE Transactions on Software Engineering (Engineering & Technology: Electrical and Electronic Engineering)
CiteScore: 9.70
Self-citation rate: 10.80%
Publication volume: 724
Review time: 6 months
About the journal: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impact on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: hardware-software trade-offs.
f) State-of-the-art surveys: syntheses and comprehensive reviews of the historical development within specific areas of interest.