iJTyper: An effective type inference framework for incomplete java codes by integrating constraint- and statistics-based methods

IF 7.5; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications Pub Date : 2025-10-09 DOI:10.1016/j.eswa.2025.129972

Zhixiang Chen , Anji Li , Neng Zhang , Jianguo Chen , Yuan Huang , Zibin Zheng

{"title":"iJTyper: An effective type inference framework for incomplete java codes by integrating constraint- and statistics-based methods","authors":"Zhixiang Chen , Anji Li , Neng Zhang , Jianguo Chen , Yuan Huang , Zibin Zheng","doi":"10.1016/j.eswa.2025.129972","DOIUrl":null,"url":null,"abstract":"<div><div>Inferring the types of APIs used in incomplete codes (also referred to as code snippets), e.g., those on Q&A forums, is a prerequisite step required to work with the codes. Existing type inference methods proposed for incomplete Java codes can be primarily categorized as constraint-based or statistics-based. The former relies on a pre-built API knowledge base (KB) and the type constraints in code snippets, which imposes higher requirements on code syntax and thus suffers from low recall due to the syntactic limitation. The latter overcomes the syntactic limitation by learning statistical regularities from a code corpus, however it rarely employs the type constraints in code snippets, which may lead to low precision. In this paper, we propose an effective type inference framework, called iJTyper, for incomplete Java codes by integrating the complementary advantages of constraint- and statistics-based methods. For a code snippet, iJTyper first applies a constraint-based method and augments the code context with the inferred API types. Then, it applies a statistics-based method to the augmented code snippet. The types predicted for APIs are further used to improve the constraint-based method by reducing its pre-built KB. iJTyper iteratively executes both methods and performs the code context augmentation and KB reduction mechanisms until a termination condition is satisfied. The final inference results are produced by combining the results of both methods. We implemented a version of iJTyper by integrating two state-of-the-art methods, SnR and MLMTyper, and evaluated iJTyper on two open-source datasets. 
Results show that 1) iJTyper achieves the highest average precision/recall<sup>1</sup> of 97.3 % and 92.5 % on both datasets; 2) iJTyper improves the average recall of SnR and MLMTyper by at least 7.3 % and 27.4 %, respectively; and 3) iJTyper improves the average precision/recall of the recently popular language model, ChatGPT, by 3.2 % and 0.5 % on both datasets.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"299 ","pages":"Article 129972"},"PeriodicalIF":7.5000,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417425035870","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

Citations: 0

Abstract

Inferring the types of APIs used in incomplete code (also referred to as code snippets), such as the code posted on Q&A forums, is a prerequisite for working with that code. Existing type inference methods for incomplete Java code fall mainly into two categories: constraint-based and statistics-based. Constraint-based methods rely on a pre-built API knowledge base (KB) and the type constraints in code snippets; this places stricter requirements on code syntax, so these methods suffer from low recall. Statistics-based methods overcome the syntactic limitation by learning statistical regularities from a code corpus, but they rarely use the type constraints in code snippets, which may lead to low precision. In this paper, we propose iJTyper, an effective type inference framework for incomplete Java code that integrates the complementary advantages of constraint- and statistics-based methods. For a code snippet, iJTyper first applies a constraint-based method and augments the code context with the inferred API types. It then applies a statistics-based method to the augmented code snippet. The types predicted for APIs are in turn used to improve the constraint-based method by reducing its pre-built KB. iJTyper executes both methods iteratively, performing code context augmentation and KB reduction, until a termination condition is satisfied. The final inference results are produced by combining the results of both methods. We implemented a version of iJTyper by integrating two state-of-the-art methods, SnR and MLMTyper, and evaluated it on two open-source datasets. Results show that 1) iJTyper achieves the highest average precision and recall, 97.3 % and 92.5 %, on both datasets; 2) iJTyper improves the average recall of SnR and MLMTyper by at least 7.3 % and 27.4 %, respectively; and 3) iJTyper improves the average precision and recall of the recently popular language model ChatGPT by 3.2 % and 0.5 % on both datasets.
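The iterative loop described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: `constraint_step` and `statistics_step` are toy stand-ins for SnR and MLMTyper, and the termination condition (no new types in the context, or a round limit), the KB reduction rule, and the way the two results are combined are all hypothetical choices, since the abstract does not specify them.

```python
from typing import Dict, List, Set

Types = Dict[str, str]               # simple API name -> fully qualified name
KnowledgeBase = Dict[str, Set[str]]  # simple API name -> candidate fully qualified names


def constraint_step(names: List[str], kb: KnowledgeBase) -> Types:
    """Toy stand-in for a constraint-based method (hypothetical): resolve a
    name only when the KB leaves exactly one candidate for it."""
    return {n: next(iter(kb[n])) for n in names if len(kb.get(n, ())) == 1}


def statistics_step(names: List[str], context: Types, vocab: KnowledgeBase) -> Types:
    """Toy stand-in for a statistics-based method (hypothetical): prefer the
    candidate whose package already appears in the augmented context."""
    known_packages = {fqn.rsplit(".", 1)[0] for fqn in context.values()}
    predicted: Types = {}
    for n in names:
        candidates = sorted(vocab.get(n, ()))
        if not candidates:
            continue
        matching = [c for c in candidates if c.rsplit(".", 1)[0] in known_packages]
        predicted[n] = (matching or candidates)[0]
    return predicted


def reduce_kb(kb: KnowledgeBase, predicted: Types) -> KnowledgeBase:
    """Assumed reduction rule: keep only the predicted candidate for a name
    when the KB contains it; otherwise leave the candidates unchanged."""
    reduced: KnowledgeBase = {}
    for name, candidates in kb.items():
        guess = predicted.get(name)
        reduced[name] = {guess} if guess in candidates else set(candidates)
    return reduced


def infer_types(names: List[str], kb: KnowledgeBase, max_rounds: int = 5) -> Types:
    context: Types = {}
    stat_types: Types = {}
    current_kb = kb
    for _ in range(max_rounds):
        constraint_types = constraint_step(names, current_kb)
        new_context = {**context, **constraint_types}   # code context augmentation
        stat_types = statistics_step(names, new_context, kb)
        current_kb = reduce_kb(current_kb, stat_types)  # KB reduction
        if new_context == context:                      # assumed termination condition
            break
        context = new_context
    # Assumed combination: constraint-based results take precedence.
    return {**stat_types, **context}


names = ["List", "ArrayList", "Date"]
kb = {
    "List": {"java.util.List", "java.awt.List"},
    "ArrayList": {"java.util.ArrayList"},
    "Date": {"java.util.Date", "java.sql.Date"},
}
constraint_only = constraint_step(names, kb)
result = infer_types(names, kb)
```

In this toy run the constraint-only pass resolves just `ArrayList`, because the other two names are ambiguous in the KB. Once that type is added to the context, the statistical stand-in picks `java.util` candidates for `List` and `Date`, and the reduced KB lets the constraint step confirm them in the next round.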


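Source journal

Expert Systems with Applications (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months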

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.