Zhixiang Chen , Anji Li , Neng Zhang , Jianguo Chen , Yuan Huang , Zibin Zheng
{"title":"iJTyper: An effective type inference framework for incomplete java codes by integrating constraint- and statistics-based methods","authors":"Zhixiang Chen , Anji Li , Neng Zhang , Jianguo Chen , Yuan Huang , Zibin Zheng","doi":"10.1016/j.eswa.2025.129972","DOIUrl":null,"url":null,"abstract":"<div><div>Inferring the types of APIs used in incomplete codes (also referred to as code snippets), e.g., those on Q&A forums, is a prerequisite step required to work with the codes. Existing type inference methods proposed for incomplete Java codes can be primarily categorized as constraint-based or statistics-based. The former relies on a pre-built API knowledge base (KB) and the type constraints in code snippets, which imposes higher requirements on code syntax and thus suffers from low recall due to the syntactic limitation. The latter overcomes the syntactic limitation by learning statistical regularities from a code corpus, however it rarely employs the type constraints in code snippets, which may lead to low precision. In this paper, we propose an effective type inference framework, called iJTyper, for incomplete Java codes by integrating the complementary advantages of constraint- and statistics-based methods. For a code snippet, iJTyper first applies a constraint-based method and augments the code context with the inferred API types. Then, it applies a statistics-based method to the augmented code snippet. The types predicted for APIs are further used to improve the constraint-based method by reducing its pre-built KB. iJTyper iteratively executes both methods and performs the code context augmentation and KB reduction mechanisms until a termination condition is satisfied. The final inference results are produced by combining the results of both methods. We implemented a version of iJTyper by integrating two state-of-the-art methods, SnR and MLMTyper, and evaluated iJTyper on two open-source datasets. Results show that 1) iJTyper achieves the highest average precision/recall<sup>1</sup> of 97.3 % and 92.5 % on both datasets; 2) iJTyper improves the average recall of SnR and MLMTyper by at least 7.3 % and 27.4 %, respectively; and 3) iJTyper improves the average precision/recall of the recently popular language model, ChatGPT, by 3.2 % and 0.5 % on both datasets.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"299 ","pages":"Article 129972"},"PeriodicalIF":7.5000,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417425035870","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Inferring the types of APIs used in incomplete codes (also referred to as code snippets), e.g., those on Q&A forums, is a prerequisite step required to work with the codes. Existing type inference methods proposed for incomplete Java codes can be primarily categorized as constraint-based or statistics-based. The former relies on a pre-built API knowledge base (KB) and the type constraints in code snippets, which imposes higher requirements on code syntax and thus suffers from low recall due to the syntactic limitation. The latter overcomes the syntactic limitation by learning statistical regularities from a code corpus, however it rarely employs the type constraints in code snippets, which may lead to low precision. In this paper, we propose an effective type inference framework, called iJTyper, for incomplete Java codes by integrating the complementary advantages of constraint- and statistics-based methods. For a code snippet, iJTyper first applies a constraint-based method and augments the code context with the inferred API types. Then, it applies a statistics-based method to the augmented code snippet. The types predicted for APIs are further used to improve the constraint-based method by reducing its pre-built KB. iJTyper iteratively executes both methods and performs the code context augmentation and KB reduction mechanisms until a termination condition is satisfied. The final inference results are produced by combining the results of both methods. We implemented a version of iJTyper by integrating two state-of-the-art methods, SnR and MLMTyper, and evaluated iJTyper on two open-source datasets. Results show that 1) iJTyper achieves the highest average precision/recall1 of 97.3 % and 92.5 % on both datasets; 2) iJTyper improves the average recall of SnR and MLMTyper by at least 7.3 % and 27.4 %, respectively; and 3) iJTyper improves the average precision/recall of the recently popular language model, ChatGPT, by 3.2 % and 0.5 % on both datasets.
期刊介绍:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.