CLEAR: Contrastive Learning for API Recommendation

2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE). Pub Date: 2022-05-01. DOI: 10.1145/3510003.3510159

Moshi Wei, Nima Shiri Harzevili, Yuchao Huang, Junjie Wang, Song Wang


Citations: 21

Abstract

Automatic API recommendation has been studied for years. There are two orthogonal lines of approaches for this task: information-retrieval-based (IR-based) and neural-based methods. Although these approaches have been reported to achieve remarkable performance, we observe that they can fail for two reasons: 1) most IR-based approaches treat task queries as bags of words and represent them with word embeddings, which cannot capture sequential semantic information; 2) both IR-based and neural-based approaches are weak at distinguishing semantic differences among lexically similar queries. In this paper, we propose CLEAR, which leverages BERT sentence embedding and contrastive learning to tackle these two issues. Specifically, CLEAR embeds whole sentences of queries and Stack Overflow (SO) posts with a BERT-based model rather than a bag-of-words-based word embedding model, which preserves semantic-related sequential information. In addition, CLEAR uses contrastive learning to train the BERT-based embedding model so that it learns precise semantic representations of programming terminologies regardless of their lexical information. CLEAR also builds a BERT-based re-ranking model to optimize its recommendation results. Given a query, CLEAR first selects a set of candidate SO posts via BERT sentence-embedding-based similarity to reduce the search space. It then uses the BERT-based re-ranking model to rank the candidate SO posts and recommends the APIs from the top-ranked posts for the query. Our experimental results on three different test datasets confirm the effectiveness of CLEAR for both method-level and class-level API recommendation. Compared to state-of-the-art API recommendation approaches, CLEAR improves mean average precision (MAP) by 25%-187% at the method level and 10%-100% at the class level.
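The two-stage flow described in the abstract (select candidate SO posts by embedding similarity, re-rank them, then recommend APIs from the top-ranked posts) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `Post`, `POSTS`, `recommend_apis`, and the parameters `num_candidates` and `num_apis` are hypothetical, and `toy_embed` and `toy_rerank_score` are simple token-based placeholders standing in for the BERT-based embedding and re-ranking models, which the abstract does not specify in detail.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = Dict[str, float]


@dataclass
class Post:
    """Hypothetical record for a Stack Overflow post and the APIs it mentions."""
    title: str
    apis: List[str]


def toy_embed(text: str) -> Vector:
    # Placeholder only: token counts, NOT a BERT sentence embedding.
    return {tok: float(n) for tok, n in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_rerank_score(query: str, title: str) -> float:
    # Placeholder for a learned re-ranker: Jaccard overlap of token sets.
    q, t = set(query.lower().split()), set(title.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def recommend_apis(
    query: str,
    posts: List[Post],
    embed: Callable[[str], Vector] = toy_embed,
    rerank: Callable[[str, str], float] = toy_rerank_score,
    num_candidates: int = 3,
    num_apis: int = 2,
) -> List[str]:
    q_vec = embed(query)
    # Stage 1: keep the posts most similar to the query in embedding space.
    candidates = sorted(
        posts, key=lambda p: cosine(q_vec, embed(p.title)), reverse=True
    )[:num_candidates]
    # Stage 2: re-rank only the candidates with the (more expensive) scorer.
    ranked = sorted(candidates, key=lambda p: rerank(query, p.title), reverse=True)
    # Collect APIs from the top-ranked posts, preserving rank order.
    apis: List[str] = []
    for post in ranked:
        for api in post.apis:
            if api not in apis:
                apis.append(api)
    return apis[:num_apis]


# Made-up example data, for illustration only.
POSTS = [
    Post("convert string to int in java", ["Integer.parseInt"]),
    Post("read file line by line in java", ["BufferedReader.readLine"]),
    Post("sort a list of strings", ["Collections.sort"]),
    Post("convert int to string in java", ["String.valueOf"]),
]

if __name__ == "__main__":
    print(recommend_apis("read file line by line", POSTS, num_apis=1))
```

The `embed` and `rerank` arguments are left pluggable so that real sentence-embedding and re-ranking models could be swapped in for the placeholders. Note that the toy scorers are purely lexical, so they would show exactly the weakness on lexically similar queries that the abstract attributes to existing approaches; the sketch illustrates only the control flow.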


