Representation Learning for Stack Overflow Posts: How Far are We?

IF 6.6 2区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Software Engineering and Methodology Pub Date : 2023-12-07 DOI:10.1145/3635711

Junda He, Xin Zhou, Bowen Xu, Ting Zhang, Kisub Kim, Zhou Yang, Ferdian Thung, Ivana Clairine Irsan, David Lo

{"title":"Representation Learning for Stack Overflow Posts: How Far are We?","authors":"Junda He, Xin Zhou, Bowen Xu, Ting Zhang, Kisub Kim, Zhou Yang, Ferdian Thung, Ivana Clairine Irsan, David Lo","doi":"10.1145/3635711","DOIUrl":null,"url":null,"abstract":"<p>The tremendous success of Stack Overflow has accumulated an extensive corpus of software engineering knowledge, thus motivating researchers to propose various solutions for analyzing its content. The performance of such solutions hinges significantly on the selection of representation models for Stack Overflow posts. As the volume of literature on Stack Overflow continues to burgeon, it highlights the need for a powerful Stack Overflow post representation model and drives researchers’ interest in developing specialized representation models that can adeptly capture the intricacies of Stack Overflow posts. The state-of-the-art (SOTA) Stack Overflow post representation models are Post2Vec and BERTOverflow, which are built upon neural networks such as convolutional neural network (CNN) and transformer architecture (e.g., BERT). Despite their promising results, these representation methods have not been evaluated in the same experimental setting. To fill the research gap, we first empirically compare the performance of the representation models designed specifically for Stack Overflow posts (Post2Vec and BERTOverflow) in a wide range of related tasks, i.e., tag recommendation, relatedness prediction, and API recommendation. The results show that Post2Vec cannot further improve the state-of-the-art techniques of the considered downstream tasks, and BERTOverflow shows surprisingly poor performance. To find more suitable representation models for the posts, we further explore a diverse set of transformer-based models, including (1) general domain language models (RoBERTa, Longformer, GPT2) and (2) language models built with software engineering-related textual artifacts (CodeBERT, GraphCodeBERT, seBERT, CodeT5, PLBart, and CodeGen). This exploration shows that models like CodeBERT and RoBERTa are suitable for representing Stack Overflow posts. However, it also illustrates the “No Silver Bullet” concept, as none of the models consistently wins against all the others. Inspired by the findings, we propose SOBERT, which employs a simple yet effective strategy to improve the representation models of Stack Overflow posts by continuing the pre-training phase with the textual artifact from Stack Overflow. The overall experimental results demonstrate that SOBERT can consistently outperform the considered models and increase the state-of-the-art performance significantly for all the downstream tasks.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"101 1","pages":""},"PeriodicalIF":6.6000,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Software Engineering and Methodology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3635711","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 6

Abstract

The tremendous success of Stack Overflow has accumulated an extensive corpus of software engineering knowledge, thus motivating researchers to propose various solutions for analyzing its content. The performance of such solutions hinges significantly on the selection of representation models for Stack Overflow posts. As the volume of literature on Stack Overflow continues to burgeon, it highlights the need for a powerful Stack Overflow post representation model and drives researchers’ interest in developing specialized representation models that can adeptly capture the intricacies of Stack Overflow posts. The state-of-the-art (SOTA) Stack Overflow post representation models are Post2Vec and BERTOverflow, which are built upon neural networks such as convolutional neural network (CNN) and transformer architecture (e.g., BERT). Despite their promising results, these representation methods have not been evaluated in the same experimental setting. To fill the research gap, we first empirically compare the performance of the representation models designed specifically for Stack Overflow posts (Post2Vec and BERTOverflow) in a wide range of related tasks, i.e., tag recommendation, relatedness prediction, and API recommendation. The results show that Post2Vec cannot further improve the state-of-the-art techniques of the considered downstream tasks, and BERTOverflow shows surprisingly poor performance. To find more suitable representation models for the posts, we further explore a diverse set of transformer-based models, including (1) general domain language models (RoBERTa, Longformer, GPT2) and (2) language models built with software engineering-related textual artifacts (CodeBERT, GraphCodeBERT, seBERT, CodeT5, PLBart, and CodeGen). This exploration shows that models like CodeBERT and RoBERTa are suitable for representing Stack Overflow posts. However, it also illustrates the “No Silver Bullet” concept, as none of the models consistently wins against all the others. Inspired by the findings, we propose SOBERT, which employs a simple yet effective strategy to improve the representation models of Stack Overflow posts by continuing the pre-training phase with the textual artifact from Stack Overflow. The overall experimental results demonstrate that SOBERT can consistently outperform the considered models and increase the state-of-the-art performance significantly for all the downstream tasks.

查看原文本刊更多论文

Stack Overflow 帖子的表征学习：我们还有多远？

Stack Overflow 的巨大成功积累了大量的软件工程知识，因此促使研究人员提出了各种分析其内容的解决方案。这些解决方案的性能在很大程度上取决于对 Stack Overflow 帖子表示模型的选择。随着有关 Stack Overflow 的文献数量不断激增，这凸显了对强大的 Stack Overflow 帖子表示模型的需求，并推动了研究人员对开发能巧妙捕捉 Stack Overflow 帖子复杂性的专门表示模型的兴趣。最先进的 Stack Overflow 帖子表示模型（SOTA）是 Post2Vec 和 BERTOverflow，它们建立在卷积神经网络（CNN）和变压器架构（如 BERT）等神经网络的基础上。尽管这些表示方法取得了可喜的成果，但还没有在相同的实验环境中进行过评估。为了填补这一研究空白，我们首先对专为 Stack Overflow 帖子设计的表示模型（Post2Vec 和 BERTOverflow）在一系列相关任务（即标签推荐、相关性预测和 API 推荐）中的性能进行了实证比较。结果表明，在所考虑的下游任务中，Post2Vec 无法进一步改进最先进的技术，而 BERTOverflow 则表现出令人惊讶的低劣性能。为了找到更适合帖子的表示模型，我们进一步探索了一系列基于转换器的模型，包括：（1）通用领域语言模型（RoBERTa、Longformer、GPT2）和（2）使用软件工程相关文本工件构建的语言模型（CodeBERT、GraphCodeBERT、seBERT、CodeT5、PLBart 和 CodeGen）。这一探索表明，CodeBERT 和 RoBERTa 等模型适用于表示 Stack Overflow 帖子。不过，这也说明了 "没有银弹 "的概念，因为没有一个模型能在与所有其他模型的竞争中始终胜出。受这一发现的启发，我们提出了 SOBERT，它采用了一种简单而有效的策略，通过继续使用 Stack Overflow 的文本工件进行预训练来改进 Stack Overflow 帖子的表示模型。总体实验结果表明，SOBERT 可以持续超越所考虑的模型，并在所有下游任务中显著提高最先进的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ACM Transactions on Software Engineering and Methodology 工程技术-计算机：软件工程

CiteScore

6.30

自引率

4.50%

发文量

164

审稿时长

>12 weeks

期刊介绍： Designing and building a large, complex software system is a tremendous challenge. ACM Transactions on Software Engineering and Methodology (TOSEM) publishes papers on all aspects of that challenge: specification, design, development and maintenance. It covers tools and methodologies, languages, data structures, and algorithms. TOSEM also reports on successful efforts, noting practical lessons that can be scaled and transferred to other projects, and often looks at applications of innovative technologies. The tone is scholarly but readable; the content is worthy of study; the presentation is effective.