Zhiqi Chen , Yuzhou Liu , Lei Liu , Huaxiao Liu , Ren Li , Peng Zhang
{"title":"基于从 Stack Overflow 挖掘出的非功能信息进行 API 比较","authors":"Zhiqi Chen , Yuzhou Liu , Lei Liu , Huaxiao Liu , Ren Li , Peng Zhang","doi":"10.1016/j.scico.2024.103228","DOIUrl":null,"url":null,"abstract":"<div><div>When comparing similar APIs, developers tend to distinguish them from the aspects of functional details. At the same time, some important non-functional factors (such as performance, usability, and security) may be ignored or noticed after using the API in the project. This may result in unnecessary errors or extra costs. API-related questions are common on Stack Overflow, and they can give a well-rounded picture of the APIs. This provides us with a rich resource for API comparison. However, although many methods are offered for mining Questions and Answers (Q&As) automatically, they often suffer from two main problems: 1) they only focus on the functional information of APIs; 2) they analyze each text in isolation but ignore the correlations among them. In this paper, we propose an approach based on the pre-training model BERT to mine the non-functional information of APIs from Stack Overflow: we first tease out the correlations among questions, answers as well as corresponding reviews, so that one Q&A can be analyzed as a whole; then, an information extraction model is constructed by fine-tuning BERT with three subtasks—entity identification, aspect classification, and sentiment analysis separately, and we use it to mine the texts in Q&As step by step; finally, we summarize and visualize the results in a user-friendly way, so that developers can understand the information intuitively at the beginning of API selection. We evaluate our approach on 4,456 Q&As collected from Stack Overflow. The results show our approach can identify the correlations among reviews with 90.1% precision, and such information can improve the performance of the data mining process. In addition, the survey on maturers and novices indicates the understandability and helpfulness of our method. Moreover, compared with language models, our method can provide more intuitive and brief information for API comparison in non-functional aspects.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"241 ","pages":"Article 103228"},"PeriodicalIF":1.5000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"API comparison based on the non-functional information mined from Stack Overflow\",\"authors\":\"Zhiqi Chen , Yuzhou Liu , Lei Liu , Huaxiao Liu , Ren Li , Peng Zhang\",\"doi\":\"10.1016/j.scico.2024.103228\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>When comparing similar APIs, developers tend to distinguish them from the aspects of functional details. At the same time, some important non-functional factors (such as performance, usability, and security) may be ignored or noticed after using the API in the project. This may result in unnecessary errors or extra costs. API-related questions are common on Stack Overflow, and they can give a well-rounded picture of the APIs. This provides us with a rich resource for API comparison. However, although many methods are offered for mining Questions and Answers (Q&As) automatically, they often suffer from two main problems: 1) they only focus on the functional information of APIs; 2) they analyze each text in isolation but ignore the correlations among them. In this paper, we propose an approach based on the pre-training model BERT to mine the non-functional information of APIs from Stack Overflow: we first tease out the correlations among questions, answers as well as corresponding reviews, so that one Q&A can be analyzed as a whole; then, an information extraction model is constructed by fine-tuning BERT with three subtasks—entity identification, aspect classification, and sentiment analysis separately, and we use it to mine the texts in Q&As step by step; finally, we summarize and visualize the results in a user-friendly way, so that developers can understand the information intuitively at the beginning of API selection. We evaluate our approach on 4,456 Q&As collected from Stack Overflow. The results show our approach can identify the correlations among reviews with 90.1% precision, and such information can improve the performance of the data mining process. In addition, the survey on maturers and novices indicates the understandability and helpfulness of our method. Moreover, compared with language models, our method can provide more intuitive and brief information for API comparison in non-functional aspects.</div></div>\",\"PeriodicalId\":49561,\"journal\":{\"name\":\"Science of Computer Programming\",\"volume\":\"241 \",\"pages\":\"Article 103228\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2024-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Science of Computer Programming\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167642324001515\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Science of Computer Programming","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167642324001515","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0
摘要
在比较类似的应用程序接口时,开发人员往往会从功能细节方面进行区分。同时,在项目中使用 API 后,一些重要的非功能性因素(如性能、可用性和安全性)可能会被忽略或注意到。这可能会导致不必要的错误或额外成本。在 Stack Overflow 上,与 API 相关的问题很常见,这些问题可以让我们对 API 有一个全面的了解。这为我们提供了丰富的 API 比较资源。然而,尽管有很多方法可以自动挖掘问与答(Q&As),但它们往往存在两个主要问题:1)它们只关注 API 的功能信息;2)它们孤立地分析每个文本,却忽略了它们之间的关联性。在本文中,我们提出了一种基于预训练模型 BERT 的方法,从 Stack Overflow 中挖掘 API 的非功能性信息:首先,我们找出问题、答案以及相应评论之间的关联性,从而将一个 Q&A 作为一个整体进行分析;然后,通过对 BERT 进行微调,分别完成实体识别、方面分类和情感分析三个子任务,构建信息提取模型,并利用该模型逐步挖掘 Q&As 中的文本;最后,我们以用户友好的方式对结果进行总结和可视化,以便开发人员在开始选择 API 时就能直观地了解信息。我们对从 Stack Overflow 收集的 4,456 个 Q&As 进行了评估。结果表明,我们的方法能以 90.1% 的精度识别出评论之间的相关性,而这些信息能提高数据挖掘过程的性能。此外,对成熟用户和新用户的调查表明,我们的方法易于理解,而且很有帮助。此外,与语言模型相比,我们的方法能在非功能方面为 API 比较提供更直观、更简短的信息。
API comparison based on the non-functional information mined from Stack Overflow
When comparing similar APIs, developers tend to distinguish them from the aspects of functional details. At the same time, some important non-functional factors (such as performance, usability, and security) may be ignored or noticed after using the API in the project. This may result in unnecessary errors or extra costs. API-related questions are common on Stack Overflow, and they can give a well-rounded picture of the APIs. This provides us with a rich resource for API comparison. However, although many methods are offered for mining Questions and Answers (Q&As) automatically, they often suffer from two main problems: 1) they only focus on the functional information of APIs; 2) they analyze each text in isolation but ignore the correlations among them. In this paper, we propose an approach based on the pre-training model BERT to mine the non-functional information of APIs from Stack Overflow: we first tease out the correlations among questions, answers as well as corresponding reviews, so that one Q&A can be analyzed as a whole; then, an information extraction model is constructed by fine-tuning BERT with three subtasks—entity identification, aspect classification, and sentiment analysis separately, and we use it to mine the texts in Q&As step by step; finally, we summarize and visualize the results in a user-friendly way, so that developers can understand the information intuitively at the beginning of API selection. We evaluate our approach on 4,456 Q&As collected from Stack Overflow. The results show our approach can identify the correlations among reviews with 90.1% precision, and such information can improve the performance of the data mining process. In addition, the survey on maturers and novices indicates the understandability and helpfulness of our method. Moreover, compared with language models, our method can provide more intuitive and brief information for API comparison in non-functional aspects.
期刊介绍:
Science of Computer Programming is dedicated to the distribution of research results in the areas of software systems development, use and maintenance, including the software aspects of hardware design.
The journal has a wide scope ranging from the many facets of methodological foundations to the details of technical issues andthe aspects of industrial practice.
The subjects of interest to SCP cover the entire spectrum of methods for the entire life cycle of software systems, including
• Requirements, specification, design, validation, verification, coding, testing, maintenance, metrics and renovation of software;
• Design, implementation and evaluation of programming languages;
• Programming environments, development tools, visualisation and animation;
• Management of the development process;
• Human factors in software, software for social interaction, software for social computing;
• Cyber physical systems, and software for the interaction between the physical and the machine;
• Software aspects of infrastructure services, system administration, and network management.