Too long; didn’t read: Automatic summarization of GitHub README.MD with Transformers

Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering. Pub Date: 2023-06-14. DOI: 10.1145/3593434.3593448

Thu T. H. Doan, P. Nguyen, Juri Di Rocco, Davide Di Ruscio


Citations: 0

Abstract


GitHub has become a widely used open source platform because it lets developers share source code and collaborate on software projects. Each repository is generally equipped with a README.MD file that gives an overview of its main functionalities. While informative, README.MD is usually lengthy and takes time and effort to read and comprehend. GitHub therefore also lets users add a short description called “About,” a brief but informative summary of the repository that helps visitors quickly grasp the main content and decide whether to continue reading. Unfortunately, for various reasons, not excluding laziness, developers often leave this field blank. This paper proposes GitSum, a novel approach to summarizing README.MD. GitSum is built on top of BART and T5, two cutting-edge deep learning techniques, and learns from existing data to recommend descriptions for repositories that lack one. We test its performance using two datasets collected from GitHub. The evaluation shows that GitSum can generate relevant predictions, outperforming a well-established baseline.
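The abstract describes learning from repositories that already have an "About" description in order to suggest one for repositories that lack it. The sketch below illustrates only that data split, using the Python standard library. The `Repo` record, the `clean_readme` cleanup rules, and the sample repositories are all hypothetical; the abstract does not specify how GitSum preprocesses its data, so none of this should be read as the authors' pipeline.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Repo:
    """Hypothetical record for one repository (not a structure from the paper)."""
    name: str
    readme: str
    about: Optional[str]  # None or blank when the "About" field was left empty


def clean_readme(markdown: str) -> str:
    """Illustrative Markdown cleanup; the paper's actual preprocessing may differ."""
    text = re.sub(r"```.*?```", " ", markdown, flags=re.DOTALL)  # drop fenced code
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)            # drop images and badges
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)         # keep link text only
    text = re.sub(r"^#+\s*", "", text, flags=re.MULTILINE)       # strip heading markers
    return re.sub(r"\s+", " ", text).strip()                     # collapse whitespace


def split_repos(repos: List[Repo]) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split repositories into (README, About) training pairs and summarization targets."""
    training_pairs, needs_summary = [], []
    for repo in repos:
        source = clean_readme(repo.readme)
        if repo.about and repo.about.strip():
            # An existing description serves as the reference summary.
            training_pairs.append((source, repo.about.strip()))
        else:
            # A missing description marks a repository to generate a summary for.
            needs_summary.append((repo.name, source))
    return training_pairs, needs_summary


# Made-up sample data, for illustration only.
repos = [
    Repo("demo/alpha",
         "# Alpha\n![build](https://example.com/b.svg)\nA [fast](https://example.com) parser.",
         "Fast parser"),
    Repo("demo/beta", "# Beta\n```sh\nmake\n```\nTools for plotting.", None),
]
pairs, todo = split_repos(repos)
print(pairs)
print(todo)
```

Pairs of this shape are the kind of input a sequence-to-sequence model such as BART or T5 could be fine-tuned on. That step is omitted here because it depends on model and training choices the abstract does not state.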
