A Dataset of Bot and Human Activities in GitHub
Natarajan Chidambaram, Alexandre Decan, T. Mens
2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), published 2023-05-01
DOI: 10.1109/MSR59073.2023.00070
Software repositories hosted on GitHub frequently use development bots to automate repetitive, effort-intensive and error-prone tasks. To understand and study how these bots are used, state-of-the-art bot identification tools have been developed that detect bots based on their comments in commits, issues and pull requests. Since bots also carry out many other types of activity in the repositories they are involved in, these activities need to be considered as well. We therefore propose a curated dataset of activities carried out by bots and humans in GitHub repositories. The dataset was constructed by identifying 24 high-level activity types that can be extracted from 15 lower-level event types, which were queried from GitHub's event stream API for all considered bots and humans. It contains around 834K activities performed by 385 bots and 616 humans during an observation period from 25 November 2022 to 9 March 2023. By enabling the analysis of bot and human activity patterns, this dataset could lead to better bot identification tools and to empirical studies of the role bots play in collaborative software development.
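The abstract describes deriving high-level activities from lower-level events returned by GitHub's event stream API. The Python sketch below illustrates that idea under stated assumptions: the helper names (`fetch_public_events`, `to_activity`, `activity_profile`), the activity labels and the mapping rules are all hypothetical and are not the 24 activity types defined in the paper. It relies only on the public `/users/{username}/events/public` endpoint and on the `type` and `payload` fields of each event (for example `payload.action`, `payload.ref_type` and `payload.pull_request.merged`); payload contents may differ across API versions, so every lookup is defensive.

```python
import json
import urllib.request
from collections import Counter

# Illustrative sketch only: the activity labels below are hypothetical and are
# NOT the 24 activity types defined in the paper.

COMMENT_EVENTS = {"IssueCommentEvent", "CommitCommentEvent", "PullRequestReviewCommentEvent"}


def fetch_public_events(login: str) -> list:
    """Fetch the first page (up to 100) of a GitHub account's recent public events.

    Unauthenticated requests are heavily rate-limited; this helper is not called below.
    """
    url = f"https://api.github.com/users/{login}/events/public?per_page=100"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def to_activity(event: dict) -> str:
    """Map one raw event to a hypothetical high-level activity label."""
    event_type = event.get("type", "")
    payload = event.get("payload") or {}
    action = payload.get("action")

    if event_type == "PushEvent":
        return "pushed commits"
    if event_type == "PullRequestEvent":
        if action == "closed":
            # A closed pull request was either merged or closed without merging.
            merged = (payload.get("pull_request") or {}).get("merged", False)
            return "merged pull request" if merged else "closed pull request"
        return f"{action} pull request"
    if event_type == "IssuesEvent":
        return f"{action} issue"
    if event_type in COMMENT_EVENTS:
        return "commented"
    if event_type == "CreateEvent":
        return f"created {payload.get('ref_type')}"
    if event_type == "DeleteEvent":
        return f"deleted {payload.get('ref_type')}"
    # Remaining event types (e.g. WatchEvent, ForkEvent) are lumped together here.
    return "other"


def activity_profile(events: list) -> Counter:
    """Count the high-level activities derived from one account's events."""
    return Counter(to_activity(event) for event in events)


if __name__ == "__main__":
    # Hand-written sample events (not real API data) to exercise the mapping offline.
    sample = [
        {"type": "PushEvent", "payload": {}},
        {"type": "PullRequestEvent", "payload": {"action": "opened", "pull_request": {"merged": False}}},
        {"type": "PullRequestEvent", "payload": {"action": "closed", "pull_request": {"merged": True}}},
        {"type": "IssueCommentEvent", "payload": {"action": "created"}},
        {"type": "CreateEvent", "payload": {"ref_type": "branch"}},
    ]
    print(activity_profile(sample))
```

In this sketch each event maps to exactly one label, and some labels depend on payload fields as well as the event type (a closed pull request becomes "merged" or "closed"). A real taxonomy such as the paper's may need to combine or split events differently.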