Predictive use cases of CNN based multi label classification for programming languages
Satyarth Upadhyaya, Anish Parajuli, S. Shakya
2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), published 2019-10-01
DOI: 10.1109/ICCCIS48478.2019.8974489
https://doi.org/10.1109/ICCCIS48478.2019.8974489
Citation count: 0
Abstract
Multi-label classification refers to classifying data into two or more, usually independent, sets of output labels. This approach suits deep learning applications in multi-faceted domains such as software development, where it is desirable to yield multiple outcomes. This paper proposes a CNN-based deep learning model, trained on public datasets from programming platforms such as GitHub and Stack Overflow, to derive insights that aid decisions about the choice of programming languages for a given software development requirement. For this research, we developed a model with a pre-trained vector embedding layer and multi-channel one-dimensional CNN layers, followed by a multilayer perceptron layer that provides the multi-label outputs. Our two experimental setups achieved 92% and 98% accuracy, with 22% and 4% loss, for GitHub and Stack Overflow respectively. The model performed well when tested on software development requirements. The Stack Overflow dataset performed noticeably better than the GitHub dataset on actual software development use cases. The models also showed promise for trend prediction and source code use cases.
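
The pipeline described in the abstract (embedding lookup, parallel one-dimensional convolution channels, a dense output stage, and independent per-label outputs) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the vocabulary, label set, layer sizes, kernel widths, and random weights are all hypothetical, the embedding table is random rather than pre-trained, and a single dense layer stands in for the multilayer perceptron. It uses only the Python standard library so the data flow is visible without a deep learning framework.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical vocabulary, labels, and sizes; none of these come from the paper.
VOCAB = ["<unk>", "build", "a", "web", "app", "with", "fast", "numeric", "code"]
LABELS = ["python", "javascript", "c++"]
EMBED_DIM = 4
KERNEL_SIZES = (2, 3)  # one convolution channel per kernel width
FILTERS = 3            # filters per channel


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Stand-in for a pre-trained embedding table (random here, for illustration only).
EMBEDDINGS = rand_matrix(len(VOCAB), EMBED_DIM)
# One weight matrix per channel: FILTERS rows, each spanning k * EMBED_DIM inputs.
CONV_WEIGHTS = {k: rand_matrix(FILTERS, k * EMBED_DIM) for k in KERNEL_SIZES}
# Single dense layer (simplified MLP) mapping pooled features to one logit per label.
DENSE = rand_matrix(len(LABELS), FILTERS * len(KERNEL_SIZES))


def embed(tokens):
    """Map tokens to embedding vectors; unknown tokens use index 0."""
    index = {word: i for i, word in enumerate(VOCAB)}
    return [EMBEDDINGS[index.get(t, 0)] for t in tokens]


def conv_channel(vectors, k):
    """1D convolution of width k with ReLU, then global max pooling."""
    pooled = [0.0] * FILTERS  # stays all-zero if the input is shorter than k
    for start in range(len(vectors) - k + 1):
        window = [x for vec in vectors[start:start + k] for x in vec]
        for f, weights in enumerate(CONV_WEIGHTS[k]):
            activation = max(0.0, sum(w * x for w, x in zip(weights, window)))
            pooled[f] = max(pooled[f], activation)
    return pooled


def predict(tokens):
    """Return one independent probability per label."""
    vectors = embed(tokens)
    features = [v for k in KERNEL_SIZES for v in conv_channel(vectors, k)]
    logits = [sum(w * x for w, x in zip(row, features)) for row in DENSE]
    # Sigmoid per label (not softmax): the probabilities need not sum to 1.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def binary_cross_entropy(probs, targets):
    """Mean binary cross-entropy over labels, a common multi-label loss."""
    eps = 1e-12
    total = sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for p, t in zip(probs, targets)
    )
    return -total / len(probs)


probs = predict("build a fast web app".split())
predicted = [label for label, p in zip(LABELS, probs) if p >= 0.5]
loss = binary_cross_entropy(probs, [1, 1, 0])  # hypothetical target labels
```

Because each label gets its own sigmoid and is thresholded separately (0.5 here, an assumed cutoff), a single requirement can be tagged with several languages or with none, which is what distinguishes multi-label from multi-class classification. With untrained random weights the predictions are meaningless; the sketch shows only the shape of the computation.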