Reem S. Alsuhaibani. 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), September 2019. DOI: 10.1109/ICSME.2019.00097
Applying Markov Models to Identify Grammatical Patterns of Function Identifiers
This paper presents an empirical study evaluating how effectively Markov chains find and predict the grammatical patterns of function identifiers in source code. The study uses a specialized part-of-speech tagger to annotate function identifiers extracted from 20 open-source C++ systems, producing a dataset of 93K unique annotated function identifiers. A first-order Markov chain, expressed as a transition probability matrix, models the part-of-speech tag sequences of the identifier names. The model is evaluated with 10-fold cross-validation over the entire set of annotated function identifier names. The preliminary results are promising in terms of applicability and accuracy: the model achieved a median accuracy of 91.53% in predicting the most common part-of-speech tag on a test set. Future work involves using these results to build a quality assessment and automatic repair tool for source code function identifiers.
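To make the modeling step concrete, the following is a minimal sketch of a first-order Markov chain over part-of-speech tag sequences. It is not the paper's implementation: the tag labels (`V`, `N`, `NM`, `P`), the boundary markers, the example identifiers in the comments, and the function names `train_transitions` and `predict_next` are all assumptions made for illustration. Transition probabilities are estimated by simple relative frequency, which is one common choice and may differ from what the study used.

```python
from collections import Counter, defaultdict

# Hypothetical boundary markers (not from the paper) so that the first and
# last tag of each identifier also take part in a transition.
START, END = "<s>", "</s>"


def train_transitions(tag_sequences):
    """Estimate P(next_tag | current_tag) as relative frequencies of tag bigrams."""
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        padded = [START, *tags, END]
        for current, nxt in zip(padded, padded[1:]):
            counts[current][nxt] += 1
    # Normalize each row of counts so that it sums to 1.
    return {
        current: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for current, following in counts.items()
    }


def predict_next(transitions, current):
    """Return the most probable tag to follow `current`, or None if `current` was never seen."""
    following = transitions.get(current)
    if not following:
        return None
    return max(following, key=following.get)


# Toy data with made-up labels: V = verb, N = noun, NM = noun modifier,
# P = preposition. The identifiers in the comments are invented examples.
toy_sequences = [
    ["V", "N"],        # e.g. get_name
    ["V", "NM", "N"],  # e.g. set_file_path
    ["V", "N"],        # e.g. open_file
    ["N", "P", "N"],   # e.g. index_of_item
]

transitions = train_transitions(toy_sequences)
print(transitions["V"])                   # row of the transition matrix for tag V
print(predict_next(transitions, START))   # most likely first tag of an identifier
```

Each inner dictionary is one row of the transition matrix. In a cross-validation setup like the one described, such a matrix would be trained on the training folds and `predict_next` compared against the actual next tag in the held-out fold.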