Reem S. Alsuhaibani
Published in: 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2019-09-01. DOI: 10.1109/ICSME.2019.00097 (https://doi.org/10.1109/ICSME.2019.00097)

Applying Markov Models to Identify Grammatical Patterns of Function Identifiers
This paper presents an empirical study evaluating how effectively Markov chains find and predict the grammatical patterns of function identifiers in source code. The study uses a specialized part-of-speech tagger to annotate function identifiers extracted from 20 open-source C++ systems, producing a dataset of 93K unique annotated function identifiers. The analysis models the part-of-speech tag sequences of the identifier names with a first-order Markov chain, represented as a transition probability matrix. The model is evaluated with 10-fold cross-validation over the entire set of annotated function identifier names. The preliminary results are promising in terms of applicability and accuracy: the model achieved a median accuracy of 91.53% in predicting the most common part-of-speech tag on a test set. Future work involves using these results to build a quality assessment and automatic repair tool for source code function identifiers.
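To make the modeling step concrete, the following is a minimal sketch of a first-order Markov chain over part-of-speech tag sequences. It is not the author's implementation: the tag names (`V`, `N`, `NM`, `P`), the boundary markers `START` and `END`, the function names, and the toy corpus are all assumptions made for illustration, since the abstract does not specify the tagset or the data format.

```python
from collections import Counter, defaultdict

# Hypothetical boundary markers; the abstract does not say how
# sequence starts and ends are represented.
START, END = "<s>", "</s>"


def train_transitions(tag_sequences):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        padded = [START, *tags, END]
        # Count every adjacent (current, next) tag pair.
        for current, nxt in zip(padded, padded[1:]):
            counts[current][nxt] += 1
    # Normalize each row of counts so it sums to 1.
    matrix = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        matrix[current] = {tag: n / total for tag, n in followers.items()}
    return matrix


def predict_next(matrix, current):
    """Return the most probable tag after `current`, or None if unseen."""
    row = matrix.get(current)
    if not row:
        return None
    return max(row, key=row.get)


# Made-up tag sequences for illustration only
# (V = verb, N = noun, NM = noun modifier, P = preposition).
toy_corpus = [
    ["V", "N"],        # e.g. get_name
    ["V", "NM", "N"],  # e.g. set_file_path
    ["V", "N"],        # e.g. open_file
    ["N", "P", "N"],   # e.g. index_of_item
]

matrix = train_transitions(toy_corpus)
print(predict_next(matrix, START))  # most likely first tag
print(predict_next(matrix, "V"))    # most likely tag after a verb
```

A nested dictionary stands in for the transition matrix here because the tag inventory is small and sparse. In an evaluation like the one described, `train_transitions` would be fit on nine folds of the annotated identifiers and `predict_next` scored against the held-out fold.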