Using Multinomial Naive Bayes Machine Learning Method To Classify, Detect, And Recognize Programming Language Source Code
Authors: A. Odeh, Munther Odeh, Nada Odeh
Published in: 2022 International Arab Conference on Information Technology (ACIT)
Publication date: 2022-11-22
DOI: 10.1109/ACIT57182.2022.9994117 (https://doi.org/10.1109/ACIT57182.2022.9994117)
Citations: 2
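Abstract
Processing programming languages is very similar to processing natural languages, especially for high-level languages such as Python, Java, C#, C, and C++. Concepts from natural language processing, one of the most important branches of artificial intelligence, can therefore be applied to detecting, recognizing, and classifying scripts written in different programming languages. Programming language classification can be treated as a classical machine learning problem. This research presents a model that uses the Multinomial Naïve Bayes (MNB) algorithm to identify and classify the programming language of a source code file given as input. A set of source code files labeled by language is used to train the model, which can then automatically classify a new script into one of the trained categories. This is useful for multi-language editors such as Visual Studio Code and Notepad++, where the user can paste source code and the editor recognizes the programming language automatically.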
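The train-then-classify workflow described in the abstract can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not the authors' implementation: the regex tokenizer, the Laplace smoothing constant `alpha`, the toy training snippets, and the names `tokenize` and `TinyMultinomialNB` are all hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed tokenizer: identifiers/keywords plus single punctuation characters.
# Numeric literals match neither alternative and are therefore ignored.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|[^\sA-Za-z0-9_]")


def tokenize(source):
    """Split source code into word-like tokens and punctuation characters."""
    return TOKEN_RE.findall(source)


class TinyMultinomialNB:
    """Illustrative multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, documents, labels):
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.class_counts = Counter(labels)       # label -> number of files
        self.vocabulary = set()
        for doc, label in zip(documents, labels):
            tokens = tokenize(doc)
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, document):
        """Return log P(label) + sum of log P(token | label) for each label."""
        # Tokens never seen during training carry no information and are skipped.
        tokens = [t for t in tokenize(document) if t in self.vocabulary]
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / self.total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, document):
        scores = self.log_scores(document)
        return max(scores, key=scores.get)


# Toy labeled corpus (made up for this sketch; a real model needs many files).
TRAIN = [
    ("def add(a, b):\n    return a + b\n", "Python"),
    ("import sys\nfor line in sys.stdin:\n    print(line.strip())\n", "Python"),
    ("public class Main {\n    public static void main(String[] args) {\n"
     "        System.out.println(\"hi\");\n    }\n}\n", "Java"),
    ("import java.util.List;\npublic interface Shape { double area(); }\n", "Java"),
    ("#include <stdio.h>\nint main(void) {\n    printf(\"hi\\n\");\n"
     "    return 0;\n}\n", "C"),
    ("#include <stdlib.h>\nvoid free_all(char **p) { free(*p); }\n", "C"),
]

classifier = TinyMultinomialNB().fit(
    [code for code, _ in TRAIN], [lang for _, lang in TRAIN]
)

if __name__ == "__main__":
    pasted = "def greet(name):\n    print(name)\n"
    print(classifier.predict(pasted))
```

Working in log space avoids numeric underflow when a file contains many tokens, and the smoothing term keeps a token that never occurred in one language from driving that language's probability to zero. An editor integration would call `classifier.predict` on the pasted text and select the matching syntax mode.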