Educational Lexical Resource Enrichment Using Machine Learning Classifiers
Melissa Oussaid, Samia Lazib, Lydia Lazib, Farida Bouarab-Dahmani, N. Cullot
2021 22nd International Arab Conference on Information Technology (ACIT), published 2021-12-21
DOI: https://doi.org/10.1109/acit53391.2021.9677450
Abstract
Opinion mining is one of the most active topics in Natural Language Processing (NLP) and Artificial Intelligence (AI). It aims to analyze people's emotions, feelings, humor, appreciation, and similar attitudes, and it has a wide range of applications, education among them. Studying opinions in the educational field can be very useful, and lexical resources specific to that field can support various NLP tasks. Improving these lexical resources also matters for opinion extraction, because it improves the opinion detection process. Our work enriches DICO, a French lexical resource dedicated to educational opinion mining, by recalculating its polarities. The enrichment relies on several features, including word embeddings, to extract semantic information from a corpus of annotated comments built from various educational sources. This semantic information is used to train classification models such as K-Nearest Neighbors, Support Vector Machine, Decision Tree, Naive Bayes, Multilayer Perceptron (MLP), Random Forest, AdaBoost, and Stochastic Gradient Descent (SGD). The models are implemented in Python. They classify the synsets of DICO, and the classification results are used to recalculate DICO's polarities, yielding a new lexical resource, DICO-2. We compared the classification performance on the corpus using DICO with that obtained using DICO-2; the results show that DICO-2 classifies opinions better, with a noticeable increase in performance.
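The pipeline outlined in the abstract (embedding features, a classifier trained on annotated comments, then a polarity update for lexicon entries) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the 3-dimensional toy vectors, the two lexicon entries, the cosine-similarity K-Nearest Neighbors vote, and in particular the blending rule in `recalculate_polarity`, since the abstract does not state how DICO polarities are actually recalculated.

```python
import math
from collections import Counter

# Hypothetical toy "embeddings" of annotated comments.
# Labels: +1 = positive opinion, -1 = negative opinion.
TRAIN = [
    ((0.9, 0.1, 0.0), 1),
    ((0.8, 0.2, 0.1), 1),
    ((0.7, 0.0, 0.2), 1),
    ((0.1, 0.9, 0.0), -1),
    ((0.0, 0.8, 0.2), -1),
    ((0.2, 0.7, 0.1), -1),
]

# Hypothetical lexicon entries: word -> (embedding, current polarity in [-1, 1]).
LEXICON = {
    "utile": ((0.85, 0.15, 0.05), 0.2),
    "ennuyeux": ((0.05, 0.85, 0.10), -0.1),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(vector, train, k=3):
    """Majority label among the k training vectors most similar to `vector`."""
    nearest = sorted(train, key=lambda item: cosine(vector, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def recalculate_polarity(old_polarity, predicted_label, weight=0.5):
    """Assumed update rule: blend the old score with the predicted label."""
    return (1 - weight) * old_polarity + weight * predicted_label


def enrich(lexicon, train, k=3, weight=0.5):
    """Return a new word -> polarity mapping after classification."""
    return {
        word: recalculate_polarity(polarity, knn_predict(vec, train, k), weight)
        for word, (vec, polarity) in lexicon.items()
    }


if __name__ == "__main__":
    for word, polarity in enrich(LEXICON, TRAIN).items():
        print(f"{word}: {polarity:+.2f}")
```

In this sketch, a weakly positive entry whose nearest annotated comments are all positive is pushed further toward +1, and symmetrically for negative entries. Any of the other classifiers named in the abstract could take the place of `knn_predict` without changing the surrounding structure.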