Salima Brachemi-Meftah, F. Barigou, Abdelaziz Djendara, Oussama Zaoui
Published in: 2022 IEEE 9th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), 2022-05-28. DOI: 10.1109/SETIT54465.2022.9875532 (https://doi.org/10.1109/SETIT54465.2022.9875532)
Impact of Dimensionality Reduction on Sentiment Analysis of Algerian Dialect
In Algeria, sentiment analysis of the Algerian dialect has become very important for organizations and companies that want to track customer feedback, predict customer satisfaction, and assess opinions over time. However, identifying sentiment is challenging for two reasons. (i) The Algerian dialect is an informal language without rigorous writing rules or standardization. It is mainly based on Modern Standard Arabic (MSA) vocabulary, in which the majority of the original words are modified both phonologically and morphologically. It also draws on foreign words from Turkish, Spanish, and French, as well as Tamazight; this mixing is known as code switching. (ii) A word with a single pronunciation can be written in several forms. Our objective is therefore to address these two issues within the sentiment analysis of Algerian dialect. To this end, we examine the impact of dimensionality reduction techniques such as lemmatization, stemming, feature selection and, in particular, our extended Soundex algorithm on system performance. We use a supervised machine learning approach with no translation step into MSA and no transliteration into another target language such as French. We compare the performance of five classifiers with and without dimensionality reduction techniques. Results show that feature selection combined with a multinomial Naive Bayes classifier gives an F1 score of 83.20% and an attribute reduction rate of 82.65%.
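To illustrate how a phonetic key can merge several spellings of one word into a single feature, here is a minimal sketch using the classic American Soundex code. It is not the extended Soundex algorithm proposed in the paper, whose rules the abstract does not describe. The Latin-script tokens are hypothetical spelling variants chosen for illustration, not taken from the authors' corpus, and the reduction-rate formula (share of distinct features removed) is an assumed definition.

```python
# Classic American Soundex, used here only as a stand-in for the paper's
# extended variant (assumption: the extension's rules are not given).
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word: str) -> str:
    """Return the 4-character classic Soundex key of an ASCII word."""
    letters = [ch for ch in word.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (key + "000")[:4]


def reduction_rate(tokens) -> float:
    """Assumed definition: fraction of distinct features removed by keying."""
    original = set(tokens)
    if not original:
        return 0.0
    reduced = {soundex(t) for t in original}
    return 1 - len(reduced) / len(original)


# Hypothetical spelling variants of two dialect words (illustrative only).
variants = ["mlih", "mleh", "mliih", "bezaf", "bezzaf", "bzaf"]
keys = {token: soundex(token) for token in variants}
rate = reduction_rate(variants)
```

In this toy vocabulary the six spellings map to two keys, so the feature space shrinks by two thirds. A real pipeline would apply the key to every token before vectorization and then train the classifier on the reduced vocabulary.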