ORDSAENet: Outlier Resilient Semantic Featured Deep Driven Sentiment Analysis Model for Education Domain

International journal of machine learning and computing. Pub Date: 2023-10-05. DOI: 10.53759/7669/jmc202303034

Smitha B A, Raja Praveen K N


Citations: 0

Abstract


ORDSAENet: Outlier Resilient Semantic Featured Deep Driven Sentiment Analysis Model for Education Domain

Rapidly rising global competition across the education sector has pushed institutions to improve their learning materials, courses, and learning methods or pedagogies, which requires assessing the perceptions and opinions of students and related stakeholders. Student reviews can be of paramount significance here, yet annotating student opinions over huge, heterogeneous, and unstructured data remains a tedious task. Artificial intelligence (AI) and natural language processing (NLP) techniques can play a decisive role, but conventional unsupervised lexicon- and corpus-based solutions, as well as machine learning and deep learning approaches, are limited by issues such as class imbalance, lack of contextual detail, lack of long-term dependency, convergence problems, and local minima. These challenges can be severe over large inputs in Big Data ecosystems. In this context, this paper proposes an outlier-resilient, semantic-feature-based, deep-learning-driven sentiment analysis model (ORDSAENet) for sentiment annotation in the educational domain. To address data heterogeneity and lack of structure in unpredictable digital media, ORDSAENet applies a range of pre-processing steps: missing-value removal, Unicode normalization, emoji and website-link removal, removal of words containing numeric values, punctuation removal, lower-case conversion, stop-word removal, lemmatization, and tokenization. It also applies a text-size constraint to remove outlier texts from the input and thereby improve ROI-specific learning for accurate annotation. The tokenized data are embedded with Word2Vec continuous bag-of-words (CBOW) semantic embedding and then resampled with the synthetic minority over-sampling technique combined with edited nearest neighbors (SMOTE-ENN). The resampled embedding matrix is passed to a Bi-LSTM for feature extraction and learning, which retains both local and contextual features for efficient learning and classification. Run on an educational review dataset containing both qualitative reviews and quantitative ratings for online courses, ORDSAENet achieves average sentiment-annotation accuracy, precision, recall, and F-measure of 95.87%, 95.26%, 95.06%, and 95.15%, respectively, which is higher than standalone LSTM-driven feature-learning solutions and other state-of-the-art methods. The overall simulation results and associated inferences confirm the robustness of ORDSAENet as a solution for real-time educational sentiment annotation.
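The pre-processing and text-size outlier filtering described in the abstract might look roughly like the following sketch. It is not the authors' implementation: the function names, the tiny stop-word list, the emoji ranges, and the `min_tokens`/`max_tokens` bounds are all assumptions chosen for illustration, since the abstract gives no concrete values. Lemmatization is left out because it needs an external NLP library, and the later stages (Word2Vec CBOW embedding, SMOTE-ENN resampling, Bi-LSTM) are not covered.

```python
import re
import string
import unicodedata

# Illustrative stop-word subset (assumption); a real pipeline would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "was", "were", "and", "or",
              "to", "of", "for", "in", "it", "this"}

URL_RE = re.compile(r"(https?://|www\.)\S+")
# Rough emoji/pictograph ranges (assumption); not exhaustive.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def preprocess(text):
    """Return a list of cleaned, lower-cased tokens for one review."""
    text = unicodedata.normalize("NFKC", text)   # Unicode normalization
    text = URL_RE.sub(" ", text)                 # website-link removal
    text = EMOJI_RE.sub(" ", text)               # emoji removal
    # Drop words that contain digits before punctuation is stripped.
    words = [w for w in text.split() if not any(c.isdigit() for c in w)]
    text = " ".join(words).translate(PUNCT_TABLE).lower()
    # Whitespace tokenization followed by stop-word removal.
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def clean_corpus(reviews, min_tokens=2, max_tokens=50):
    """Drop missing reviews and length outliers; the bounds are assumed values."""
    cleaned = []
    for review in reviews:
        if review is None or not review.strip():
            continue  # missing-value removal
        tokens = preprocess(review)
        if min_tokens <= len(tokens) <= max_tokens:  # text-size constraint
            cleaned.append(tokens)
    return cleaned


reviews = [
    "Great course! Visit https://example.com for more 😀",
    None,
    "ok",
    "The lectures in week2 were clear and the quizzes helped.",
]
print(clean_corpus(reviews))
```

In this toy run the `None` entry is discarded as a missing value and the one-word review falls below the assumed minimum length, so only two token lists remain. The resulting token lists are the kind of input a Word2Vec model would then embed.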
