{"title":"基于TF-IDF(词频-逆文档频率)和LSTM(长短期记忆)的情感文本分类","authors":"M. I. Alfarizi, L. Syafaah, Merinda Lestandy","doi":"10.30595/juita.v10i2.13262","DOIUrl":null,"url":null,"abstract":"Humans in carrying out communication activities can express their feelings either verbally or non-verbally. Verbal communication can be in the form of oral or written communication. A person's feelings or emotions can usually be seen by their behavior, tone of voice, and expression. Not everyone can see emotion only through writing, whether in the form of words, sentences, or paragraphs. Therefore, a classification system is needed to help someone determine the emotions contained in a piece of writing. The novelty of this study is a development of previous research using a similar method, namely LSTM but improved on the word weighting process using the TF-IDF method as a further process of LSTM classification. The method proposed in this research is called Natural Language Processing (NLP). The purpose of this study was to compare the classification method with the LSTM (Long Short-Term Memory) model by adding the word weighting TF-IDF (Term Frequency–Inverse Document Frequency) and the LinearSVC model, as well to increase accuracy in determining an emotion (sadness, anger, fear, love, joy, and surprise) contained in the text. The dataset used is 18000, which is divided into 16000 training data and 2000 test data with 6 classifications of emotion classes, namely sadness, anger, fear, love, joy, and surprise. The results of the classification accuracy of emotions using the LSTM method yielded a 97.50% accuracy while using the LinearSVC method resulted in an accuracy value of 89%.","PeriodicalId":151254,"journal":{"name":"JUITA : Jurnal Informatika","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Emotional Text Classification Using TF-IDF (Term Frequency-Inverse Document Frequency) And LSTM (Long Short-Term Memory)\",\"authors\":\"M. I. Alfarizi, L. Syafaah, Merinda Lestandy\",\"doi\":\"10.30595/juita.v10i2.13262\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Humans in carrying out communication activities can express their feelings either verbally or non-verbally. Verbal communication can be in the form of oral or written communication. A person's feelings or emotions can usually be seen by their behavior, tone of voice, and expression. Not everyone can see emotion only through writing, whether in the form of words, sentences, or paragraphs. Therefore, a classification system is needed to help someone determine the emotions contained in a piece of writing. The novelty of this study is a development of previous research using a similar method, namely LSTM but improved on the word weighting process using the TF-IDF method as a further process of LSTM classification. The method proposed in this research is called Natural Language Processing (NLP). The purpose of this study was to compare the classification method with the LSTM (Long Short-Term Memory) model by adding the word weighting TF-IDF (Term Frequency–Inverse Document Frequency) and the LinearSVC model, as well to increase accuracy in determining an emotion (sadness, anger, fear, love, joy, and surprise) contained in the text. The dataset used is 18000, which is divided into 16000 training data and 2000 test data with 6 classifications of emotion classes, namely sadness, anger, fear, love, joy, and surprise. The results of the classification accuracy of emotions using the LSTM method yielded a 97.50% accuracy while using the LinearSVC method resulted in an accuracy value of 89%.\",\"PeriodicalId\":151254,\"journal\":{\"name\":\"JUITA : Jurnal Informatika\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JUITA : Jurnal Informatika\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.30595/juita.v10i2.13262\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JUITA : Jurnal Informatika","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.30595/juita.v10i2.13262","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
摘要
人类在进行交际活动时,既可以用语言表达感情,也可以用非语言表达感情。言语交际可以分为口头和书面两种形式。一个人的感觉或情绪通常可以从他们的行为、语调和表情中看出。不是每个人都能通过文字看到情感,无论是以单词、句子还是段落的形式。因此,需要一个分类系统来帮助人们确定一篇文章中包含的情绪。本研究的新颖之处在于发展了先前的研究,使用了类似的方法,即LSTM,但使用TF-IDF方法改进了单词加权过程,作为LSTM分类的进一步过程。本研究提出的方法被称为自然语言处理(NLP)。本研究的目的是通过添加单词加权TF-IDF (Term Frequency - inverse Document Frequency)和线性svc模型,将该分类方法与LSTM(长短期记忆)模型进行比较,并提高确定文本中包含的情绪(悲伤、愤怒、恐惧、爱、喜悦和惊讶)的准确性。使用的数据集为18000个,分为16000个训练数据和2000个测试数据,分为6类情绪,分别是悲伤、愤怒、恐惧、爱、喜悦、惊喜。使用LSTM方法的情绪分类准确率为97.50%,而使用线性svc方法的准确率为89%。
Emotional Text Classification Using TF-IDF (Term Frequency-Inverse Document Frequency) And LSTM (Long Short-Term Memory)
Humans in carrying out communication activities can express their feelings either verbally or non-verbally. Verbal communication can be in the form of oral or written communication. A person's feelings or emotions can usually be seen by their behavior, tone of voice, and expression. Not everyone can see emotion only through writing, whether in the form of words, sentences, or paragraphs. Therefore, a classification system is needed to help someone determine the emotions contained in a piece of writing. The novelty of this study is a development of previous research using a similar method, namely LSTM but improved on the word weighting process using the TF-IDF method as a further process of LSTM classification. The method proposed in this research is called Natural Language Processing (NLP). The purpose of this study was to compare the classification method with the LSTM (Long Short-Term Memory) model by adding the word weighting TF-IDF (Term Frequency–Inverse Document Frequency) and the LinearSVC model, as well to increase accuracy in determining an emotion (sadness, anger, fear, love, joy, and surprise) contained in the text. The dataset used is 18000, which is divided into 16000 training data and 2000 test data with 6 classifications of emotion classes, namely sadness, anger, fear, love, joy, and surprise. The results of the classification accuracy of emotions using the LSTM method yielded a 97.50% accuracy while using the LinearSVC method resulted in an accuracy value of 89%.