NLP-enabled Recommendation of Hashtags for Covid based Tweets using Hybrid BERT-LSTM Model
Authors: Kirti Jain, Rajni Jindal
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing (Impact Factor 1.8; JCR Q3, Computer Science, Artificial Intelligence)
Published: 2024-01-16
DOI: 10.1145/3640812 (https://doi.org/10.1145/3640812)
Citations: 0
Abstract
Hashtags have become a popular way to summarize feelings, sentiments, emotions, changing moods, food tastes and much more. They also represent entities such as places, families and friends, and they provide a way to search and categorize content on social media sites. With the growth of hashtagging, there is a need to automate it, which has given rise to the term “Hashtag Recommendation”. Many posts on social media sites remain untagged. These untagged posts are filtered out when data is searched or categorized by label, so they contribute no useful insight and remain abandoned. If the author of such a post is recommended labels that match the post, he or she may choose one or more of them, and the post becomes labelled. This is where hashtag recommendation comes into the picture. Although much research has been done on hashtag recommendation using traditional deep learning approaches, little work has used NLP-based BERT embeddings. In this paper, we propose a model, BELHASH (BERT Embedding based LSTM for Hashtag Recommendation). The task is treated as a multilabel classification task, since the hashtags are one-hot encoded into binary vectors of zeros and ones using MultiLabelBinarizer. The model has been evaluated on Covid-19 tweets, achieving 0.72 accuracy, 0.7 precision, 0.66 recall and 0.67 F1-score. To the best of our knowledge, this is the first hashtag recommendation paper to combine BERT embeddings with an LSTM model and achieve state-of-the-art results.
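To make the multilabel framing concrete, the following is a minimal standard-library sketch of the kind of binary encoding the abstract attributes to MultiLabelBinarizer, together with one possible way of scoring predictions. It is not the authors' code: the example hashtags, the helper names (`fit_classes`, `binarize`, `micro_prf`) and the choice of micro-averaging are all assumptions made for illustration, and the abstract does not say which averaging scheme produced the reported scores.

```python
from typing import Iterable, List, Sequence, Tuple


def fit_classes(tag_sets: Iterable[Iterable[str]]) -> List[str]:
    """Collect the sorted vocabulary of distinct hashtags (hypothetical helper)."""
    return sorted({tag for tags in tag_sets for tag in tags})


def binarize(tag_sets: Iterable[Iterable[str]], classes: Sequence[str]) -> List[List[int]]:
    """Turn each set of hashtags into a 0/1 indicator row over `classes`."""
    index = {tag: i for i, tag in enumerate(classes)}
    rows = []
    for tags in tag_sets:
        row = [0] * len(classes)
        for tag in tags:
            if tag in index:  # hashtags outside the vocabulary are ignored
                row[index[tag]] = 1
        rows.append(row)
    return rows


def micro_prf(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 (averaging scheme is an assumption)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


# Made-up hashtag sets for three tweets (illustrative data only).
true_tags = [{"covid19", "lockdown"}, {"vaccine"}, {"covid19", "vaccine"}]
pred_tags = [{"covid19"}, {"vaccine", "lockdown"}, {"covid19", "vaccine"}]

classes = fit_classes(true_tags)
y_true = binarize(true_tags, classes)
y_pred = binarize(pred_tags, classes)

print(classes)
print(y_true)
print(micro_prf(y_true, y_pred))
```

Each row of `y_true` has one column per distinct hashtag, so a tweet with several hashtags simply has several ones in its row. In a model such as the one described, a classifier head would be trained to predict such rows from the tweet representation.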
About the journal:
The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high-quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to:
-Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc.
-Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc.
-Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition.
-Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc.
-Machine Translation involving Asian or low-resource languages.
-Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc.
-Information Extraction and Filtering: including automatic abstraction, user profiling, etc.
-Speech processing: including text-to-speech synthesis and automatic speech recognition.
-Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc.
-Cross-lingual information processing involving Asian or low-resource languages.
Papers that deal with theory, systems design, evaluation and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.