NLP-enabled Recommendation of Hashtags for Covid based Tweets using Hybrid BERT-LSTM Model

IF 2.0 · Zone 4, Computer Science · JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Asian and Low-Resource Language Information Processing, Pub Date: 2024-01-16, DOI: 10.1145/3640812

Kirti Jain, Rajni Jindal

Citations: 0

Abstract

Hashtags have become a popular way to summarize feelings, sentiments, emotions, shifting moods, food tastes and much more. They also represent entities such as places, families and friends, and they are a means of searching and categorizing content on social media sites. As hashtag use grows, there is a need to automate it, which has given rise to the task of "Hashtag Recommendation". Moreover, plenty of posts on social media sites remain untagged. These untagged posts are filtered out when data is searched and categorized by label, so they contribute nothing to useful insights and remain abandoned. If, however, the author of such a post is recommended labels based on its content, they may choose one or more of them, thereby making the post labelled. This is where hashtag recommendation comes into the picture. Although much research has been done on hashtag recommendation using traditional deep learning approaches, little work has used NLP-based BERT embeddings. In this paper, we propose BELHASH, a BERT Embedding based LSTM model for Hashtag Recommendation. The task is treated as multi-label classification: the hashtags of each post are encoded with MultiLabelBinarizer into a binary vector of zeros and ones, with one position per hashtag. The model has been evaluated on Covid-19 tweets, achieving 0.72 accuracy, 0.7 precision, 0.66 recall and 0.67 F1-score. To the best of our knowledge, this is the first hashtag recommendation paper to combine BERT embeddings with an LSTM model, and it achieves state-of-the-art results.
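The abstract frames hashtag recommendation as multi-label classification over MultiLabelBinarizer-style binary vectors. The sketch below, written in plain Python so it runs without scikit-learn, shows that encoding, how per-hashtag scores from a classifier could be thresholded into recommendations, and one common way to compute precision, recall, F1 and accuracy for such outputs. It does not implement the BERT-LSTM model itself. The 0.5 threshold, the toy scores standing in for the model's (assumed sigmoid) outputs, the per-post averaging of the metrics and the exact-match definition of accuracy are all assumptions; the abstract does not specify them, and the helper names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def multi_hot_encode(tag_sets: Sequence[Set[str]]) -> Tuple[List[str], List[List[int]]]:
    """Encode each post's hashtag set as one 0/1 vector over a sorted hashtag
    vocabulary (the same column layout MultiLabelBinarizer uses by default)."""
    classes = sorted({tag for tags in tag_sets for tag in tags})
    column = {tag: i for i, tag in enumerate(classes)}
    rows = []
    for tags in tag_sets:
        row = [0] * len(classes)
        for tag in tags:
            row[column[tag]] = 1  # a post may switch on several columns
        rows.append(row)
    return classes, rows


def binarize(score_rows: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Threshold per-hashtag scores (assumed sigmoid outputs) into 0/1 predictions."""
    return [[1 if s >= threshold else 0 for s in row] for row in score_rows]


def decode(row: Sequence[int], classes: Sequence[str]) -> List[str]:
    """Map a 0/1 row back to the hashtags to recommend."""
    return [tag for tag, bit in zip(classes, row) if bit]


def example_based_prf(y_true: Sequence[Sequence[int]],
                      y_pred: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Precision, recall and F1 computed per post, then averaged over posts."""
    p_sum = r_sum = f_sum = 0.0
    for t, p in zip(y_true, y_pred):
        tp = sum(1 for a, b in zip(t, p) if a and b)  # hashtags both true and predicted
        prec = tp / sum(p) if sum(p) else 0.0
        rec = tp / sum(t) if sum(t) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        p_sum, r_sum, f_sum = p_sum + prec, r_sum + rec, f_sum + f1
    n = len(y_true)
    return p_sum / n, r_sum / n, f_sum / n


def subset_accuracy(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of posts whose predicted hashtag set matches the true set exactly."""
    return sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p)) / len(y_true)


if __name__ == "__main__":
    # Toy hashtag sets and made-up model scores, for illustration only.
    classes, y_true = multi_hot_encode(
        [{"#covid19", "#vaccine"}, {"#lockdown"}, {"#covid19", "#lockdown"}]
    )
    print(classes)  # ['#covid19', '#lockdown', '#vaccine']
    scores = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.3], [0.7, 0.6, 0.55]]
    y_pred = binarize(scores)
    print([decode(row, classes) for row in y_pred])
    print(example_based_prf(y_true, y_pred), subset_accuracy(y_true, y_pred))
```

Per-post averaging is only one choice; micro- or macro-averaging over hashtags can give noticeably different numbers, so reported multi-label scores are only comparable when the averaging scheme is known.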

Source journal

ACM Transactions on Asian and Low-Resource Language Information Processing (Computer Science: General Computer Science)

CiteScore: 3.60

Self-citation rate: 15.00%

Articles published: 241

Journal description: The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high-quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to:

- Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc.
- Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc.
- Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition.
- Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc.
- Machine Translation involving Asian or low-resource languages.
- Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc.
- Information Extraction and Filtering: including automatic abstraction, user profiling, etc.
- Speech processing: including text-to-speech synthesis and automatic speech recognition.
- Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc.
- Cross-lingual information processing involving Asian or low-resource languages.

Papers that deal with theory, systems design, evaluation and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.