{"title":"阿拉伯语关键词提取:使用预训练的上下文嵌入和外部特征增强深度学习模型","authors":"Randah Alharbi, H. Al-Muhtaseb","doi":"10.18653/v1/2022.wanlp-1.30","DOIUrl":null,"url":null,"abstract":"Keyphrase extraction is essential to many Information retrieval (IR) and Natural language Processing (NLP) tasks such as summarization and indexing. This study investigates deep learning approaches to Arabic keyphrase extraction. We address the problem as sequence classification and create a Bi-LSTM model to classify each sequence token as either part of the keyphrase or outside of it. We have extracted word embeddings from two pre-trained models, Word2Vec and BERT. Moreover, we have investigated the effect of incorporating linguistic, positional, and statistical features with word embeddings on performance. Our best-performing model has achieved 0.45 F1-score on ArabicKPE dataset when combining linguistic and positional features with BERT embedding.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Arabic Keyphrase Extraction: Enhancing Deep Learning Models with Pre-trained Contextual Embedding and External Features\",\"authors\":\"Randah Alharbi, H. Al-Muhtaseb\",\"doi\":\"10.18653/v1/2022.wanlp-1.30\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Keyphrase extraction is essential to many Information retrieval (IR) and Natural language Processing (NLP) tasks such as summarization and indexing. This study investigates deep learning approaches to Arabic keyphrase extraction. We address the problem as sequence classification and create a Bi-LSTM model to classify each sequence token as either part of the keyphrase or outside of it. We have extracted word embeddings from two pre-trained models, Word2Vec and BERT. Moreover, we have investigated the effect of incorporating linguistic, positional, and statistical features with word embeddings on performance. Our best-performing model has achieved 0.45 F1-score on ArabicKPE dataset when combining linguistic and positional features with BERT embedding.\",\"PeriodicalId\":355149,\"journal\":{\"name\":\"Workshop on Arabic Natural Language Processing\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Workshop on Arabic Natural Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2022.wanlp-1.30\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Arabic Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.wanlp-1.30","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Arabic Keyphrase Extraction: Enhancing Deep Learning Models with Pre-trained Contextual Embedding and External Features
Keyphrase extraction is essential to many Information Retrieval (IR) and Natural Language Processing (NLP) tasks, such as summarization and indexing. This study investigates deep learning approaches to Arabic keyphrase extraction. We formulate the problem as sequence classification and build a Bi-LSTM model that classifies each token in a sequence as either part of a keyphrase or outside of it. We extract word embeddings from two pre-trained models, Word2Vec and BERT. Moreover, we investigate the effect on performance of combining linguistic, positional, and statistical features with the word embeddings. Our best-performing model, which combines linguistic and positional features with BERT embeddings, achieves an F1-score of 0.45 on the ArabicKPE dataset.
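To make the architecture concrete, the sketch below shows one plausible realization of the approach the abstract describes: pre-trained contextual embeddings (e.g., from BERT) are concatenated with external feature vectors per token, fed through a Bi-LSTM, and classified token by token as keyphrase or non-keyphrase. This is not the authors' code; the class name, dimensions, and feature count are illustrative assumptions.

```python
# Minimal sketch of a Bi-LSTM keyphrase tagger over pre-trained embeddings
# plus external features. All hyperparameters here are assumptions, not the
# paper's reported settings.
import torch
import torch.nn as nn

class BiLSTMKeyphraseTagger(nn.Module):  # hypothetical name
    def __init__(self, embed_dim=768, feat_dim=8, hidden_dim=256, num_labels=2):
        super().__init__()
        # Contextual embeddings (e.g., BERT, dim 768) are assumed to be
        # computed upstream; linguistic/positional features (feat_dim) are
        # concatenated onto each token's embedding before the Bi-LSTM.
        self.lstm = nn.LSTM(embed_dim + feat_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_embeddings, token_features):
        # token_embeddings: (batch, seq_len, embed_dim)
        # token_features:   (batch, seq_len, feat_dim)
        x = torch.cat([token_embeddings, token_features], dim=-1)
        out, _ = self.lstm(x)
        # Per-token logits: part of a keyphrase vs. outside of it.
        return self.classifier(out)

# Usage with placeholder tensors standing in for real BERT embeddings
# and extracted features:
model = BiLSTMKeyphraseTagger()
embeds = torch.randn(2, 50, 768)   # placeholder contextual embeddings
feats = torch.randn(2, 50, 8)      # placeholder external features
logits = model(embeds, feats)
print(logits.shape)                # torch.Size([2, 50, 2])
preds = logits.argmax(dim=-1)      # 1 = keyphrase token, 0 = outside
```

Framing the task this way turns keyphrase extraction into per-token binary classification, so contiguous runs of positive labels are read off as the predicted keyphrases.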