Arabic Named Entity Recognition: What Works and What's Next

WANLP@ACL 2019. Published: 2019-08-01. DOI: 10.18653/v1/W19-4607

Liyuan Liu, Jingbo Shang, Jiawei Han


Citations: 17

Abstract



This paper presents the winning solution to the Arabic Named Entity Recognition challenge run by Topcoder.com. The proposed model combines several tailored techniques: representation learning, feature engineering, sequence labeling, and ensemble learning. The final model achieves a test F1 score of 75.82% on the AQMAR dataset and outperforms the baselines by a large margin. Detailed analyses reveal both its strengths and its limitations. Specifically, we observe that (1) representation learning modules can significantly boost performance but require proper pre-processing, and (2) because the training data is small, the resulting embeddings can be further enhanced with feature engineering. All implementations and pre-trained models are publicly available.
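To make the reported F1 figure concrete, here is a minimal sketch of exact-match, entity-level F1 over BIO-tagged sentences, the convention popularized by the CoNLL shared tasks. The abstract does not say how the challenge computed its score, so the assumption is that it follows this convention. The function names `bio_to_spans` and `entity_f1` are hypothetical illustrations and are not taken from the authors' released code.

```python
from typing import List, Optional, Set, Tuple

# A typed entity span: (entity type, start token index, end token index exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Extract typed entity spans from one BIO tag sequence.

    Assumption: an I- tag that does not continue a span of the same type
    starts a new span (a common, lenient convention)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    ent_type: Optional[str] = None
    # Append a sentinel "O" so the final open span is flushed.
    for i, tag in enumerate(tags + ["O"]):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != ent_type
        )
        if tag == "O" or starts_new:
            # Close the currently open span, if any.
            if ent_type is not None:
                spans.add((ent_type, start, i))
            if tag == "O":
                start, ent_type = None, None
            else:
                start, ent_type = i, tag[2:]
        # Otherwise the tag is I-X continuing the open span of type X.
    return spans


def entity_f1(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exactly matching typed spans."""
    tp = n_pred = n_gold = 0
    for g_tags, p_tags in zip(gold, pred):
        g_spans, p_spans = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g_spans & p_spans)  # a span counts only if type and boundaries match
        n_pred += len(p_spans)
        n_gold += len(g_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "B-ORG"]]  # right boundaries, wrong type on the 2nd entity
    print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Under this convention, a predicted entity with correct boundaries but the wrong type counts as both a false positive and a false negative. Strict exact-match scoring of this kind is one reason NER F1 scores on small corpora can sit well below token-level accuracy.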

