Research on Pre-training of Tibetan Natural Language Processing

2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML). Pub Date: 2021-07-16. DOI: 10.1109/PRML52754.2021.9520714

Zhensong Li, Jie Zhu, Hong Cao

Citations: 1

Abstract

In natural language processing, pre-training can effectively improve the performance of downstream tasks, and in recent years it has seen continued development in Tibetan NLP. We built three pre-trained models (Tibetan Word2Vec, Tibetan ELMo, and Tibetan ALBERT) and applied them to two downstream tasks: Tibetan text classification and Tibetan part-of-speech tagging. Compared with the baseline models for these two tasks, the models that use pre-training perform significantly better. The three pre-trained models also bring a progressive improvement in performance on the Tibetan downstream tasks.

