Question Answering System for low resource language using Transfer Learning

Aarushi Phade, Y. Haribhakta
DOI: 10.1109/iccica52458.2021.9697268
Venue: 2021 International Conference on Computational Intelligence and Computing Applications (ICCICA)
Published: 2021-11-26
Citations: 2

Abstract

This paper proposes a Question Answering System for the Marathi language using transfer learning. A well-performing Question Answering system depends heavily on the word embeddings it uses. Producing word embeddings for a language from scratch is a drawn-out task that requires an enormous dataset and substantial computing resources, and embeddings trained on a limited dataset yield only average performance on NLP tasks. Using word embeddings from pre-trained models instead saves considerable time and gives strong performance, since these models have more learnable parameters and are trained on huge datasets. Our framework uses the Multilingual BERT model, with 110M parameters, as the pre-trained source model, which leads to effective word representations. We fine-tuned this BERT model for question answering with the assistance of a small, custom SQuAD-like dataset built for this framework. The system uses BERTScore and F1-score as its evaluation metrics, achieving an F1-score of 56.7% and a BERTScore of 69.08%. Being the first of its kind for the Marathi language, the system lays the groundwork for future research.
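The F1-score reported above is the standard SQuAD-style token-overlap metric for extractive QA. A minimal sketch of how such a score is computed is shown below; the function name and example strings are illustrative and not taken from the paper's code.

```python
# SQuAD-style token-overlap F1 between a predicted answer span and a
# gold answer span, as used to evaluate extractive QA systems.
from collections import Counter


def f1_score(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = prediction.split()
    gold_tokens = ground_truth.split()
    # Multiset intersection counts tokens shared by both answers.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Partial overlap between prediction and gold (Marathi tokens):
print(f1_score("पुणे महाराष्ट्र", "पुणे"))
```

A corpus-level score is simply this value averaged over all question-answer pairs; BERTScore, by contrast, compares answers via contextual embedding similarity rather than exact token overlap.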