Aarushi Phade, Y. Haribhakta. 2021 International Conference on Computational Intelligence and Computing Applications (ICCICA), 2021-11-26. DOI: 10.1109/iccica52458.2021.9697268
Question Answering System for low resource language using Transfer Learning
This paper proposes a Question Answering System for the Marathi language using transfer learning. A well-performing question answering system depends heavily on the quality of its word embeddings. Producing word embeddings for a language from scratch is a drawn-out task that requires an enormous dataset and substantial computing resources, and embeddings trained on a limited dataset yield only mediocre performance on NLP tasks. Reusing word embeddings from pre-trained models instead saves considerable time and gives strong performance, since these models have more learnable parameters and are trained on huge datasets. Our framework uses the Multilingual BERT model, with 110M parameters, as the pre-trained source model, which yields effective word representations. We have fine-tuned this BERT model for question answering with the assistance of a small, custom, SQuAD-like dataset created for this framework. The system uses BERTScore and F1-score as its evaluation metrics, achieving an F1-score of 56.7% and a BERTScore of 69.08%. Being the first of its kind for the Marathi language, the system lays the groundwork for future research.
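The custom dataset is described only as SQuAD-like. For readers unfamiliar with that format, a minimal sketch of the SQuAD v1.1 JSON layout such a dataset would follow is shown below; the angle-bracket placeholders and the `answer_start` value are hypothetical, not taken from the paper's actual data:

```json
{
  "version": "1.1",
  "data": [
    {
      "title": "<article title>",
      "paragraphs": [
        {
          "context": "<Marathi passage the answer is extracted from>",
          "qas": [
            {
              "id": "q1",
              "question": "<Marathi question about the passage>",
              "answers": [
                {
                  "text": "<answer span copied verbatim from context>",
                  "answer_start": 42
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
```

In this layout each answer is an extractive span: `text` must appear verbatim in `context`, and `answer_start` is its character offset, which is what lets a BERT-style model be trained to predict start and end token positions.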
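The reported F1-score is presumably the standard SQuAD token-overlap F1 between the predicted and gold answer spans. A minimal sketch of that metric, assuming plain whitespace tokenization (the official SQuAD script additionally lowercases, strips punctuation, and removes English articles, steps that would need adapting for Marathi):

```python
from collections import Counter


def squad_f1(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between a predicted and a gold answer span.

    Precision is the fraction of predicted tokens that appear in the
    gold answer; recall is the fraction of gold tokens that were
    predicted; F1 is their harmonic mean.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both spans.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Example: prediction carries one extra token, so precision drops
# while recall stays perfect.
print(squad_f1("the taj mahal", "taj mahal"))  # → 0.8
```

A corpus-level score averages this per-question F1 over the evaluation set, taking the maximum over gold answers when a question has several.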