Meltem Çetiner, Ahmet Yıldırım, Cüneyt Öksüz, Bahadir Onay
Mevzuat Verisetinde Soru Cevaplama Uygulamasi / Question Answering Application on Legalisation Dataset
2021 6th International Conference on Computer Science and Engineering (UBMK), published 2021-09-15
DOI: 10.1109/UBMK52708.2021.9558981
Question Answering (QA) is a widely studied subfield of Natural Language Processing (NLP). It studies information retrieval techniques that locate the answer to a given query within a corpus, and deep learning techniques have recently come into wide use in this field. This work applies a transfer learning method to Turkish tax legislation documents. Tax-law domain experts created 355 question-answer pairs in SQuAD 1.1 (Stanford Question Answering Dataset) format from legal documents in UYAP (National Judiciary Informatics System). BERT (Bidirectional Encoder Representations from Transformers) contextual word embeddings are used so that a word's representation can capture the different meanings it takes in different contexts. A system was deployed using these embeddings together with the model trained on the SQuAD 1.1 dataset. In addition, the failed answers produced by this model were used to create a SQuAD 2.0 dataset that includes unanswerable questions, and new models were trained on this dataset. We observe that the most successful model trained on the SQuAD 2.0 dataset outperforms the best SQuAD 1.1 model by 11% in exact match (EM) and by 5% in F1.
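The EM and F1 figures above are the standard SQuAD QA metrics. The following is a minimal Python sketch of how they are usually computed, modeled on the normalization in the official SQuAD evaluation script: lowercase, strip punctuation and English articles, collapse whitespace, score token-overlap F1, and take the maximum over the gold answers. The paper does not state which evaluation code it used, so treat this as an assumption rather than the authors' method. The function names (`normalize_answer`, `exact_match`, `f1`, `score`) and the toy Turkish answers are hypothetical. Note also that the article-removal step is English-specific, and Python's `str.lower()` does not apply Turkish dotted/dotless "i" rules.

```python
import collections
import re
import string

_PUNCT = set(string.punctuation)  # ASCII punctuation only


def normalize_answer(s: str) -> str:
    """SQuAD-style normalization (assumed): lowercase, drop punctuation,
    drop English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # SQuAD 2.0 convention: an empty side (unanswerable question or
    # abstaining prediction) scores 1 only if both sides are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions: dict, references: dict) -> dict:
    """Average EM and F1 as percentages, max over gold answers per question.
    An empty gold list marks an unanswerable (SQuAD 2.0) question; a missing
    prediction is treated here as an empty answer (an assumption)."""
    if not references:
        raise ValueError("references must not be empty")
    em_total = f1_total = 0.0
    for qid, golds in references.items():
        pred = predictions.get(qid, "")
        golds = golds or [""]
        em_total += max(exact_match(pred, g) for g in golds)
        f1_total += max(f1(pred, g) for g in golds)
    n = len(references)
    return {"exact_match": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}


if __name__ == "__main__":
    # Hypothetical toy data for illustration, not results from the paper.
    refs = {"q1": ["yüzde 18"], "q2": ["on beş gün içinde"], "q3": []}
    preds = {"q1": "yüzde 18", "q2": "beş gün içinde", "q3": ""}
    print(score(preds, refs))
```

In this toy run, q2 gets partial F1 credit (3 of 4 gold tokens matched) but no EM credit, and the unanswerable q3 is scored correct because the prediction is empty. This shows why EM and F1 gains can differ in size, as in the 11% versus 5% figures reported above.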