Massively Multilingual Language Models for Cross Lingual Fact Extraction from Low Resource Indian Languages

Q3 Arts and Humanities

Pub Date: 2023-02-09  DOI: 10.48550/arXiv.2302.04790

Bhavyajeet Singh, Pavan Kandru, Anubhav Sharma, Vasudeva Varma

Citations: 1

Abstract

Massive knowledge graphs (KGs) like Wikidata attempt to capture world knowledge about multiple entities. Recent approaches concentrate on automatically enriching these KGs from text. However, much of the information present as natural text in low-resource languages is missed. Cross Lingual Information Extraction aims to extract factual information in the form of English triples from low-resource Indian language text. Despite its massive potential, progress on this task lags behind Monolingual Information Extraction. In this paper, we propose the task of Cross Lingual Fact Extraction (CLFE) from text and devise an end-to-end generative approach for it, which achieves an overall F1 score of 77.46.
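The abstract does not specify how the generative model emits triples or how the F1 score is computed. The sketch below is therefore only an illustration under stated assumptions, not the authors' method. It assumes (hypothetically) that a sequence-to-sequence model reads Indian language text and emits English (subject, relation, object) triples flattened into one string with made-up delimiter tokens `<S>`, `<R>`, `<O>`. It also assumes scoring by exact-match set precision, recall and F1. The helper names `linearize`, `parse` and `triple_f1` are illustrative, and the model itself is omitted.

```python
from typing import Iterable, Set, Tuple

# A fact as an English (subject, relation, object) triple.
Triple = Tuple[str, str, str]

# Hypothetical delimiter tokens for flattening triples into one decoder target.
SUBJ, REL, OBJ = "<S>", "<R>", "<O>"


def linearize(triples: Iterable[Triple]) -> str:
    """Flatten triples into a single target string for a generative model."""
    return " ".join(f"{SUBJ} {s} {REL} {r} {OBJ} {o}" for s, r, o in triples)


def parse(text: str) -> Set[Triple]:
    """Recover triples from generated text, skipping malformed chunks."""
    triples: Set[Triple] = set()
    for chunk in text.split(SUBJ)[1:]:
        subj, sep, rest = chunk.partition(REL)
        if not sep:  # no relation marker: drop this chunk
            continue
        rel, sep, obj = rest.partition(OBJ)
        if not sep:  # no object marker: drop this chunk
            continue
        triples.add((subj.strip(), rel.strip(), obj.strip()))
    return triples


def triple_f1(predicted: Set[Triple], gold: Set[Triple]) -> float:
    """Exact-match F1 between predicted and gold triple sets (assumed metric)."""
    if not predicted and not gold:
        return 1.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [
        ("Sachin Tendulkar", "occupation", "cricketer"),
        ("Sachin Tendulkar", "place of birth", "Mumbai"),
    ]
    print(linearize(gold))  # training target for, e.g., a Hindi source sentence
    # A model output that recovers only one of the two gold facts.
    prediction = f"{SUBJ} Sachin Tendulkar {REL} occupation {OBJ} cricketer"
    print(round(triple_f1(parse(prediction), set(gold)), 4))  # → 0.6667
```

In the usage example, the prediction recovers one of two gold triples, so precision is 1.0, recall is 0.5, and F1 is 2/3. Real systems may instead normalize entity and relation strings or use partial-match scoring, which would change these numbers.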

Source journal: Arts and Humanities-History and Philosophy of Science

CiteScore: 0.30

Self-citation rate: 0.00%

Articles published: