Massively Multilingual Language Models for Cross Lingual Fact Extraction from Low Resource Indian Languages

Q3 Arts and Humanities
Icon Pub Date : 2023-02-09 DOI:10.48550/arXiv.2302.04790
Bhavyajeet Singh, Pavan Kandru, Anubhav Sharma, Vasudeva Varma
Citations: 1

Abstract

Massive knowledge graphs (KGs) like Wikidata attempt to capture world knowledge about multiple entities. Recent approaches concentrate on automatically enriching these KGs from text. However, a lot of information present as natural text in low resource languages is often missed. Cross Lingual Information Extraction aims to extract factual information in the form of English triples from low resource Indian language text. Despite its massive potential, progress on this task lags behind that of Monolingual Information Extraction. In this paper, we propose the task of Cross Lingual Fact Extraction (CLFE) from text and devise an end-to-end generative approach for it, which achieves an overall F1 score of 77.46.
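To make the reported metric concrete, the following is a minimal sketch of how an F1 score over extracted English triples could be computed. It assumes exact-match comparison of (subject, relation, object) triples; the abstract does not state how the paper scores matches, so the function name `triple_f1`, the matching rule, and the example triples are all hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Iterable, Set, Tuple

# A fact is modelled as an English (subject, relation, object) triple.
Triple = Tuple[str, str, str]


def triple_f1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> float:
    """F1 between predicted and gold triples, using exact match (an assumption)."""
    pred_set: Set[Triple] = set(predicted)
    gold_set: Set[Triple] = set(gold)
    true_positives = len(pred_set & gold_set)
    # No overlap (including empty inputs) gives an F1 of zero.
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted facts for one input sentence.
gold_facts = [
    ("Example Person", "occupation", "writer"),
    ("Example Person", "country of citizenship", "India"),
]
predicted_facts = [
    ("Example Person", "occupation", "writer"),
    ("Example Person", "place of birth", "Delhi"),
]

score = triple_f1(predicted_facts, gold_facts)
print(f"F1 = {score:.2f}")
```

Here one of the two predicted triples matches one of the two gold triples, so precision and recall are both 0.5. A real evaluation might relax exact matching, for instance by normalising entity names, which would change the score.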
Source journal: Icon (Arts and Humanities, History and Philosophy of Science)
CiteScore: 0.30
Self-citation rate: 0.00%
Articles published: 0