Study of Low Resource Language Document Extractive Summarization using Lexical chain and Bidirectional Encoder Representations from Transformers (BERT)
{"title":"Study of Low Resource Language Document Extractive Summarization using Lexical chain and Bidirectional Encoder Representations from Transformers (BERT)","authors":"Pranjali Deshpande, Sunita Jahirabadkar","doi":"10.1109/ComPE53109.2021.9751919","DOIUrl":null,"url":null,"abstract":"Language is the basic and unique tool of communication for humans. More than 7000 languages exist on our planet. Among these, the languages which lack in linguistic resources for building statistical NLP applications are known as low resource languages (LRL). Written communication is the most popular medium for humans to express and preserve their thoughts. Advancements in technology are bringing the world closer by facilitating remote communication access. Due to increase in the use of internet, with every second new textual information is getting generated. Not all this textual information is useful. With this context the task of summarization is gaining importance. Summary can be generated by two ways: Extractive and Abstractive. In Extractive summarization the key phrases and key sentences in the source document are retained, whereas Abstractive summary is generated by rewriting the key sentences. The task of summarization becomes more challenging in case of LRL documents. 
The paper focuses on the experiments carried out for extractive summarization of LRL documents using two approaches: Lexical chain and BERT.","PeriodicalId":211704,"journal":{"name":"2021 International Conference on Computational Performance Evaluation (ComPE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Computational Performance Evaluation (ComPE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ComPE53109.2021.9751919","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Language is the basic and most distinctive tool of human communication. More than 7,000 languages exist on our planet. Among these, languages that lack the linguistic resources needed to build statistical NLP applications are known as low resource languages (LRL). Written communication is the most popular medium for humans to express and preserve their thoughts. Advancements in technology are bringing the world closer by facilitating remote communication. With the growing use of the internet, new textual information is generated every second, and not all of it is useful. In this context, the task of summarization is gaining importance. A summary can be generated in two ways: extractive and abstractive. In extractive summarization, the key phrases and key sentences of the source document are retained, whereas an abstractive summary is generated by rewriting the key sentences. The task of summarization becomes more challenging for LRL documents. This paper focuses on experiments carried out for extractive summarization of LRL documents using two approaches: lexical chains and BERT.
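The extractive idea described above can be sketched in a few lines. This is a minimal illustration only: it scores sentences by the frequency of the words they share with the rest of the document, a crude stand-in for lexical-chain strength, and does not reproduce the paper's actual WordNet-style lexical chains or BERT-based ranking.

```python
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Score each sentence by how many of the document's frequent
    words it contains (a rough proxy for lexical-chain strength),
    then keep the top-scoring sentences in their source order."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Document-wide word frequencies (lowercased).
    freq = Counter(w.lower() for s in sentences for w in s.split())

    def score(sentence):
        return sum(freq[w.lower()] for w in sentence.split())

    # Rank sentence indices by score, highest first.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Re-sort the selected indices so the summary keeps source order.
    keep = sorted(ranked[:num_sentences])
    return ". ".join(sentences[i] for i in keep) + "."

doc = ("Summarization condenses text. "
       "Summarization retains key sentences. "
       "Cats sleep a lot.")
print(extractive_summary(doc, num_sentences=2))
```

Because the selected sentences are copied verbatim from the source, this is extractive by construction; an abstractive system would instead rewrite them.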