Study of Low Resource Language Document Extractive Summarization using Lexical chain and Bidirectional Encoder Representations from Transformers (BERT)
{"title":"Study of Low Resource Language Document Extractive Summarization using Lexical chain and Bidirectional Encoder Representations from Transformers (BERT)","authors":"Pranjali Deshpande, Sunita Jahirabadkar","doi":"10.1109/ComPE53109.2021.9751919","DOIUrl":null,"url":null,"abstract":"Language is the basic and unique tool of communication for humans. More than 7000 languages exist on our planet. Among these, the languages which lack in linguistic resources for building statistical NLP applications are known as low resource languages (LRL). Written communication is the most popular medium for humans to express and preserve their thoughts. Advancements in technology are bringing the world closer by facilitating remote communication access. Due to increase in the use of internet, with every second new textual information is getting generated. Not all this textual information is useful. With this context the task of summarization is gaining importance. Summary can be generated by two ways: Extractive and Abstractive. In Extractive summarization the key phrases and key sentences in the source document are retained, whereas Abstractive summary is generated by rewriting the key sentences. The task of summarization becomes more challenging in case of LRL documents. 
The paper focuses on the experiments carried out for extractive summarization of LRL documents using two approaches: Lexical chain and BERT.","PeriodicalId":211704,"journal":{"name":"2021 International Conference on Computational Performance Evaluation (ComPE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Computational Performance Evaluation (ComPE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ComPE53109.2021.9751919","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Language is the basic and most distinctive tool of human communication. More than 7,000 languages exist on our planet. Among these, languages that lack the linguistic resources needed to build statistical NLP applications are known as low resource languages (LRL). Written communication is the most popular medium for humans to express and preserve their thoughts. Advancements in technology are bringing the world closer by facilitating remote communication. With the growing use of the internet, new textual information is generated every second, and not all of it is useful. In this context, the task of summarization is gaining importance. A summary can be generated in two ways: extractive and abstractive. In extractive summarization, the key phrases and key sentences of the source document are retained, whereas an abstractive summary is generated by rewriting the key sentences. The task of summarization becomes more challenging for LRL documents. This paper focuses on experiments carried out for extractive summarization of LRL documents using two approaches: lexical chains and BERT.
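The extractive idea described above can be sketched in a few lines. This is a minimal illustration only: it scores sentences by the frequency of the words they share with the rest of the document, a crude stand-in for lexical-chain strength, and does not reproduce the paper's actual WordNet-style lexical chains or BERT-based ranking.

```python
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Score each sentence by how many of the document's frequent
    words it contains (a rough proxy for lexical-chain strength),
    then keep the top-scoring sentences in their source order."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Document-wide word frequencies (lowercased).
    freq = Counter(w.lower() for s in sentences for w in s.split())

    def score(sentence):
        return sum(freq[w.lower()] for w in sentence.split())

    # Rank sentence indices by score, highest first.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Re-sort the selected indices so the summary keeps source order.
    keep = sorted(ranked[:num_sentences])
    return ". ".join(sentences[i] for i in keep) + "."

doc = ("Summarization condenses text. "
       "Summarization retains key sentences. "
       "Cats sleep a lot.")
print(extractive_summary(doc, num_sentences=2))
```

Because the selected sentences are copied verbatim from the source, this is extractive by construction; an abstractive system would instead rewrite them.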