Study of Low Resource Language Document Extractive Summarization using Lexical chain and Bidirectional Encoder Representations from Transformers (BERT)
Pranjali Deshpande, Sunita Jahirabadkar
2021 International Conference on Computational Performance Evaluation (ComPE), published 2021-12-01
DOI: 10.1109/ComPE53109.2021.9751919
Language is the basic and unique tool of human communication. More than 7000 languages exist on our planet. Among these, languages that lack the linguistic resources needed to build statistical NLP applications are known as low resource languages (LRL). Written communication is the most popular medium for humans to express and preserve their thoughts. Advances in technology are bringing the world closer by facilitating remote communication. With growing use of the internet, new textual information is generated every second, and not all of it is useful. In this context, the task of summarization is gaining importance. A summary can be generated in two ways: extractive and abstractive. In extractive summarization, the key phrases and key sentences of the source document are retained, whereas an abstractive summary is generated by rewriting the key sentences. Summarization becomes more challenging for LRL documents. This paper focuses on experiments carried out on extractive summarization of LRL documents using two approaches: lexical chains and BERT.
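To make the extractive setting concrete, the following is a minimal sketch of sentence selection. It is not the lexical chain or BERT approach studied in the paper: as a stand-in, it scores each sentence by the average document frequency of its words. The function names, the regex-based sentence splitter, and the tokenizer are all assumptions made for illustration, and a real LRL pipeline would likely need language-specific segmentation.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    # Count how often each word occurs across the whole document.
    freq = Counter(w for words in tokenized for w in words)

    def score(words):
        # Average document frequency of the words in one sentence.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, then restore document order for the top k.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    keep = sorted(ranked[:k])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    doc = "Cats chase mice. Cats sleep a lot. Dogs bark."
    print(extractive_summary(doc, k=2))
```

The selected sentences are copied verbatim from the source, which is what distinguishes extractive from abstractive summarization. A lexical chain or BERT based scorer would replace only the `score` function in this sketch.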