Automatic Summarizing the News from Inform.kz by Using Natural Language Processing Tools
B. Kynabay, Aimoldir Aldabergen, A. Zhamanov
2021 IEEE International Conference on Smart Information Systems and Technologies (SIST), published 2021-04-28
DOI: 10.1109/SIST50301.2021.9465885
Citations: 1
Abstract
The rapid growth of information on the web has created new problems of data access and processing. There is therefore a need for tools that help manage and handle Big Data quickly. The primary goal of this work is to propose an efficient method for automatic text summarization using Natural Language Processing (NLP) and Machine Learning (ML) techniques. This research introduces a concise, easily understandable, and uncomplicated implementation of this method in the Python programming language. Efficient performance is necessary in web search tasks, where an enormous amount of unstructured data needs to be summarized very quickly. The novelty of the work is that text summarization is implemented on Kazakh texts. The extractive summarization uses a new, keyword-focused approach. The contributions of the work are a manually created list of stop words for summarizing Kazakh-language text and a dataset constructed by scraping news from the country's largest international news portal, www.inform.kz. The results of the work show that it is possible to implement automatic text summarization for the Kazakh language.
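As a rough illustration of what a keyword-focused extractive summarizer can look like in Python, the sketch below scores each sentence by the frequency of its non-stop-word tokens and keeps the top-scoring sentences in their original order. This is a generic frequency-based technique and not necessarily the authors' method. The function names, the `max_sentences` parameter, and the tiny `ILLUSTRATIVE_STOP_WORDS` set are assumptions made for this example; the manually built Kazakh stop-word list described in the abstract is not reproduced here.

```python
import re
from collections import Counter

# Assumption: a handful of common Kazakh function words, for illustration only.
# This is NOT the stop-word list created in the paper.
ILLUSTRATIVE_STOP_WORDS = frozenset({"және", "бұл", "үшін", "мен", "да", "де"})


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase word tokens; the word-character class also matches Cyrillic."""
    return re.findall(r"\w+", sentence.lower())


def summarize(text, stop_words=ILLUSTRATIVE_STOP_WORDS, max_sentences=2):
    """Return the highest-scoring sentences, joined in their original order."""
    sentences = split_sentences(text)

    # Keyword frequencies over the whole text, ignoring stop words.
    freq = Counter(
        word
        for sentence in sentences
        for word in tokenize(sentence)
        if word not in stop_words
    )

    def score(sentence):
        # A sentence's score is the summed frequency of its keywords.
        return sum(freq[w] for w in tokenize(sentence) if w not in stop_words)

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Calling `summarize(article_text, stop_words=my_stop_words, max_sentences=3)` would return a three-sentence extract. Summing raw frequencies favors longer sentences, so a fuller implementation might normalize the score by sentence length.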