Olesia Barkovska, Patrik Rusnak, Vitalii Tkachov, T. Muzyka
Impact of Stemming on Efficiency of Messages Likelihood Definition in Telegram Newsfeeds
2022 IEEE 3rd KhPI Week on Advanced Technology (KhPIWeek), 2022-10-03
DOI: 10.1109/KhPIWeek57572.2022.9916415
https://doi.org/10.1109/KhPIWeek57572.2022.9916415
Citations: 1
Abstract
The work is dedicated to the development of a system for assessing the credibility of text messages posted in Telegram newsfeeds. The work is motivated by the concentration of information in the news feeds of messengers and social networks and by their ability to shape public opinion on state relations and political moods; the number of such feeds is constantly growing, and they are supported by bots and biased authors. The proposed system coordinates text parsing, text processing, a database of messages from official sources of information, and a client (author) database. The degree of similarity between text messages is determined by computing the Damerau-Levenshtein distance in the Text Processing Module. The work shows that the efficiency of this module can be increased (up to 1.44 times for messages of around 1500 symbols) by stemming incoming messages at the preprocessing stage, because this reduces the computational complexity of the Damerau-Levenshtein method by shortening words to their stems, discarding auxiliary parts such as suffixes and endings. Thus, stemming reduces the number of symbols to be processed at the stage where the Damerau-Levenshtein algorithm is applied, which demonstrates the feasibility of applying stemming in the preprocessing block.
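The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance and a toy suffix-stripping stemmer. The names `osa_distance`, `toy_stem`, `preprocess`, and the `SUFFIXES` list are hypothetical and chosen only for illustration. The dynamic-programming table has (len(a) + 1) x (len(b) + 1) cells, so shortening both messages by stemming reduces the number of cells to fill.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Hypothetical toy suffix list; a real system would use a proper stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(message: str) -> str:
    """Lowercase the message and stem each whitespace-separated word."""
    return " ".join(toy_stem(w) for w in message.lower().split())


if __name__ == "__main__":
    official = "Officials confirmed the reported findings"
    incoming = "Officials confirmde the reported findings"
    for x, y in ((official, incoming), (preprocess(official), preprocess(incoming))):
        cells = (len(x) + 1) * (len(y) + 1)
        print(f"distance={osa_distance(x, y)} table cells={cells}")
```

Running the script prints the distance and the table size for the raw and the stemmed pair, which makes the reduction in processed symbols visible. The actual speed-up depends on the stemmer and the language of the messages, so this sketch makes no claim about reproducing the reported figure.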