Recognizing fake news based on natural language processing using the BM25 algorithm with fine-tuned parameters
Authors: Liudmyla Mishchenko, Iryna Klymenko
Journal: Eastern-European Journal of Enterprise Technologies, volume 6 5 (JCR Q3, Mathematics)
Publication date: 2023-12-28
Publication type: Journal Article
DOI: 10.15587/1729-4061.2023.293513 (https://doi.org/10.15587/1729-4061.2023.293513)
Citations: 0
Abstract
The object of the research is a natural language processing (NLP) method that uses the Best Match 25 (BM25) algorithm with balanced parameters to recognize and classify fake news. Existing methods for detecting fake news in unstructured input data are insufficiently accurate and fast, which motivated the development of a new approach. The study investigated the BM25 algorithm, methods for selecting the parameters k1 and b, and the impact of these parameters on the algorithm's effectiveness in detecting fake news. It was established that precise and detailed adjustment of these parameters is crucial to achieving optimal accuracy and data processing speed. The results showed that a well-chosen set of BM25 parameters improves the model's accuracy by up to 14% compared with standard term frequency-inverse document frequency (TF-IDF) calculations. These results were obtained by experimentally tuning combinations of k1 and b to find those at which the algorithm is fastest or gives the most accurate estimate of a term's importance in a document. Balanced values of k1 and b were identified that yield the algorithm's optimal speed and accuracy in assessing word importance, taking the characteristics of the input data into account. This balanced parameter setting explains the results obtained. The results can be used for automated recognition and analysis of news and information on social media. In practice, however, the effectiveness of a given parameter set depends on the linguistic variation, content, and topic of new input data sets.
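To make the roles of k1 and b concrete, the following is a minimal sketch of per-term BM25 weighting in its commonly cited Okapi form, with a non-negative IDF variant. It is an illustration only: the abstract does not give the authors' exact formula or their tuned values, so the function name bm25_weights, the defaults k1=1.2 and b=0.75 (widely used starting points), and the toy corpus are all assumptions.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: BM25 weight} dict per tokenized document.

    k1 controls how quickly repeated occurrences of a term saturate;
    b controls how strongly document length is normalized (0 = off, 1 = full).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        # Length-dependent part of the denominator.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        weights.append({
            t: math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
            * f * (k1 + 1) / (f + norm)
            for t, f in tf.items()
        })
    return weights


# Hypothetical toy corpus, not data from the paper.
docs = [
    "shocking miracle cure doctors hate".split(),
    "government publishes annual budget report".split(),
    "shocking shocking shocking celebrity secret revealed today now".split(),
]

# Compare how different (k1, b) pairs weight the same term in the same document.
for k1, b in [(1.2, 0.75), (2.0, 0.0), (0.5, 1.0)]:
    w = bm25_weights(docs, k1=k1, b=b)
    print(f"k1={k1}, b={b}: weight of 'shocking' in doc 2 = {w[2]['shocking']:.3f}")
```

Unlike a plain TF-IDF weight, which grows with every repetition of a term, the BM25 term-frequency factor is bounded by k1 + 1, so tuning k1 and b changes how much repeated words and long documents influence the features passed to a classifier. A grid search over (k1, b) pairs, scored on held-out classification accuracy and runtime, would be one plausible way to look for balanced values of the kind the abstract describes.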
About the journal:
The term "enterprise technologies" in the title of the "Eastern-European Journal of Enterprise Technologies" should be read as "industrial technologies". The journal publishes the best ideas from science that can be introduced in industry. High-quality, competitive industrial products depend on introducing high technologies from various independent fields of scientific research that are united by a common end result: a finished high-technology product. These fields include engineering, power engineering and energy saving, technologies of inorganic and organic substances and materials science, and information technologies and control systems. Publishing scientific papers in these areas is the main direction of the journal's development, because the results of such research can be used directly in modern industrial production: the space and aircraft industry, instrument-making, mechanical engineering, power engineering, the chemical industry, and metallurgy.