Recognizing fake news based on natural language processing using the BM25 algorithm with fine-tuned parameters

JCR quartile: Q3 (Mathematics)
Liudmyla Mishchenko, Iryna Klymenko
DOI: https://doi.org/10.15587/1729-4061.2023.293513
Journal: Eastern-European Journal of Enterprise Technologies, vol. 6, no. 5
Published: 2023-12-28
Citations: 0

Abstract

The object of the research is a natural language processing (NLP) method that uses the Best Match 25 (BM25) algorithm with balanced parameters to recognize and classify fake news. The unsatisfactory accuracy and speed of existing methods for detecting fake news in unstructured input data called for a new approach to detecting it effectively. The study investigated the BM25 algorithm, methods for selecting its parameters k1 and b, and their impact on the algorithm's effectiveness in detecting fake news. It was established that precise, detailed adjustment of these parameters is crucial to achieving optimal accuracy and data processing speed. The results showed that a successful selection of BM25 parameters improves the model's accuracy by up to 14 % compared with standard term frequency-inverse document frequency (TF-IDF) calculations. These results were obtained by experimentally tuning different combinations of k1 and b to find those at which the algorithm shows either the best speed or the most accurate estimate of a term's importance in a document. Balanced values of k1 and b were identified that give the algorithm its best combined speed and accuracy in assessing word importance, taking into account the peculiarities of the input data. The balanced setting of the BM25 parameters explains the obtained results. The results can be used for automated recognition and analysis of news and information on social media based on natural language processing. In practice, however, the effectiveness of a given set of parameters depends on the linguistic variations, content, and theme of new input data sets.
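To make the role of k1 and b concrete, the following is a minimal sketch of BM25 term weighting. It is not the authors' implementation: the abstract does not state which BM25 variant, tokenizer, or parameter values were used, so the smoothed IDF form, the defaults k1=1.2 and b=0.75, the function name `bm25_scores`, and the toy documents are all assumptions made for illustration. Here k1 controls how quickly repeated occurrences of a term stop adding weight, and b controls how strongly scores are normalized by document length.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration; k1 and b defaults are common
    conventions, not values reported in the paper.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: b=0 ignores length, b=1 normalizes fully.
        norm = 1 - b + b * len(d) / avgdl
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF (always positive); an assumed variant.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 sets how fast the term-frequency contribution saturates.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


# Toy corpus (invented for this example, not data from the study).
docs = [
    "shocking secret cure doctors hate".split(),
    "city council approves budget for road repairs".split(),
    "shocking shocking shocking claim goes viral".split(),
]
query = ["shocking", "claim"]

# Compare two (k1, b) settings, in the spirit of a small parameter sweep.
for k1, b in [(1.2, 0.75), (2.0, 0.0)]:
    print(k1, b, [round(s, 3) for s in bm25_scores(query, docs, k1, b)])
```

With k1 set to 0 the term-frequency factor collapses to 1, so a term counts the same whether it appears once or many times; larger k1 values let repetition matter more. A tuning experiment of the kind the abstract describes would sweep such (k1, b) pairs and measure classification accuracy and runtime for each.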
Source journal
Eastern-European Journal of Enterprise Technologies (Mathematics-Applied Mathematics)
CiteScore: 2.00
Self-citation rate: 0.00%
Articles published: 369
Review time: 6 weeks
About the journal: In the title of the "Eastern-European Journal of Enterprise Technologies", the term "enterprise technologies" should be read as "industrial technologies". The journal publishes the best ideas from science that can be introduced in industry. High-quality, competitive industrial products depend on introducing high technologies from various independent fields of scientific research that are united by a common end result: a finished high-technology product. These fields include engineering, power engineering and energy saving, technologies of inorganic and organic substances and materials science, and information technologies and control systems. Publishing scientific papers in these areas is the main development "vector" of the journal, since their results can be used directly in modern industrial production: the space and aircraft industry, instrument making, mechanical engineering, power engineering, the chemical industry, and metallurgy.