Application of Deep Neural Networks for Automatic Irony Detection in Russian-Language Texts
M. A. Kosterin, I. V. Paramonov
Automatic Control and Computer Sciences, vol. 58, no. 7, pp. 1073–1081. Journal article, published 2025-02-12.
DOI: 10.3103/S0146411624700469
https://link.springer.com/article/10.3103/S0146411624700469
Citation count: 0
Abstract
This paper examines automatic methods for classifying Russian-language sentences into two classes: ironic and nonironic. The methods under consideration fall into three categories: classifiers based on language model embeddings, classifiers based on sentiment information, and classifiers that train embeddings to detect irony. The classifiers are built from neural networks such as BERT, RoBERTa, BiLSTM, and CNN, together with an attention mechanism and fully connected layers. Irony detection experiments are carried out on two corpora of Russian-language sentences: the first consists of journalistic texts from OpenCorpora, and the second extends the first with ironic sentences from Wiktionary. The best results come from the group of classifiers based on pure language model embeddings, with a maximum F-measure of 0.84 achieved by a combination of RoBERTa, BiLSTM, an attention mechanism, and a pair of fully connected layers in experiments on the extended corpus. In general, results on the extended corpus are 2–5% better than those on the basic corpus. The achieved results are the best for this problem for the Russian language and are comparable to the best ones for English.
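The F-measure quoted in the abstract can be illustrated with a minimal sketch. The abstract does not say how the score is averaged, so the code below assumes the F-measure is the F1 score of the ironic class (harmonic mean of precision and recall). The function name `f_measure` and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score of the positive class; returns 0.0 when there are no true positives."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # Precision or recall is zero (or undefined), so F1 is taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (1 = ironic, 0 = nonironic), not data from the paper.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 1, 0, 0, 0, 1, 1, 0]
score = f_measure(gold, pred)
print(f"F-measure: {score:.2f}")  # prints "F-measure: 0.75"
```

In this toy example there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and so is the F-measure. If the paper instead reports a macro- or weighted-average over both classes, the computation would differ.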
About the journal:
Automatic Control and Computer Sciences is a peer-reviewed journal that publishes articles on:
• Control systems, cyber-physical systems, real-time systems, robotics, smart sensors, embedded intelligence
• Network information technologies, information security, statistical methods of data processing, distributed artificial intelligence, complex systems modeling, knowledge representation, processing and management
• Signal and image processing, machine learning, machine perception, computer vision