Proceedings of The 5th International Workshop on Deep Learning in Computational Physics

Evaluation of Machine Learning Methods for Relation Extraction Between Drug Adverse Effects and Medications in Russian Texts of Internet User Reviews

Proceedings of The 5th International Workshop on Deep Learning in Computational Physics — PoS(DLCP2021) Pub Date : 2021-12-01 DOI: 10.22323/1.410.0006

A. Sboev, A. Selivanov, R. Rybka, I. Moloshnikov, Gleb Rylkov

Abstract: The research considers the automatic extraction of relations between mentions of medications and adverse drug reactions in Russian-language drug reviews. This text analysis method might be useful for pharmacovigilance and the reprofiling of medicines. Its application to Russian-language reviews has not been studied yet because of the lack of Russian corpora with relation annotation. The study aims to solve this problem. It is based on an original dataset gathered by our group, which consists of annotated relations between entities from the Russian Drug Review Corpus, a collection of Internet users’ reviews of medications in Russian. Computational experiments were carried out on the developed corpora using classical machine learning methods, as well as a more advanced neural network model based on Transformer layers, XLM-RoBERTa-sag. The classical machine learning methods applied are support vector machine, logistic regression, Naive Bayes classifier and gradient boosting. The concatenation of TF-IDF entity vectors of character n-grams was used as the text representation. Based on a set of experiments, the following hyperparameters of these methods were selected: the size of the n-grams and the limits on the frequency of occurrence of n-grams (too rare or too frequent n-grams were excluded from the feature vector). For XLM-RoBERTa-sag, the input data is represented in the usual way for this type of model (language models based on the Transformer topology).
The following input text representation types were considered during the experiments: the whole text; the text of the target entity pair; the text of the target entity pair with the words between them; and the text of the target entity pair together with the whole input text. The last input type is the one that maximizes accuracy. It is shown that the XLM-RoBERTa-sag model achieves 95% according to the macro-averaged F1 metric, which is the state-of-the-art result for recognizing relations between mentions of adverse drug reactions and medications in Russian-language online reviews. The Naive Bayes classifier with multivariate normal distribution achieves the best result among the classical machine learning methods: 75%, which exceeds the result of random label generation by 21%.
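The classical-method representation described in the abstract above (TF-IDF vectors of character n-grams, computed per entity and concatenated, with rare and frequent n-grams filtered out) can be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' implementation: the function names, the n-gram range, the smoothed IDF formula, the `min_df`/`max_df` thresholds and the toy mentions are all assumptions made for the example.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Return all character n-grams of the lowercased text for n_min <= n <= n_max."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def fit_vocabulary(texts, min_df=1, max_df=1.0):
    """Keep n-grams whose document frequency lies in [min_df, max_df * len(texts)]
    and compute a smoothed IDF weight for each kept n-gram (assumed formula)."""
    n_docs = len(texts)
    df = Counter()
    for text in texts:
        df.update(set(char_ngrams(text)))
    vocab = sorted(g for g, c in df.items() if min_df <= c <= max_df * n_docs)
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1.0 for g in vocab}
    return vocab, idf


def tfidf_vector(text, vocab, idf):
    """L2-normalised TF-IDF vector of one entity mention over a fixed vocabulary."""
    counts = Counter(char_ngrams(text))
    vec = [counts[g] * idf[g] for g in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def pair_features(drug, reaction, vocab, idf):
    """Concatenate the vectors of the two entities of a candidate relation."""
    return tfidf_vector(drug, vocab, idf) + tfidf_vector(reaction, vocab, idf)


# Toy entity mentions, invented for illustration (not taken from the corpus).
mentions = ["aspirin", "ibuprofen", "headache", "nausea"]
vocab, idf = fit_vocabulary(mentions, min_df=1, max_df=0.75)
features = pair_features("aspirin", "nausea", vocab, idf)
```

The resulting `features` list has twice the vocabulary length and could be passed to any of the classifiers named in the abstract; which n-gram sizes and frequency limits the authors actually selected is not stated here.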

Citations: 1

The Russian Language Corpus and a Neural Network to Analyse Internet Tweet Reports About Covid-19

Proceedings of The 5th International Workshop on Deep Learning in Computational Physics — PoS(DLCP2021) Pub Date : 2021-12-01 DOI: 10.22323/1.410.0017

A. Sboev, I. Moloshnikov, A. Naumov, Anastasia Levochkina, R. Rybka

Abstract: This work is aimed at creating a tool for filtering messages from Twitter users by the presence of mentions of coronavirus disease in them. For this purpose, a corpus of Russian-language tweets was created. It contains a part of 10 thousand tweets that are manually divided into several classes with different levels of confidence (potentially have covid, have covid now, other cases) and an unlabeled part of 2 million tweets on the topic of the pandemic. The paper presents the process of creating the corpus, from data collection through preliminary filtering to subsequent annotation according to the presence of a disease description. Machine learning methods were compared on the tweet classification task. It is shown that a model based on the XLM-RoBERTa topology with additional training on the corpus of unlabeled tweets gives an F1 score of 0.85 on the binary classification task ("potentially have covid" or "have covid now" vs "other"). This is 12% higher relative to the simplest model using TF-IDF encoding and an SVM classifier, and 5% higher relative to the RuDR-BERT-based model. The created toolkit will expand the feature space of models for predicting the spread of coronavirus infection and other pandemics by adding the dynamics of discussion on social networks, which characterizes people’s attitudes towards it.
© Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).
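The binary evaluation mentioned in the abstract above merges the two disease-related classes into one positive class before scoring. A minimal sketch of that step is given below, assuming that the reported F1 is computed for the positive class (the abstract does not say whether it is positive-class or averaged F1). The helper names and the toy labels are hypothetical.

```python
# Annotation classes that count as positive in the binary task (from the abstract).
POSITIVE_CLASSES = {"potentially have covid", "have covid now"}


def to_binary(label):
    """Map a corpus class name to 1 (disease-related) or 0 (other)."""
    return 1 if label in POSITIVE_CLASSES else 0


def f1_positive(y_true, y_pred):
    """F1 score of the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy gold and predicted labels, invented for illustration only.
gold = ["have covid now", "other cases", "potentially have covid", "other cases"]
pred = ["potentially have covid", "other cases", "other cases", "other cases"]

y_true = [to_binary(label) for label in gold]
y_pred = [to_binary(label) for label in pred]
score = f1_positive(y_true, y_pred)
```

Note that the first toy prediction counts as correct in the binary setting even though the fine-grained classes differ, which is the effect of merging the two classes.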

Citations: 2

Gamma/Hadron Separation for a Ground Based IACT in Experiment TAIGA Using Machine Learning Methods Random Forest

Proceedings of The 5th International Workshop on Deep Learning in Computational Physics — PoS(DLCP2021) Pub Date : 2021-12-01 DOI: 10.22323/1.410.0008

Maria Vasyutina

Citations: 2

Application of Deep Learning Technique to an Analysis of Hard Scattering Processes at Colliders

Proceedings of The 5th International Workshop on Deep Learning in Computational Physics — PoS(DLCP2021) Pub Date : 2021-09-14 DOI: 10.22323/1.410.0012

L. Dudko, P. Volkov, G. Vorotnikov, A. Zaborenko

Citations: 0
