Proceedings of The 5th International Workshop on Deep Learning in Computational Physics — PoS(DLCP2021): Latest Articles

Evaluation of Machine Learning Methods for Relation Extraction Between Drug Adverse Effects and Medications in Russian Texts of Internet User Reviews
A. Sboev, A. Selivanov, R. Rybka, I. Moloshnikov, Gleb Rylkov
{"title":"Evaluation of Machine Learning Methods for Relation Extraction Between Drug Adverse Effects and Medications in Russian Texts of Internet User Reviews","authors":"A. Sboev, A. Selivanov, R. Rybka, I. Moloshnikov, Gleb Rylkov","doi":"10.22323/1.410.0006","DOIUrl":"https://doi.org/10.22323/1.410.0006","url":null,"abstract":"The research considers an automatic extraction of relations between mentions of medications and adverse drug reactions in Russian-language drug reviews. This text analyzing method might be useful for pharmacovigilance and medicines reprofiling. Its application to Russian-language reviews hasn’t been studied yet due to the lack of corpora with relation annotation in Russian. The study is aimed at solving this problem. It is based on the original dataset gathered by our group. It consists of annotated relations between entities from the Russian Drug Review Corpus, that contains the Internet users’ reviews on medications in Russian language. Computational experiments were carried out on developed corpora using classical machine learning methods, as well as a more advanced neural network model based on Transformer layers – XLM-RoBERTa-sag. The list of applied classical machine learning methods consists of support vector machine, logistic regression, Naive Bayes classifier and gradient boosting. The concatenation of TF-IDF entity vectors of character n-grams was used as a text representation. Based on a set of experiments, the following hyperparameters of these methods were selected: the size of n-grams and the limitation on the frequency of occurrence of n-grams (too rare or too frequent n-grams were excluded from the feature vector). For XLM-RoBERTa-sag, the input data is represented as usual for such type of models (language models based on Transformer topology).
The following input text representation types were considered during the experiments: a whole text, a text of target entity pairs; a text of target entity pairs with words between them; a text of target entity pairs and the whole input text, the latter input type is the one that maximizes accuracy. It is shown that XLM-RoBERTa-sag model achieves a result of 95%, according to the macro-averaged f1 metric, which is the state-of-the-art result of recognition of the relations between mentions of adverse drug reactions and medications in Russian-language online reviews. The Naive Bayes classifier with multivariate normal distribution achieves the best result among classical machine learning methods: 75%, which exceeds the result of random label generation by 21%.","PeriodicalId":217453,"journal":{"name":"Proceedings of The 5th International Workshop on Deep Learning in Computational Physics — PoS(DLCP2021)","volume":"69 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132802531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
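The text representation named in the abstract above (concatenated TF-IDF vectors of character n-grams for the two entities, with too-rare and too-frequent n-grams excluded) can be sketched roughly as follows. This is an illustration only, not the authors' code: the abstract does not state the TF-IDF variant, the n-gram size, or the frequency thresholds, so the smoothed IDF formula, the parameter names `n`, `min_df` and `max_df`, and the sample mentions are all assumptions.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return all character n-grams of length n in the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def fit_vocabulary(texts, n, min_df, max_df):
    """Keep n-grams whose document frequency lies in [min_df, max_df]."""
    df = Counter()
    for text in texts:
        df.update(set(char_ngrams(text, n)))
    kept = sorted(g for g, c in df.items() if min_df <= c <= max_df)
    # Smoothed inverse document frequency (one common variant; an assumption).
    idf = {g: math.log((1 + len(texts)) / (1 + df[g])) + 1 for g in kept}
    return kept, idf


def tfidf_vector(text, vocab, idf, n):
    """TF-IDF vector of one entity mention over the fixed vocabulary."""
    counts = Counter(char_ngrams(text, n))
    return [counts[g] * idf[g] for g in vocab]


def pair_features(drug, reaction, vocab, idf, n):
    """Concatenate the two entity vectors into one feature vector."""
    return tfidf_vector(drug, vocab, idf, n) + tfidf_vector(reaction, vocab, idf, n)


# Hypothetical entity mentions, invented for illustration only.
mentions = ["aspirin", "ibuprofen", "headache", "nausea", "aspirin cardio"]
vocab, idf = fit_vocabulary(mentions, n=3, min_df=2, max_df=4)
features = pair_features("aspirin", "nausea", vocab, idf, n=3)
```

A feature vector built this way would then be passed to any of the classical classifiers the abstract lists; the classifier step is left out here because the abstract gives no details on its configuration.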
The Russian Language Corpus and a Neural Network to Analyse Internet Tweet Reports About Covid-19
A. Sboev, I. Moloshnikov, A. Naumov, Anastasia Levochkina, R. Rybka
{"title":"The Russian Language Corpus and a Neural Network to Analyse Internet Tweet Reports About Covid-19","authors":"A. Sboev, I. Moloshnikov, A. Naumov, Anastasia Levochkina, R. Rybka","doi":"10.22323/1.410.0017","DOIUrl":"https://doi.org/10.22323/1.410.0017","url":null,"abstract":"This work is aimed at creating a tool for filtering messages from Twitter users by the presence of mentions of coronavirus disease in them. For this purpose, a corpus of Russian-language tweets was created, which contains the part of 10 thousand tweets that are manually divided into several classes with different levels of confidence: potentially have covid, have covid now, other cases, and an unmarked part – 2 million tweets on the topic of the pandemic. The paper presents the process of creating a corpus of tweets from the stage of data collection, their preliminary filtering and subsequent annotation according to the presence of disease description. Machine learning methods were compared according to classification task on tweets. It is shown that a model based on the XLM-RoBERTa topology with additional training on corpus of unmarked tweets gives the F1 score of 0.85 on binary classification task (\"potentially have covid have covid now\" vs \"other\"). This is 12% higher relative to the simplest model using TF-IDF encoding and SVM classifier and 5% higher relative to the RuDR-BERT-based model. The created toolkit will expand the feature space of models for predicting the spread of coronavirus infection and other pandemics by adding the dynamics of discussion on social networks, which characterizes people’s attitudes towards it. 
© Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).","PeriodicalId":217453,"journal":{"name":"Proceedings of The 5th International Workshop on Deep Learning in Computational Physics — PoS(DLCP2021)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134456089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Gamma/Hadron Separation for a Ground Based IACT in Experiment TAIGA Using Machine Learning Methods Random Forest
Maria Vasyutina
{"title":"Gamma/Hadron Separation for a Ground Based IACT in Experiment TAIGA Using Machine Learning Methods Random Forest","authors":"Maria Vasyutina","doi":"10.22323/1.410.0008","DOIUrl":"https://doi.org/10.22323/1.410.0008","url":null,"abstract":"","PeriodicalId":217453,"journal":{"name":"Proceedings of The 5th International Workshop on Deep Learning in Computational Physics — PoS(DLCP2021)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129873516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Application of Deep Learning Technique to an Analysis of Hard Scattering Processes at Colliders
L. Dudko, P. Volkov, G. Vorotnikov, A. Zaborenko
{"title":"Application of Deep Learning Technique to an Analysis of Hard Scattering Processes at Colliders","authors":"L. Dudko, P. Volkov, G. Vorotnikov, A. Zaborenko","doi":"10.22323/1.410.0012","DOIUrl":"https://doi.org/10.22323/1.410.0012","url":null,"abstract":"Deep neural networks have rightfully won the place of one of the most accurate analysis tools in high energy physics. In this paper we will cover several methods of improving the performance of a deep neural network in a classification task in an instance of top quark analysis. The approaches and recommendations will cover hyperparameter tuning, boosting on errors and AutoML algorithms applied to collider physics.","PeriodicalId":217453,"journal":{"name":"Proceedings of The 5th International Workshop on Deep Learning in Computational Physics — PoS(DLCP2021)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127754175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0