Development, study, and comparison of models of cross-immunity to the influenza virus using statistical methods and machine learning
M N Asatryan, I S Shmyr, B I Timofeev, D N Shcherbinin, V G Agasaryan, T A Timofeeva, I F Ershov, E R Gerasimuk, A V Nozdracheva, T A Semenenko, D Y Logunov, A L Gintsburg
Voprosy virusologii, vol. 69, no. 4, pp. 349-362 (published 2024-09-26). DOI: 10.36233/0507-4088-250
Abstract
Introduction: The World Health Organization considers antibody titers in the hemagglutination inhibition assay to be one of the most important criteria for assessing successful vaccination. Mathematical modeling of cross-immunity makes it possible to identify new antigenic variants in real time, which is of paramount importance for human health.
Materials and methods: This study uses statistical and machine learning methods ranging from simple to complex: logistic regression, random forest, and gradient boosting. In addition to the Hamming distance, AAindex matrices were used in the calculations. Calculations were performed on four data sets, with different types and values of antigenic escape thresholds. The models were compared using standard binary classification metrics.
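The two distance measures and the escape labels described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: sequences are assumed to be pre-aligned amino acid strings of equal length, `TOY_MATRIX` is a hypothetical three-entry stand-in for a real AAindex matrix, and `is_escape` uses a hypothetical fold-drop threshold on HI titers. A fold drop is one common way to define antigenic escape; the abstract does not say exactly which threshold types the authors used.

```python
# Hypothetical toy penalties for a few residue pairs (order-independent).
# A real analysis would derive a full 20x20 table from an AAindex matrix.
TOY_MATRIX = {
    frozenset("KR"): 0.2,  # conservative: both positively charged
    frozenset("DE"): 0.2,  # conservative: both negatively charged
    frozenset("KE"): 0.8,  # charge reversal: larger penalty
}
DEFAULT_PENALTY = 1.0  # any substitution not listed above


def _check_aligned(seq_a: str, seq_b: str) -> None:
    """Both distances compare position by position, so lengths must match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")


def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Number of aligned positions at which the two sequences differ."""
    _check_aligned(seq_a, seq_b)
    return sum(a != b for a, b in zip(seq_a, seq_b))


def matrix_distance(seq_a: str, seq_b: str) -> float:
    """Like the Hamming distance, but each substitution adds its penalty
    from the (hypothetical) matrix instead of counting as 1."""
    _check_aligned(seq_a, seq_b)
    return sum(
        (TOY_MATRIX.get(frozenset((a, b)), DEFAULT_PENALTY)
         for a, b in zip(seq_a, seq_b) if a != b),
        0.0,
    )


def is_escape(homologous_titer: float, heterologous_titer: float,
              fold_threshold: float = 4.0) -> bool:
    """Hypothetical escape label: True if the HI titer against the test
    virus is at least `fold_threshold`-fold lower than the homologous titer."""
    return homologous_titer / heterologous_titer >= fold_threshold


if __name__ == "__main__":
    reference, variant = "KDNSK", "RENSE"  # made-up aligned fragments
    print(hamming_distance(reference, variant))  # 3
    print(round(matrix_distance(reference, variant), 2))  # 1.2
    print(is_escape(640, 80), is_escape(640, 320))  # True False
```

The Hamming distance treats every substitution as equally costly, whereas a matrix-weighted distance lets a conservative substitution (K to R) count less than a drastic one (K to E). In a setup like this, the distances would serve as input features for the classifiers and the escape labels as their targets; varying `fold_threshold` corresponds to trying different threshold values.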
Results: Model performance varied substantially depending on the data set used. All three models performed best when forecasting the autumn 2022 season after being trained on the February 2022 season (AUROC 0.934, 0.958, and 0.956, respectively). Performance was lowest when forecasting the entire year 2023 with models trained on data from the two 2022 seasons (AUROC 0.614, 0.658, and 0.775). The type and value of the antigenic escape thresholds had no significant effect on the results. Adding AAindex matrices neither significantly improved nor significantly degraded model performance.
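AUROC can be read as the probability that a model scores a randomly chosen positive example (here, an antigenic escape) above a randomly chosen negative one: 0.5 corresponds to random ranking and 1.0 to perfect separation. The sketch below computes it directly from this pairwise (Mann-Whitney) definition in pure Python; the labels and scores are made up for illustration, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def auroc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive gets the higher score.
    Ties count as half a correct ranking."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0  # positive ranked above negative
            elif p == n:
                correct += 0.5  # tie: half credit
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up labels (1 = escape) and model scores, for illustration only.
    y_true = [1, 1, 0, 0, 1, 0]
    y_score = [0.9, 0.4, 0.35, 0.1, 0.8, 0.5]
    print(round(auroc(y_true, y_score), 3))  # prints 0.889 (8 of 9 pairs correct)
```

The double loop costs O(P x N) comparisons for P positives and N negatives, which is fine for a sketch; a sort-based rank formula gives the same value more efficiently on large data sets.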
Conclusion: More complex models performed better. When developing cross-immunity models, testing on a variety of data sets is necessary to support strong claims about their predictive robustness.
About the journal:
The journal covers advances in virology in Russia and abroad. It publishes papers on viral diseases of humans, animals, and plants, and the results of experimental research on various problems of general and special virology. It also publishes materials that promote the practical application of virological advances to eradicating infectious diseases and reducing their incidence, as well as to their diagnosis, treatment, and prevention. Readers will also find descriptions of new research methods, apparatus, and devices.