A Supervised Machine Learning Approach for Distinguishing Between Additive and Replacing Horizontal Gene Transfers
Abhijit Mondal, Misagh Kordi, Mukul S. Bansal
Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, published 2020-09-21
DOI: 10.1145/3388440.3412428 (https://doi.org/10.1145/3388440.3412428)
Citations: 0
Abstract
Horizontal gene transfer is one of the most important drivers of microbial gene and genome evolution. Despite its central role in microbial evolution, several aspects of horizontal gene transfer remain poorly understood. In particular, transfers can be either additive or replacing depending on whether the transferred gene adds itself as a new gene in the recipient genome or replaces an existing homologous gene. However, despite recent efforts, there do not yet exist effective computational approaches for classifying inferred transfers as being additive or replacing. In this work, we address this gap by devising a novel supervised machine learning approach for classifying transfers as being either additive or replacing. Our approach is based on phylogenetic reconciliation, a standard computational technique for inferring transfers. Our classifier, named ARTra, uses as features the classifications provided by several simple reconciliation-based classification rules, along with topological information from the gene tree, and ensembles them to produce a more accurate classification. ARTra is efficient and robust and significantly improves upon the classification accuracy of the only existing computational approach for this problem. We demonstrate the accuracy of ARTra by applying it to a wide range of simulated datasets and to a large biological dataset. Our results show that ARTra performs well over a broad range of evolutionary conditions and on real data, and that it does so even when trained only on a narrow range of such conditions and only using simulated data. An open-source implementation of ARTra is freely available from https://compbio.engr.uconn.edu/software/ARTra/.
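The abstract describes ARTra's design only at a high level. The outputs of several simple reconciliation-based classification rules, together with topological information from the gene tree, serve as features, and a supervised classifier ensembles them. The Python sketch below illustrates that general idea under stated assumptions. The three rule votes, the single topological feature, and the toy training labels are all hypothetical placeholders. The learner is a plain logistic regression fitted by gradient descent, which is not necessarily the model ARTra actually uses.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical feature layout (illustrative only, NOT ARTra's actual features):
#   [rule_a_vote, rule_b_vote, rule_c_vote, topo_feature]
# Each rule vote is 1.0 if that simple rule calls the transfer "replacing"
# and 0.0 if it calls it "additive"; topo_feature is a made-up numeric
# summary of the gene-tree topology around the transfer, scaled to [0, 1].
TOY_X: List[List[float]] = [
    [1.0, 1.0, 1.0, 0.9],
    [1.0, 1.0, 0.0, 0.8],
    [0.0, 1.0, 1.0, 0.7],
    [1.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.0, 0.1],
    [1.0, 0.0, 0.0, 0.2],
    [0.0, 1.0, 0.0, 0.3],
    [0.0, 0.0, 1.0, 0.4],
]
# Invented labels standing in for simulated ground truth: 1 = replacing, 0 = additive.
TOY_Y: List[int] = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class LogisticEnsemble:
    """Learns one weight per rule/feature instead of a fixed majority vote."""
    weights: List[float]
    bias: float = 0.0

    def predict_proba(self, x: Sequence[float]) -> float:
        """Estimated probability that the transfer is replacing."""
        return sigmoid(self.bias + sum(w * xi for w, xi in zip(self.weights, x)))

    def predict(self, x: Sequence[float]) -> str:
        return "replacing" if self.predict_proba(x) >= 0.5 else "additive"


def train(X: Sequence[Sequence[float]], y: Sequence[int],
          lr: float = 0.5, epochs: int = 10000) -> LogisticEnsemble:
    """Fit the weights by full-batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    m = len(X)
    model = LogisticEnsemble(weights=[0.0] * n_features)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, label in zip(X, y):
            err = model.predict_proba(x) - label  # d(log-loss)/d(score)
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Update only after the full pass, so every gradient uses the same weights.
        model.weights = [w - lr * g / m for w, g in zip(model.weights, grad_w)]
        model.bias -= lr * grad_b / m
    return model


if __name__ == "__main__":
    model = train(TOY_X, TOY_Y)
    # Classify a new, hypothetical transfer on which two of three rules vote "replacing".
    print(model.predict([1.0, 1.0, 0.0, 0.75]))
    print("learned weights:", [round(w, 2) for w in model.weights])
```

A learned weighting lets the classifier down-weight a rule that is unreliable under some evolutionary conditions, which a fixed majority vote over the rules cannot do. The real rules, features, training data, and model choice would have to come from the ARTra paper or its released source code.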