A Long-Short Term Memory Network for Detecting CRISPR Arrays
Shantanu Deshmukh, P. Heller, Natalia Khuri
2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), December 2019
DOI: 10.1109/ICMLA.2019.00114
Citations: 1
Abstract
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) form a pattern found in the DNA sequences of some archaeal and bacterial organisms. Together with CRISPR-associated (Cas) genes, CRISPR arrays provide immunity against phages and other mobile exogenous genetic elements. This CRISPR-based immune mechanism can be repurposed to perform genome editing at low cost. Improving the specificity of CRISPR-based genome editing requires better software and experimental tools, and accurate detection of CRISPR arrays in DNA sequences is the first step toward this goal. This work presents CRISPRLstm, a CRISPR array detection pipeline that uses Long Short-Term Memory (LSTM) neural networks to discriminate between valid and invalid arrays. The predictions of CRISPRLstm are better than, or in good agreement with, those of other freely available tools, and CRISPRLstm outperforms a random forest classifier in identifying valid repeat sequences. The CRISPRLstm predictor is publicly available as a web application with an interactive user interface.
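The abstract does not describe the network architecture or how sequences are fed to it. As a rough illustration of how an LSTM can turn a candidate repeat sequence into a valid/invalid score, the sketch below one-hot encodes nucleotides, runs a single-layer LSTM cell over the sequence, and applies a logistic readout to the final hidden state. The one-hot encoding, the hidden size of 8, the hypothetical class name TinyLSTMClassifier, and the random untrained weights are all assumptions made for illustration. This is not CRISPRLstm's actual model.

```python
import math
import random

# Assumed input alphabet; the paper's actual encoding is not given in the abstract.
NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional one-hot vectors."""
    vectors = []
    for base in seq.upper():
        vec = [0.0] * len(NUCLEOTIDES)
        if base in NUCLEOTIDES:  # ambiguous bases such as N stay all-zero
            vec[NUCLEOTIDES.index(base)] = 1.0
        vectors.append(vec)
    return vectors


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM followed by a logistic readout.

    Weights are random and untrained; this only illustrates the data flow
    from a repeat sequence to a 'valid repeat' probability.
    """

    def __init__(self, input_size=4, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        concat = input_size + hidden_size
        # One weight matrix and bias per gate: input (i), forget (f), candidate (g), output (o).
        self.W = {gate: mat(hidden_size, concat) for gate in "ifgo"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
        self.b_out = 0.0

    def step(self, x, h, c):
        """One LSTM time step on input x with previous hidden state h and cell state c."""
        z = x + h  # list concatenation: [x_t, h_{t-1}]
        pre = {
            gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
            for gate in "ifgo"
        }
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        g = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        # c_t = f * c_{t-1} + i * g ;  h_t = o * tanh(c_t)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, seq):
        """Run the LSTM over the sequence and map the final hidden state to (0, 1)."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in one_hot(seq):
            h, c = self.step(x, h, c)
        logit = sum(w * hj for w, hj in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


if __name__ == "__main__":
    model = TinyLSTMClassifier()
    # A 36-nt repeat-like string; the score is meaningless until the model is trained.
    print(model.predict_proba("GTTTTAGAGCTATGCTGTTTTGAATGGTCCCAAAAC"))
```

In a real classifier the weights would be learned from labeled examples of valid and invalid repeats, and the decision threshold applied to the output probability would also need to be chosen. Both are assumptions outside what the abstract states.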