Peptide Classification from Statistical Analysis of Nanopore Translocation Experiments
Julian Hoßbach, Samuel Tovey, Tobias Ensslen, Jan C. Behrends, Christian Holm
arXiv - PHYS - Biological Physics, published 2024-08-26, DOI: arxiv-2408.14275
Abstract
Protein characterization using nanopore-based devices promises to be a
breakthrough method in basic research, diagnostics, and analytics. Current
research includes the use of machine learning to achieve this task. In this
work, a comprehensive statistical analysis of nanopore current signals is
performed and demonstrated to be sufficient for classifying up to 42 peptides
with 70% accuracy. Two sets of features, the statistical moments and the
catch22 set, are compared both in their representations and after training
small classifier neural networks. We demonstrate that complex features of the
events, captured in both the catch22 set and the central moments, are key in
classifying peptides with otherwise similar mean currents. These results
highlight the efficacy of purely statistical analysis of nanopore data and
suggest a path forward for more sophisticated classification techniques.
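
To illustrate why higher-order statistics can separate events with similar mean currents, the following is a minimal sketch of a moment-based feature vector for a single event. It is not the authors' implementation: the function name `moment_features`, the choice of standardized central moments, the cutoff at order 4, and the synthetic signals are all assumptions made for illustration.

```python
import numpy as np


def moment_features(event, max_order=4):
    """Return [mean, variance, standardized central moments of order 3..max_order].

    Hypothetical helper: the normalization used in the paper may differ.
    """
    x = np.asarray(event, dtype=float)
    mean = x.mean()
    centered = x - mean
    var = np.mean(centered ** 2)
    std = np.sqrt(var)
    features = [mean, var]
    for k in range(3, max_order + 1):
        m_k = np.mean(centered ** k)
        # Dividing by std**k makes the moment independent of the signal scale.
        features.append(m_k / std ** k if std > 0 else 0.0)
    return np.array(features)


# Two synthetic "events" (assumed data) with the same mean and spread:
rng = np.random.default_rng(0)
gaussian_event = rng.normal(loc=0.5, scale=0.05, size=2000)
two_level_event = np.tile([0.45, 0.55], 1000)  # signal hopping between two levels

print(moment_features(gaussian_event))
print(moment_features(two_level_event))
```

In this toy case the two signals agree closely in mean and variance, so only the fourth-order entry tells them apart. A feature vector like this, computed per event, is the kind of input a small classifier network could be trained on.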