A Vectorization Method Induced By Maximal Margin Classification For Persistent Diagrams
An Wu, Yu Pan, Fuqi Zhou, Jinghui Yan, Chuanlu Liu
arXiv:2407.21298 (Quantitative Biology: Biomolecules), published 2024-07-31
Abstract
Persistent homology is an effective method for extracting topological information from spatially structured data, in the form of persistence diagrams.
It is therefore well suited to the study of protein structures. Attempts to incorporate persistent homology into machine learning methods for protein function prediction have produced several techniques for vectorizing persistence diagrams. However, current vectorization methods rely heavily on ad hoc, hand-crafted choices, and they can guarantee neither that the information in the diagrams is used effectively nor that the construction is theoretically well founded. To address this problem, we propose a more geometric vectorization of persistence diagrams based on maximal margin classification in Banach spaces, and we additionally propose a framework that uses topological data analysis to identify proteins with specific functions. We evaluated our vectorization method on a binary protein classification task and compared it with the statistical methods, which performed best among thirteen commonly used vectorization methods. The experimental results indicate that our approach outperforms the statistical methods in both robustness and precision.
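To make the abstract's terms concrete, the following minimal sketch computes a persistence diagram for a small 3D point cloud (for example, stand-ins for Cα atom coordinates) and turns it into a fixed-length vector of summary statistics. It is an illustrative assumption, not the authors' pipeline. It covers only 0-dimensional homology (connected components) of a Vietoris-Rips filtration, and the function names `h0_persistence` and `statistical_vector`, along with the six chosen statistics, are hypothetical simplifications of the kind of statistical vectorization baseline the abstract mentions. The paper's maximal-margin (Banach-space) vectorization is not reproduced here.

```python
import math
import statistics
from itertools import combinations


def h0_persistence(points):
    """0-dimensional persistence diagram of a Vietoris-Rips filtration.

    Every point is born at scale 0. When an edge of length d first joins two
    connected components, one of them dies at d. The last surviving component
    never dies, so it is reported with death = math.inf.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges in order of increasing length (Kruskal-style).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    diagram = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            diagram.append((0.0, d))  # a component born at 0 dies at scale d
    if n > 0:
        diagram.append((0.0, math.inf))  # the essential (never-dying) class
    return diagram


def statistical_vector(diagram):
    """Hypothetical summary-statistics vectorization of a diagram.

    Uses only finite points and returns
    [count, mean, population std, min, max, median] of lifetimes (death - birth).
    """
    lifetimes = [d - b for b, d in diagram if math.isfinite(d)]
    if not lifetimes:
        return [0.0] * 6
    return [
        float(len(lifetimes)),
        statistics.fmean(lifetimes),
        statistics.pstdev(lifetimes),
        min(lifetimes),
        max(lifetimes),
        statistics.median(lifetimes),
    ]


if __name__ == "__main__":
    # Toy point cloud: three nearby points plus one distant outlier.
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 0.0, 0.0)]
    dgm = h0_persistence(pts)
    print(dgm)  # [(0.0, 1.0), (0.0, 1.0), (0.0, 9.0), (0.0, inf)]
    print(statistical_vector(dgm))
```

Because every point is born at scale 0, the finite deaths in dimension 0 are exactly the edge lengths of a minimum spanning tree, which is why a single union-find pass suffices. Loops and voids (dimensions 1 and 2), which matter for protein structure, require a full persistent homology library.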