Hybrid Spectral/Subspace Clustering of Molecular Dynamics Simulations
I. Syzonenko, Joshua L. Phillips
Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2018-08-15
DOI: https://doi.org/10.1145/3233547.3233595
Citations: 1
Abstract
Data clustering is widely used in many domains, including the analysis of molecular dynamics (MD) simulations. Modern clustering applications for MD simulation data must be able to handle both natively folded and disordered proteins. We compare spectral clustering with a more recent subspace clustering approach and with a newly proposed 'hybrid' clustering algorithm that seeks to combine the useful characteristics of both methods, using MD data from both protein classes. Results are analysed in terms of accuracy, stability, data density, and other properties. We conclude by identifying which combinations of algorithm, improvement, and data density yield results that are more accurate or more stable. We find that subspace clustering produces better results than standard spectral clustering, especially for disordered proteins, regardless of input data density or choice of affinity scaling. Additionally, our hybrid approach improves on the subspace results in most cases, and entropic affinity scaling improves the performance of both spectral clustering and our hybrid approach.
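
The abstract names entropic affinity scaling without defining it. One common formulation, assumed here and not confirmed by the abstract, gives each data point its own Gaussian bandwidth, chosen so that the entropy of that point's neighbour distribution matches a fixed target (the log of a user-chosen perplexity). The sketch below illustrates that idea only; the function name, the squared-distance input, and the tolerance settings are hypothetical and are not taken from the paper.

```python
import math


def entropic_affinities(sq_dists, perplexity, tol=1e-5, max_iter=100):
    """Row-normalised Gaussian affinities with one bandwidth per point.

    sq_dists is a square list of lists of squared distances (an assumed
    input, e.g. squared RMSD between MD frames). For each row i, a
    precision beta_i is found by bisection so that the entropy of the
    row equals log(perplexity). Requires perplexity < n - 1.
    """
    n = len(sq_dists)
    target = math.log(perplexity)
    rows = []
    for i in range(n):
        # Distances from point i to every other point (self excluded).
        d = [sq_dists[i][j] for j in range(n) if j != i]
        d_min = min(d)  # shift so the largest weight is exactly 1
        lo, hi, beta = 0.0, math.inf, 1.0
        for _ in range(max_iter):
            w = [math.exp(-beta * (x - d_min)) for x in d]
            total = sum(w)
            p = [x / total for x in w]
            entropy = -sum(q * math.log(q) for q in p if q > 0.0)
            if abs(entropy - target) < tol:
                break
            if entropy > target:
                # Distribution too flat: raise the precision.
                lo = beta
                beta = beta * 2.0 if hi == math.inf else (lo + hi) / 2.0
            else:
                # Distribution too peaked: lower the precision.
                hi = beta
                beta = (lo + hi) / 2.0
        p.insert(i, 0.0)  # zero self-affinity, restores row length n
        rows.append(p)
    return rows
```

The resulting matrix is not symmetric. A spectral method would typically symmetrise it first, for example by averaging it with its transpose, which is again an assumption about usage and not a description of the authors' pipeline.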