Hybrid Spectral/Subspace Clustering of Molecular Dynamics Simulations
Authors: I. Syzonenko, Joshua L. Phillips
Venue: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
Published: 2018-08-15
DOI: https://doi.org/10.1145/3233547.3233595
Data clustering approaches are widely used in many domains, including molecular dynamics (MD) simulation. Modern applications of clustering to MD simulation data must be capable of assessing both natively folded and disordered proteins. We compare the performance of spectral clustering with that of a more recent subspace clustering approach and of a newly proposed 'hybrid' clustering algorithm, which seeks to combine the useful characteristics of both methods, on MD data from both protein classes. Results are analysed in terms of accuracy, stability, data density, and other properties. We conclude by identifying which combinations of algorithm, improvement, and data density give results that are more accurate or more stable. We find that subspace clustering produces better results than standard spectral clustering, especially for disordered proteins, regardless of input data density or choice of affinity scaling. Additionally, our hybrid approach improves on the subspace results in most cases, and entropic affinity scaling improves the performance of both spectral clustering and our hybrid approach.
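To make the terms in the abstract concrete, the following is a minimal sketch of spectral clustering with an entropy-scaled affinity matrix. It is not the authors' implementation. It assumes that "entropic affinity scaling" means choosing a per-point Gaussian bandwidth so that each point's neighbour distribution has a fixed perplexity, and it uses synthetic points in place of MD conformations. The function names, the `perplexity` value, and the toy data are all hypothetical.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(D, 0.0)
    return np.maximum(D, 0.0)

def entropic_affinity(D, perplexity=10.0, n_iter=50):
    """Assumed form of entropic scaling: for each point, bisect on the
    Gaussian precision beta until the row entropy is log(perplexity)."""
    n = D.shape[0]
    target = np.log(perplexity)
    P = np.zeros((n, n))
    for i in range(n):
        d = np.delete(D[i], i)
        d = d - d.min()  # shift for stability; normalised p is unchanged
        lo, hi, beta = 0.0, np.inf, 1.0
        for _ in range(n_iter):
            p = np.exp(-beta * d)
            p /= p.sum()
            H = -np.sum(p * np.log(p + 1e-12))
            if H > target:  # distribution too flat: sharpen it
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else 0.5 * (lo + hi)
            else:           # distribution too peaked: flatten it
                hi = beta
                beta = 0.5 * (lo + hi)
        p = np.exp(-beta * d)
        P[i, np.arange(n) != i] = p / p.sum()
    return 0.5 * (P + P.T)  # symmetrise so it can be used as a graph

def spectral_labels(W, k, n_iter=100):
    """Normalised spectral clustering with a small deterministic k-means."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    U = vecs[:, -k:]                     # top-k eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    centers = [U[0]]                     # farthest-point initialisation
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        new = np.array([U[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Toy stand-in for conformational features: two well-separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(30, 3)),
               rng.normal(5.0, 0.5, size=(30, 3))])
W = entropic_affinity(pairwise_sq_dists(X), perplexity=10.0)
labels = spectral_labels(W, k=2)
```

With real trajectories, `X` would hold per-frame features such as dihedral angles or inter-residue distances. The subspace and hybrid methods compared in the abstract would build the affinity matrix `W` differently, and they are not sketched here.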