Physically interpretable performance metrics for clustering
Kinjal Mondal, Jeffery B Klauda
Journal of Chemical Physics, vol. 161, no. 24, published 2024-12-28
DOI: 10.1063/5.0241122
Citations: 0
Abstract
Clustering is a machine learning technique that groups large amounts of data into separate clusters based on their similarity. It is now widely used to analyze the large and diverse data sets produced by molecular dynamics (MD) simulations. Typically, the frames of an MD trajectory are clustered into groups, and a representative frame from each group is studied separately. A key question in this process is: how good are the resulting clusters? Several performance metrics in the literature, such as the silhouette index and the Davies-Bouldin index, are often used to assess clustering quality. However, most of these metrics measure the overlap or similarity of the clusters in the reduced-dimensional space used for clustering and do not consider the physically important properties or parameters of the system. To address this issue, we have developed two physically interpretable scoring metrics based on the physical parameters of the system being analyzed. We tested these metrics on four different systems: (1) the Ising model, (2) folding and unfolding of the WT HP35 peptide, (3) a protein-ligand trajectory of an enzyme and its substrate, and (4) a protein-ligand dissociation trajectory. We show that the scoring metrics yield clusters that match our physical intuition about these systems.
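The silhouette and Davies-Bouldin indices mentioned in the abstract score clusters purely by geometry in the reduced space. The sketch below, in plain Python, shows what they compute. It assumes Euclidean distances and frames that are already projected into a low-dimensional feature space. It also picks a medoid frame as each cluster's representative, which is one common choice assumed here for illustration. The function names are hypothetical, and this is not the authors' physically interpretable metric, which the abstract does not define. For real analyses, scikit-learn provides `silhouette_score` and `davies_bouldin_score` in `sklearn.metrics`.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two frames in the reduced (clustering) space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _group(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, List[Point]]:
    """Collect the frames assigned to each cluster label (at least two clusters)."""
    groups: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        raise ValueError("at least two clusters are required")
    return groups


def silhouette_score(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean of s(i) = (b - a) / max(a, b); higher (max 1) is better.

    a: mean distance from frame i to the other frames in its own cluster.
    b: smallest mean distance from frame i to the frames of another cluster.
    Frames in singleton clusters get s(i) = 0 by convention.
    """
    groups = _group(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # p itself is in `own` at distance 0, hence the len(own) - 1 divisor.
        a = sum(_dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(_dist(p, q) for q in members) / len(members)
            for other, members in groups.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


def davies_bouldin_index(points: Sequence[Point], labels: Sequence[int]) -> float:
    """DB = (1/k) * sum_i max_{j != i} (S_i + S_j) / d(c_i, c_j); lower is better.

    S_i is the mean distance of cluster i's frames to its centroid c_i.
    """
    groups = _group(points, labels)
    centroids = {
        lab: [sum(coord) / len(mem) for coord in zip(*mem)]
        for lab, mem in groups.items()
    }
    scatter = {
        lab: sum(_dist(m, centroids[lab]) for m in mem) / len(mem)
        for lab, mem in groups.items()
    }
    total = 0.0
    for i in groups:
        ratios = []
        for j in groups:
            if j == i:
                continue
            sep = _dist(centroids[i], centroids[j])
            if sep == 0:
                raise ValueError("two clusters share the same centroid")
            ratios.append((scatter[i] + scatter[j]) / sep)
        total += max(ratios)
    return total / len(groups)


def medoid_indices(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, int]:
    """Index of the frame in each cluster with the smallest summed distance to its cluster mates.

    Assumed here as the representative frame of each cluster.
    """
    reps: Dict[int, int] = {}
    for lab in set(labels):
        idx = [i for i, other in enumerate(labels) if other == lab]
        reps[lab] = min(idx, key=lambda i: sum(_dist(points[i], points[j]) for j in idx))
    return reps


if __name__ == "__main__":
    # Toy 2-D "reduced coordinates" for six frames forming two obvious groups.
    frames = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels = [0, 0, 0, 1, 1, 1]
    print(silhouette_score(frames, labels))
    print(davies_bouldin_index(frames, labels))
    print(medoid_indices(frames, labels))
```

On this well-separated toy data, the silhouette score is above 0.9 and the Davies-Bouldin index is below 0.1. As the abstract argues, neither number says whether the clusters differ in a physically meaningful property of the system.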
About the journal:
The Journal of Chemical Physics publishes quantitative and rigorous science of long-lasting value in methods and applications of chemical physics. The Journal also publishes brief Communications of significant new findings, Perspectives on the latest advances in the field, and Special Topic issues. The Journal focuses on innovative research in experimental and theoretical areas of chemical physics, including spectroscopy, dynamics, kinetics, statistical mechanics, and quantum mechanics. In addition, topical areas such as polymers, soft matter, materials, surfaces/interfaces, and systems of biological relevance are of increasing importance.
Topical coverage includes:
Theoretical Methods and Algorithms
Advanced Experimental Techniques
Atoms, Molecules, and Clusters
Liquids, Glasses, and Crystals
Surfaces, Interfaces, and Materials
Polymers and Soft Matter
Biological Molecules and Networks