DONKEY: A Flexible and Accurate Algorithm for Clustering
Jakub Kára, Kyle Acheson, Adam Kirrander
Journal of Chemical Theory and Computation. Published 2025-05-02. DOI: 10.1021/acs.jctc.4c01750 (https://doi.org/10.1021/acs.jctc.4c01750)
We propose an accurate clustering algorithm suitable for the varied and multidimensional data sets that correspond to temporal snapshots from on-the-fly nonadiabatic trajectory-based simulations of photoexcited dynamics. The algorithm approximates the underlying probability density function using variable kernel density estimation, with local maxima corresponding to cluster centers. Each data point is then assigned to one of the maxima by employing a maximization procedure. Finally, clusters artificially separated by minor fluctuations in the probability density are merged. The algorithm does not require parameter tuning, which ensures flexibility and reduces the risk of bias. It is tested on several synthetic data sets, where it consistently outperforms conventional clustering algorithms. As a final example, the algorithm is applied to the excited dynamics of the norbornadiene ⇌ quadricyclane (C7H8) molecular photoswitch, demonstrating how distinct reaction pathways can be identified.
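The stages named in the abstract (variable kernel density estimation, assignment of each point to a density maximum, and merging of clusters split by minor density fluctuations) can be illustrated with a short NumPy sketch. This is not the authors' DONKEY implementation. The k-nearest-neighbour bandwidth rule, the hill climb over neighbouring samples, the straight-line dip test, the helper names, and the parameters `k` and `ratio` are all assumptions chosen for brevity, whereas the abstract describes an algorithm that needs no parameter tuning.

```python
import numpy as np


def neighbour_bandwidths(points, k):
    """One bandwidth per sample: the distance to its k-th nearest neighbour."""
    pair = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sort(pair, axis=1)[:, k]  # column 0 is the sample itself


def kde(points, h, query):
    """Variable (sample-point) Gaussian KDE evaluated at each row of `query`."""
    dim = points.shape[1]
    sq = ((query[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    kernels = np.exp(-0.5 * sq / h**2) / (2.0 * np.pi * h**2) ** (dim / 2)
    return kernels.mean(axis=1)


def climb_to_modes(points, density, k):
    """Move each sample to the densest of its k nearest neighbours until it stops."""
    pair = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbours = np.argsort(pair, axis=1)[:, : k + 1]  # includes the sample itself
    best = density[neighbours].argmax(axis=1)
    parent = neighbours[np.arange(len(points)), best]
    while not np.array_equal(parent, parent[parent]):
        parent = parent[parent]  # pointer jumping: follow the chain to its peak
    return parent


def merge_shallow_modes(points, h, density, labels, ratio, steps=20):
    """Merge two peaks when the density between them never dips far below the lower peak."""
    modes = np.unique(labels)
    root = {m: m for m in modes}

    def find(m):
        while root[m] != m:
            m = root[m]
        return m

    t = np.linspace(0.0, 1.0, steps)[:, None]
    for i, a in enumerate(modes):
        for b in modes[i + 1:]:
            ra, rb = find(a), find(b)
            if ra == rb:
                continue
            path = (1.0 - t) * points[a] + t * points[b]  # straight line a -> b
            dip = kde(points, h, path).min()
            if dip >= ratio * min(density[a], density[b]):
                low, high = sorted((ra, rb), key=lambda m: density[m])
                root[low] = high  # keep the higher peak as representative
    return np.array([find(m) for m in labels])


def cluster(points, k=10, ratio=0.5):
    """Hypothetical end-to-end driver; `k` and `ratio` are illustrative assumptions."""
    h = neighbour_bandwidths(points, k)
    density = kde(points, h, points)
    labels = climb_to_modes(points, density, k)
    return merge_shallow_modes(points, h, density, labels, ratio)


# Synthetic example: two well-separated Gaussian blobs in two dimensions.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
    rng.normal([12.0, 0.0], 1.0, size=(100, 2)),
])
labels = cluster(data)
print("clusters found:", len(set(labels.tolist())))
```

Each label is the index of the sample that serves as the cluster's density peak, so the number of clusters is not fixed in advance. Restricting the climb to sample points keeps the sketch short; a continuous maximization of the estimated density would be a closer match to the procedure the abstract describes.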
About the journal:
The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.