Brandon L. Butler , Domagoj Fijan , Sharon C. Glotzer
Computer Physics Communications (impact factor 7.2, JCR Q1), published 2024-06-28. DOI: 10.1016/j.cpc.2024.109297. Article page: https://www.sciencedirect.com/science/article/pii/S0010465524002200
Change point detection of events in molecular simulations using dupin
Particle tracking is commonly used to study time-dependent behavior in many different types of physical and chemical systems involving constituents that span many length scales, including atoms, molecules, nanoparticles, granular particles, and even larger objects. Behaviors of interest studied using particle tracking information include disorder-order transitions, thermodynamic phase transitions, structural transitions, protein folding, crystallization, gelation, swarming, avalanches and fracture. A common challenge in studies of these systems involves change detection. Change point detection discerns when a temporal signal undergoes a change in distribution. These changes can be local or global, instantaneous or prolonged, obvious or subtle. Moreover, system-wide changes marking an interesting physical or chemical phenomenon (e.g. crystallization of a liquid) are often preceded by events (e.g. pre-nucleation clusters) that are localized and can occur anywhere at any time in the system. For these reasons, detecting events in particle trajectories generated by molecular simulation is challenging and typically accomplished via ad hoc solutions unique to the behavior and system under study. Consequently, methods for event detection lack generality, and those used in one field are not easily used by scientists in other fields. Here we present a new Python-based tool, dupin, that allows for universal event detection from particle trajectory data irrespective of the system details. dupin works by creating a signal representing the simulation and partitioning the signal based on events (changes within the trajectory). This approach allows for studies where manual annotation of event boundaries would require a prohibitive amount of time. Furthermore, dupin can serve as a tool in automated and reproducible workflows. We demonstrate the application of dupin using three examples and discuss its applicability to a wider class of problems.
Program summary
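Program Title: dupin
CPC Library link to program files: https://doi.org/10.17632/kjcn97zc46.1
Developer's repository link: https://github.com/glotzerlab/dupin
Licensing provisions: BSD 3-clause
Programming language: Python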
Nature of problem: In the field of molecular simulations, detecting structural transitions or events within trajectories can be both challenging and time-consuming for larger studies due to the requirement of a manual approach. This issue is particularly pronounced in studies involving hundreds or thousands of simulations, where manual detection and analysis of transitions become infeasible. Our goal is to develop an automated, accurate and efficient method for detecting transition points in simulation trajectories, which both saves time and aids researchers in uncovering important events and their underlying causes in various systems. Additionally, we aim to facilitate new machine learning applications to important materials problems such as predicting and designing crystallization pathways, predicting defect formation, and describing the behavior of active matter, all of which involve structural transitions occurring over time. The developed method should be applicable to offline and online detection, enabling event-dependent triggers for advanced simulation/experimental protocols and efficient processing and storing of data.
Solution method: We develop a versatile Python package called dupin for detecting molecular events and structural transitions in simulation trajectories. dupin's workflow pipeline includes three major stages: data preprocessing, data augmentation, and detection. The components of this pipeline collectively improve the accuracy and efficiency of identifying structural changes in particle trajectory data. In data preprocessing, we generate and aggregate data into a comprehensive representation of the system. Data augmentation techniques such as feature selection and dimensionality reduction counteract the noise arising from high-dimensional data and enhance computational performance. We detect change points within the trajectory indicating transition events using a cost-based event detection method. In dupin, we implement two cost functions based on piecewise linear fits, which offer different levels of sensitivity to sudden shifts and changes in the signal. The package can use any cost-based detection algorithm but has a special interface for the Python package ruptures. Regardless of the detection algorithm, we use the cost function and “elbow” detection to determine the correct number of change points. The detection scheme can be applied both offline and online, enabling real-time analysis of molecular events as simulations progress. As an example, dupin may be used to trigger high-frequency storage of frames within a simulation upon nucleation and subsequent solidification of a liquid into a crystal. Our method demonstrates a high degree of accuracy in detecting transition points within simulation trajectories when provided with informative descriptors. By automating the detection process, our solution enables efficient change point detection for studies with large-scale simulations.
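The cost-based detection and elbow selection described above can be illustrated with a minimal, standard-library sketch. It is not dupin's implementation and does not use the dupin or ruptures APIs: the function names (linear_fit_cost, optimal_partitions, elbow_index), the squared-residual cost, the exhaustive dynamic-programming search, and the chord-based elbow rule are all assumptions chosen for illustration, applied to a synthetic one-dimensional signal.

```python
import random


def linear_fit_cost(signal, start, end):
    """Sum of squared residuals of a least-squares line fitted to signal[start:end]."""
    ys = signal[start:end]
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return sum((y - y_mean - slope * (x - x_mean)) ** 2 for x, y in enumerate(ys))


def optimal_partitions(signal, max_change_points, min_size=3):
    """Return [(total_cost, change_points)] for 0..max_change_points change points."""
    n = len(signal)
    # Cost of every admissible segment, computed once.
    cost = {
        (a, b): linear_fit_cost(signal, a, b)
        for a in range(n - min_size + 1)
        for b in range(a + min_size, n + 1)
    }
    # layers[k][e] = (cheapest cost of signal[:e] with k change points, last change point)
    layers = [{e: (cost[0, e], None) for e in range(min_size, n + 1)}]
    for k in range(1, max_change_points + 1):
        prev = layers[-1]
        layers.append({
            e: min((prev[s][0] + cost[s, e], s)
                   for s in range(k * min_size, e - min_size + 1))
            for e in range((k + 1) * min_size, n + 1)
        })
    results = []
    for k in range(max_change_points + 1):
        change_points, end = [], n
        # Walk back through the layers to recover the k change points.
        for layer in reversed(layers[1:k + 1]):
            end = layer[end][1]
            change_points.append(end)
        results.append((layers[k][n][0], change_points[::-1]))
    return results


def elbow_index(costs):
    """Index of the cost lying farthest below the chord from costs[0] to costs[-1]."""
    last = len(costs) - 1

    def gap(i):
        chord = costs[0] + (costs[-1] - costs[0]) * i / last
        return chord - costs[i]

    return max(range(len(costs)), key=gap)


# Synthetic descriptor: flat, then a linear rise, then flat again (kinks at frames 40 and 80).
random.seed(0)
clean = [0.0] * 40 + [0.1 * i for i in range(40)] + [4.0] * 40
signal = [y + random.gauss(0.0, 0.02) for y in clean]

results = optimal_partitions(signal, max_change_points=5)
costs = [total for total, _ in results]
n_change_points = elbow_index(costs)
change_points = results[n_change_points][1]
print(n_change_points, change_points)
```

The total cost drops sharply until the number of segments matches the structure of the signal and then flattens, which is what the elbow rule looks for. The chord rule used here is only one simple heuristic and can misjudge curves whose first drop dominates; a real signal with many descriptors would sum the per-descriptor costs and would normally rely on a faster search than this exhaustive one.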
Additional comments including restrictions and unusual features: Our package, dupin, has great promise in detecting transition points within simulation trajectories with a high degree of accuracy; nonetheless, it is essential to note that it relies heavily on the selection of informative descriptors. The accuracy of the detection may be compromised if the chosen descriptors do not effectively capture the changes in a system's properties. However, this restriction can be mitigated by selecting a diverse range of descriptors and applying a feature selection tool to refine the signal.
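As a rough illustration of how a feature selection step might refine the signal, the sketch below keeps only descriptors whose values at the start and end of a trajectory differ by more than a few standard deviations. The function name mean_shift_filter, the window and threshold parameters, and the two descriptor names are hypothetical and chosen for this example; this is one simple criterion under those assumptions, not a description of the filters shipped with dupin.

```python
import random
import statistics


def mean_shift_filter(features, window=20, threshold=3.0):
    """Return names of features whose first and last windows differ clearly.

    features maps a descriptor name to its list of per-frame values.
    """
    kept = []
    for name, values in features.items():
        head, tail = values[:window], values[-window:]
        # Use the larger of the two window spreads as the noise scale.
        spread = max(statistics.stdev(head), statistics.stdev(tail), 1e-12)
        shift = abs(statistics.fmean(tail) - statistics.fmean(head))
        if shift > threshold * spread:
            kept.append(name)
    return kept


random.seed(1)
n_frames = 200
features = {
    # Hypothetical descriptor that jumps halfway through the trajectory.
    "order_parameter": [
        (0.1 if i < 100 else 0.8) + random.gauss(0.0, 0.02) for i in range(n_frames)
    ],
    # Hypothetical descriptor that only fluctuates around a constant value.
    "noise_only": [random.gauss(0.5, 0.02) for _ in range(n_frames)],
}
selected = mean_shift_filter(features)
print(selected)
```

A criterion of this kind discards descriptors that carry only noise, but it would also discard a descriptor that changes transiently and then returns to its initial value, so it is best combined with a diverse set of descriptors as the comment above suggests.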
About the journal:
The focus of CPC is on contemporary computational methods and techniques and their implementation, the effectiveness of which will normally be evidenced by the author(s) within the context of a substantive problem in physics. Within this setting CPC publishes two types of paper.
Computer Programs in Physics (CPiP)
These papers describe significant computer programs to be archived in the CPC Program Library which is held in the Mendeley Data repository. The submitted software must be covered by an approved open source licence. Papers and associated computer programs that address a problem of contemporary interest in physics that cannot be solved by current software are particularly encouraged.
Computational Physics Papers (CP)
These are research papers on themes across computational physics and related disciplines, including but not limited to:
mathematical and numerical methods and algorithms;
computational models including those associated with the design, control and analysis of experiments; and
algebraic computation.
Each will normally include software implementation and performance details. The software implementation should, ideally, be available via GitHub, Zenodo or an institutional repository. In addition, research papers on the impact of advanced computer architecture and special purpose computers on computing in the physical sciences and software topics related to, and of importance in, the physical sciences may be considered.