Md Hasan Anowar , Abdullah Shamail , Xiaoyu Wang , Goce Trajcevski , Sohail Murad , Cynthia J. Jameson , Ashfaq Khokhar
{"title":"Compressing generalized trajectories of molecular motion for efficient detection of chemical interactions","authors":"Md Hasan Anowar , Abdullah Shamail , Xiaoyu Wang , Goce Trajcevski , Sohail Murad , Cynthia J. Jameson , Ashfaq Khokhar","doi":"10.1016/j.is.2024.102426","DOIUrl":null,"url":null,"abstract":"<div><p>Molecular Dynamics (MD) simulation is often used to study properties of various chemical interactions in domains such as drug discovery and development, particularly when executing real experimental studies is costly and/or unsafe. Studying the motion of trajectories of molecules/atoms generated from MD simulations provides a detailed atomic level spatial location of every atom for every time frame in the experiment. The analysis of this data leads to an atomic and molecular level understanding of interactions among the constituents of the system of interest. However, the data is extremely large and poses storage and processing challenges in the querying and analysis of associated atom level motion trajectories. We take a first step towards applying domain-specific <em>generalization</em> techniques for the data representation, subsequently used for applying trajectory compression algorithms towards reducing the storage requirements and speeding up the processing of within-distance queries over MD simulation data. We demonstrate that this <em>generalization-aware</em> compression, when applied to the dataset used in this case study, yields significant improvements in terms of data reduction and processing time without sacrificing the effectiveness of <em>within-distance</em> queries for threshold-based detection of molecular events of interest, such as the formation of <em>Hydrogen Bonds</em> (H-Bonds).</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102426"},"PeriodicalIF":3.0000,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S030643792400084X","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Molecular Dynamics (MD) simulation is often used to study properties of various chemical interactions in domains such as drug discovery and development, particularly when executing real experimental studies is costly and/or unsafe. Studying the motion of trajectories of molecules/atoms generated from MD simulations provides a detailed atomic level spatial location of every atom for every time frame in the experiment. The analysis of this data leads to an atomic and molecular level understanding of interactions among the constituents of the system of interest. However, the data is extremely large and poses storage and processing challenges in the querying and analysis of associated atom level motion trajectories. We take a first step towards applying domain-specific generalization techniques for the data representation, subsequently used for applying trajectory compression algorithms towards reducing the storage requirements and speeding up the processing of within-distance queries over MD simulation data. We demonstrate that this generalization-aware compression, when applied to the dataset used in this case study, yields significant improvements in terms of data reduction and processing time without sacrificing the effectiveness of within-distance queries for threshold-based detection of molecular events of interest, such as the formation of Hydrogen Bonds (H-Bonds).
期刊介绍:
Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems.
Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems such as new data models, performance enhancements, and show how those innovations contribute to the goals of the application.