Uncertainty-Aware Visualization of Biomolecular Structures

IF 2.9 | Zone 4, Computer Science | JCR Q2: COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Graphics Forum, Vol. 44, No. 3 | Pub Date: 2025-05-27 | DOI: 10.1111/cgf.70155

A. Sterzik, C. Gillmann, M. Krone, K. Lawonn


Citations: 0

Abstract

Molecular structure visualization is fundamental to molecular biology, aiding in understanding complex biological processes. While advancements in molecular visualization have greatly improved the representation of these structures, inherent uncertainties, such as inaccuracies in atomic positions or variability in secondary structure classifications, impact the accuracy of the visualizations. Uncertainty-aware visualization (UAV) emerged as a response to these challenges, integrating uncertainty into visual representations to improve data interpretation and decision-making. Despite extensive work on both molecular and uncertainty visualization (UV), there is a lack of comprehensive surveys addressing the intersection of these two fields. This paper provides a state-of-the-art review of UAV approaches for biomolecular structures. We propose a classification schema that organizes existing methods based on the type of molecule visualized, the manifestation of uncertainty, and the mapping of uncertainty to a visual representation. Using this framework, we identify research gaps and areas for future exploration in uncertainty-aware biomolecular structure visualization.

The visualization of molecular structures has long been an integral part of structural molecular biology and continues to develop as a critical tool in scientific and industrial research. Since the early days of the field, molecular visualization has enabled researchers to gain an intuitive understanding of complex molecular systems that would otherwise be inaccessible through raw numerical data alone. Structural representations provide insight into the form and function of molecules, which are inherently linked. A well-known example is the lock-and-key metaphor for molecular docking, which, while simplified, illustrates the importance of molecular shape in determining biological interactions. By transforming abstract data into interpretable visual models, visualization allows scientists to identify relationships, mechanisms, and patterns essential for advancing molecular biology, drug design, and bioinformatics.

Levinthal [Lev66] was the first to use known positions of atoms, obtained by early X-ray diffraction studies [Ken61, Per64], to interactively visualize 3D structural images of small proteins. Over the years, molecular visualization has advanced significantly. The field has seen the development of diverse visualization and visual analysis techniques, driven by growing computational capabilities and the increasing complexity of biological data. This progress is reflected in comprehensive surveys such as the one by Kozlíková et al. [KKF*17], which provides a detailed overview of the state of the art in molecular structure visualization.

One of the inherent challenges of visualization is that uncertainty can be introduced at any stage of the visualization pipeline, from data acquisition to the final visualization and its interpretation by the viewer [BAOL12]. There are many potential sources of uncertainty. For instance, the obtained positions of atoms can be inaccurate due to uncertainty in the data acquisition process. Another important example is the variability in molecular conformations, as represented by ensembles. Ensembles obtained from techniques such as nuclear magnetic resonance (NMR) spectroscopy illustrate a range of possible structures rather than a single, static conformation. Similarly, derived properties like the root mean square fluctuation (RMSF), which depends on atom positions during a dynamics simulation, can also be susceptible to uncertainties. Other forms of uncertainty come from computational methods such as the assignment of secondary structure elements: different classifiers that assign structural elements to parts of the amino acid sequence yield different results. Understanding these uncertainties is essential because they directly affect the reliability and interpretation of molecular visualizations. Returning to the earlier example of molecular docking, uncertainties in atomic positions can severely affect the docking process. In such cases, it is not only the most probable structure that matters but also other plausible conformations, as they may influence the outcome and provide a more comprehensive understanding of molecular interactions.
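
To make the RMSF mentioned above concrete, the following is a minimal sketch that is not taken from the survey. It assumes the common definition of RMSF as the root of the mean squared deviation of each atom from its own mean position, and it assumes that all frames have already been superimposed onto a common reference. The names `rmsf` and `frames` and the two-atom toy trajectory are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rmsf(frames: Sequence[Sequence[Vec3]]) -> List[float]:
    """Per-atom RMSF; frames[t][i] is the position of atom i in frame t.

    Assumption: the frames are already aligned, so rigid-body motion
    does not contribute to the fluctuation.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_frames = len(frames)
    n_atoms = len(frames[0])
    if any(len(frame) != n_atoms for frame in frames):
        raise ValueError("all frames must contain the same number of atoms")

    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared distance of atom i from its mean position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy trajectory (hypothetical data): atom 0 is rigid, atom 1 moves along x.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(frames))
```

A per-atom value like this is the kind of scalar that could then be mapped to a visual channel such as color or size. Any error in the input positions or in the alignment carries over into the RMSF, which is why the derived property is itself uncertain.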

To address these issues and communicate associated uncertainties, the field of UAV has emerged. It focuses on integrating data and its uncertainty into a unified computation and visualization pipeline, ultimately helping users make more informed decisions. This is particularly important because visualization tends to suggest a level of accuracy rarely present in the data [JS03]. However, incorporating uncertainty information into a visualization presents its own challenges. It tends to increase the visual complexity, potentially hindering the perception of the core data [BHJ*14]. In the context of biomolecular structures, UV has been an active area of research for over 30 years, with some of the earliest examples shown in Figure 2. Despite this, existing surveys focus on either specific aspects of molecular visualization or UV in general, with no survey addressing the unique intersection of these two topics.

Thus, this report provides an overview of existing UAV approaches for biomolecular structures. While this is particularly relevant to the study of biomolecules, it also holds mutual benefits for the broader visualization community. Molecular structure data is versatile, complex, and derived from a variety of sources. The techniques developed for visualizing uncertainty can enhance molecular visualization and provide valuable insights for other fields that handle complex and uncertain data. Additionally, innovations in these fields could inspire new methods and improvements in the visualization of uncertainty for biomolecular structures.

To structure and provide an overview of works on UAV of biomolecular structures, we introduce a classification schema (Figure 1) that categorizes approaches by molecule type, uncertainty manifestation, and uncertainty mapping. This classification allows for a better understanding of the diverse techniques and helps identify trends and gaps in current methodologies. For example, there is a significant lack of UAV techniques for large-scale data, despite the rapid growth of structural datasets. By identifying such gaps, we highlight promising opportunities for advancing the field.

Our survey's main contributions can be summarized as follows:

This report is organized as follows: First, we discuss related surveys (Section 2). Next, we introduce the necessary background on molecular structure visualization (Section 3) and UAV (Section 4). Then, we discuss the scope of this paper in more detail and present our proposed taxonomy for uncertainty-aware visualization approaches for biomolecular structures (Section 5). Sections 6, 7, and 8 then describe the approaches for small molecules, proteins, and nucleic acids, respectively. Afterward, we discuss the existing approaches (Section 9) and future research directions (Section 10) before we conclude this report in Section 11.

While surveys on UV and biomolecular visualization individually are manifold, a comprehensive survey of the intersection of both areas is still missing in the literature. Surveys on biological or biomolecular visualization frequently acknowledge uncertainty but typically address it in a limited scope, often focusing solely on individual aspects. Kozlíková et al. [KKF*17] provide a broad overview of state-of-the-art techniques in biomolecular structure visualization. Their report includes a section on molecular dynamics visualization, covering methods for illustrating biomolecular flexibility and volumetric representations.

Krone et al. [KKL*16] review techniques for visualizing biomolecular cavities, also discussing several sources of uncertainty inherent to this field. They highlight the challenges of visualizing dynamic cavities, which fluctuate over time as molecular structures are inherently dynamic. Additionally, the authors emphasize the need for visualization tools that can compare and evaluate different cavity detection methods, noting significant discrepancies—up to 200%—in volume measurements across tools. Despite these challenges, only a few tools can currently calculate uncertainties in their measurements. The increasing size and complexity of data drive the need for multiscale visualization approaches. The 2019 survey by Miao et al. [MKK*19] explores these multiscale molecular visualization methods and emphasizes the importance of overview visualizations. For this purpose, highly abstracted and/or aggregated UV techniques are featured in their review. The recent review of biomolecular visualization by Li and Wei [LW24] includes a brief section on positional UV, outlining common encodings and some newer techniques.

Since molecular dynamics (MD) simulation computes the dynamic properties of molecules, this topic is closely related to molecular flexibility and positional uncertainty. Belghit et al. [BSD*24] reviewed the visualization of MD trajectories, while Corey et al. [CBC23] specifically reviewed the visualization of MD simulations for membrane systems.

In 1997, Pang et al. [PWL97] developed a classification scheme for early UV approaches. It incorporates properties of the data and the visualization themselves, such as their dimensionality and discrete or continuous nature. Additionally, they categorize approaches according to how they add uncertainty information to the visualizations. They identified the following methods: glyphs, adding/modifying geometry, modifying attributes, animation, sonification, and psycho-visual. Brodlie et al. [BAOL12] discussed reasons for UV being a difficult problem. One major reason is the complexity of uncertainty itself, with even the terminology of uncertainty often being unclear. For biomolecular UV, for example, uncertainty is often used interchangeably with error, precision, or flexibility. Another difficulty is that uncertainty adds another dimension to a visualization. For one- or two-dimensional data, this makes the visualization more complex but is relatively easily solvable. Most biomolecular structure visualizations, however, are inherently three-dimensional. Thus, adding another dimension (the uncertainty) becomes more challenging. Additionally, they specifically emphasize the distinction between visualization of uncertainty and uncertainty of visualization: Research usually considers the visualization of uncertainty in a dataset. However, the visualization itself, through filtering, mapping, and rendering, adds another layer of uncertainty. Bonneau et al. [BHJ*14] formally describe uncertainty and discuss UV in several fields like medical visualization or weather and climate.

Ensemble visualization is closely related to the visualization of uncertainty. In fact, a common metaphor for indicating uncertainty in biomolecular structures is ensemble visualization. The ensemble visualization survey by Wang et al. [WHLS19] does not explicitly address biomolecular structures; however, several techniques for visualizing ensembles of surfaces or volumes are either currently applied to biomolecular structures or are adaptable for this purpose.

Padilla et al. [PKH21] recently outlined best practices for UV, discussing cognitive theories that explain how these methods affect viewers' judgments. They primarily examined lower-dimensional (1D, 2D) UV, including error bars, icon arrays, and 2D ensemble plots. In their 2006 survey on UV, Griethe and Schumann [GS06] highlighted the scarcity of usability studies as a key gap in the field. Over a decade later, Hullman et al. [HQC*19] conducted an in-depth survey on evaluating UV, identifying several unresolved issues. Notably, they observed a prevalent research bias toward assessing performance accuracy over decision quality. They proposed guidelines for adopting transparent and valid evaluation methods.

Weiskopf [Wei22] discusses UV and illustrates general concepts with examples of biological data visualization. However, his paper is not a systematic survey, and because it focuses on biological data in general, only a few molecular structure visualizations are featured. Therefore, a comprehensive overview of uncertainty in biomolecular structure visualization is still missing in the literature.

In this section, we provide a brief overview of molecular structure visualization. We begin with a summary of biomolecules (Section 3.1), followed by the description of various methods for acquiring biomolecular structure data (Section 3.2). Finally, we discuss molecular representation models (Section 3.3). We provide only brief overviews of these topics; for more in-depth information on biomolecules and biomolecular data acquisition, please see Nelson and Cox [NC21], Berg et al. [BTS10], and Alberts et al. [AHJ*22]. Further information on representation models and visualization can be found in the report by Kozlíková et al. [KKF*17].

Uncertainty visualization (UV) focuses on explicitly representing errors or uncertainty in the data. Uncertainty-aware visualization (UAV) builds on this by integrating the visualization of uncertainty directly with the data itself, ensuring both are effectively represented [GSWS21]. Here, we focus on the latter approach, where the data and its uncertainty are visualized together. However, such visualizations are often still referred to as UV. In this section, we outline potential sources of uncertainty (Section 4.1), discuss how these map to biomolecular data (Section 4.2), introduce common mapping strategies for visualizing uncertainties (Section 4.3), and discuss specifics of uncertainty visualization for biomolecular data and commonalities with other domains (Section 4.4).

Next, we specify our scope and selection process (Section 5.1) and detail our classification schema (Section 5.2).

While the methods discussed in this section are illustrated with small molecules, they are not inherently restricted to them. Certain techniques, particularly those that rely on ball-and-stick representations, may not be optimal for larger molecules, though theoretically, they could still be used. On the other hand, methods employing volume visualization should be more readily applicable to various types of biomolecules.

Explicit. In 1999, Rheingans and Joshi published the first paper explicitly about molecular UV [RJ99]. They proposed three options for visualizing molecules with positional uncertainty (Figure 2). Two methods are based on a likelihood volume computed from the ensemble. The likelihood volume stores the probability of an atom being at each voxel. They gave two options for visualizing this volume. The first volume visualization extracts transparent isosurfaces from the volume and renders them on top of the opaque ball-and-stick visualization (Figure 2b). The other volume visualization method renders the volume directly, resulting in a cloud-like appearance (Figure 2c). The third method superposes each conformer in the ensemble using its ball-and-stick representation. Each conformation can be rendered either opaquely or semi-transparently (Figure 2a); in the latter case, more certain regions appear opaque while uncertain regions appear more transparent.
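To make the notion of a likelihood volume more concrete, the following is a minimal sketch, not the implementation of Rheingans and Joshi. It assumes the ensemble is already aligned and simply bins atom centres into a regular grid; the function name `likelihood_volume` and the parameters `grid_min`, `voxel_size`, and `shape` are hypothetical. A practical implementation would probably also account for atom radii, for example by splatting a kernel around each centre.

```python
import numpy as np

def likelihood_volume(ensemble, grid_min, voxel_size, shape):
    """Estimate a per-voxel occupancy probability from an aligned ensemble.

    ensemble: array of shape (n_conformers, n_atoms, 3).
    Returns an array of the given shape; each entry is the fraction of
    conformers that place at least one atom centre inside that voxel.
    (Assumption: atoms are treated as points, radii are ignored.)
    """
    ensemble = np.asarray(ensemble, dtype=float)
    grid_min = np.asarray(grid_min, dtype=float)
    volume = np.zeros(shape, dtype=float)
    for conformer in ensemble:
        # Voxel index of every atom centre in this conformer.
        idx = np.floor((conformer - grid_min) / voxel_size).astype(int)
        # Discard atoms that fall outside the grid.
        inside = np.all((idx >= 0) & (idx < np.array(shape)), axis=1)
        occupied = np.zeros(shape, dtype=bool)
        occupied[tuple(idx[inside].T)] = True
        volume += occupied
    return volume / len(ensemble)
```

The resulting array could then be handed to an isosurface extractor or a direct volume renderer, corresponding to the two volume-based options described above.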

Schmidt-Ehrenberg et al. [SBH02] used direct volume rendering (DVR) or opaque isosurfaces to display likelihood volumes. They focused on metastable conformations, which are states in which the molecule stays for a long time. Therefore, they visualized several metastable conformations of a molecule at once, while also displaying the likelihood volume for each of the metastable conformations separately.

Implicit. Best and Hege [BH02] let users cluster groups of atoms into ellipsoids in their application to reduce clutter and thus ease the detection of conformational changes. Additionally, they blended several time frames of a molecular simulation to facilitate detecting large changes. Similar to the implicit mapping proposed by Rheingans and Joshi [RJ99], Sabando et al. [SUS*21] visualized chemical compound similarity by layering semi-transparent visualizations of individual conformations. They added an option to invert the opacity. As a result, more uncertain parts of the molecule can be highlighted by appearing more opaque.
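Why layering semi-transparent conformations makes consistent regions look opaque can be illustrated with standard alpha compositing. The sketch below assumes every layer has the same opacity and is blended with the usual "over" operator; the function name `accumulated_opacity` is hypothetical, and the sketch does not model the opacity inversion offered by Sabando et al., whose details are not given here.

```python
def accumulated_opacity(layer_alpha, n_overlapping):
    """Opacity after compositing n identical semi-transparent layers.

    With the "over" operator, each layer lets a fraction (1 - alpha) of the
    background through, so n layers transmit (1 - alpha) ** n.
    """
    return 1.0 - (1.0 - layer_alpha) ** n_overlapping
```

Under these assumptions, a region covered by many conformers approaches full opacity, while a region covered by a single conformer keeps the low per-layer opacity.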

Ranges. Knoll et al. [KCL*13] defined the interval between the chemical bond radii and the vdW radii as an uncertainty interval over a charge density distribution. They rendered the molecules in black using the ball-and-stick representation. Then, they overlaid a visualization of an electron density volume. They used DVR with a custom transfer function. In most of their examples, they used two peaks for displaying the isolevel of chemical bonds in blue and the isolevel corresponding to the vdW radii in red. In their paper, they focused on molecular interfaces rather than biomolecules. Due to using the ball-and-stick representation, their visualization is suited to small molecules rather than larger biomolecules.
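A two-peak transfer function of the kind described above might be sketched as follows. This is an illustration under assumptions rather than the transfer function of Knoll et al.: the Gaussian peak shape, the function name `two_peak_transfer_function`, and the parameters `iso_bond`, `iso_vdw`, `width`, and `peak_alpha` are all hypothetical.

```python
import numpy as np

def two_peak_transfer_function(density, iso_bond, iso_vdw, width, peak_alpha=0.8):
    """Map scalar density samples to RGBA using two narrow opacity peaks.

    A blue peak is centred on iso_bond and a red peak on iso_vdw.
    (Assumption: Gaussian peaks of equal width; values are illustrative.)
    """
    density = np.asarray(density, dtype=float)
    bond = np.exp(-0.5 * ((density - iso_bond) / width) ** 2)
    vdw = np.exp(-0.5 * ((density - iso_vdw) / width) ** 2)
    rgba = np.zeros(density.shape + (4,))
    rgba[..., 0] = vdw                                  # red: vdW isolevel
    rgba[..., 2] = bond                                 # blue: bond isolevel
    rgba[..., 3] = peak_alpha * np.maximum(bond, vdw)   # opaque only near peaks
    return rgba
```

Samples whose density lies between the two isolevels stay nearly transparent, so the space between the blue and red shells reads as the interval.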

Size. Wagner and Himmel [WH17] used color and size to encode the RMSD between two molecular structures in a modified ball-and-stick representation. The sphere color indicates the atom pair's absolute RMSD. The sphere size is proportional to the relative contribution of the atom pair to the total RMSD.
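The two quantities behind such a color and size encoding can be sketched as follows. This is one plausible reading rather than the exact definition used by Wagner and Himmel: it assumes both structures are already superposed with matching atom order, interprets the per-pair "absolute RMSD" as the Euclidean distance between the paired atoms, and takes the relative contribution as each pair's share of the summed squared deviations. The function name `per_atom_rmsd_encoding` is hypothetical.

```python
import numpy as np

def per_atom_rmsd_encoding(coords_a, coords_b):
    """Return (deviation, contribution, total_rmsd) for two aligned structures.

    deviation:    per-pair distance, a candidate for the color channel.
    contribution: per-pair share of the summed squared deviation (sums to 1),
                  a candidate for the sphere size.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    sq_dev = np.sum((a - b) ** 2, axis=1)
    deviation = np.sqrt(sq_dev)
    total_rmsd = np.sqrt(sq_dev.mean())
    total = sq_dev.sum()
    # Identical structures have no deviation to distribute.
    contribution = sq_dev / total if total > 0 else np.zeros_like(sq_dev)
    return deviation, contribution, total_rmsd
```

Separating the absolute value from the relative share means a viewer can tell both how far an atom moved and how much that movement matters for the overall RMSD.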

The methods developed for small molecules generally provide overview visualizations of uncertainty. They indicate the presence of uncertainty in the molecule and give a rough sense of its magnitude. However, apart from the combined color and size coding by Wagner and Himmel [WH17], making more detailed judgments is likely challenging. These methods also do not facilitate compare or integrate tasks.

Proteins are the most extensively studied biomolecules for UV, with visualizations available for all types of uncertainty manifestations and mappings according to our schema. Similar to small molecules, many methods for UV apply to other biomolecule types. Nonetheless, certain techniques are specific to proteins, such as the visualization of secondary structures or cavities.

Most papers about nucleic acids focus on visualizing RNA. Only two papers specifically visualized DNA.

In the following, we will discuss patterns and trends of UV techniques for biomolecular structures. Figure 12 shows a parallel sets plot [KBH06] of the distribution and correlation of the collected literature's attributes along our schema's main dimensions. In the visualization, the dimensions (molecule type, manifestation, and mapping) are represented with consistent total contributions across layers. However, papers often vary in the number of applicable categories per dimension. We distribute each paper's contribution equally among the applicable categories within a given dimension. For example, if a paper employs three mapping strategies, each mapping receives a proportional width of 1/3 of the paper's total contribution. Conversely, a paper with only one mapping allocates its full contribution to that mapping. This strategy ensures that each paper contributes equally to the total width per dimension.
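The equal-split weighting described above can be sketched in a few lines. The data layout (one dictionary per paper, mapping a dimension name to its list of applicable categories) and the function name `category_weights` are assumptions made for illustration.

```python
from collections import defaultdict

def category_weights(papers, dimension):
    """Sum each paper's contribution per category within one dimension.

    A paper with k applicable categories contributes 1/k to each of them,
    so every paper adds exactly 1 to the dimension's total width.
    """
    weights = defaultdict(float)
    for paper in papers:
        categories = paper[dimension]
        for category in categories:
            weights[category] += 1.0 / len(categories)
    return dict(weights)
```

For two papers, one using three mappings and one using a single mapping, the shared mapping receives 1 + 1/3 and the total over all mappings is 2, matching the number of papers.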

In the previous section, we explored trends and outlined various challenges in the field. Here, we narrow our focus and highlight a few areas we consider the most important for future research.

Complex and Multiscale Systems. Advances in imaging technologies enable the visualization of increasingly complex systems. While some approaches exist for visualizing systems of interacting molecules, such as protein-ligand interactions, there is significant room for improvement. UAV for even larger systems, such as mesoscale structures or visualizations spanning multiple scales, remains largely unexplored. Methods that work well for small to medium-sized molecular structures often become cluttered or impractical when applied to more complex systems. This highlights the need for approaches that can effectively manage the scale and complexity of these data. A potential solution is adopting more abstract and illustrative visualization techniques. These approaches could offer clearer, more interpretable visualizations while addressing the unique challenges of complex molecular systems.

Conformational Space. While there is already a substantial amount of literature visualizing the conformational space, the problem remains challenging, particularly for larger molecules, as existing techniques often become cluttered and difficult to interpret. Initial efforts to enhance these visualizations were undertaken by Melvin et al. [MS16], who employed focus and context, as well as a deliberately selected subset of conformations, to reduce visual complexity. Nevertheless, easily comprehensible visualizations of the entire conformational space remain scarce. Similar to the visualization of complex systems, the volume and complexity of the data can be overwhelming. However, the multidimensional nature of conformational spaces introduces additional challenges. As with complex systems, abstraction and illustrative techniques offer promising solutions. Simplified representations that emphasize key states or transitions could enhance clarity, while interactive tools for exploring specific regions of the conformational landscape could improve usability. Alternatively, animations similar to hypothetical outcome plots (HOPs) [Hul16] could dynamically represent conformational changes, reducing the need to convey all information within a single, static visualization.

Comparison Techniques. Comparing molecular configurations and their uncertainties typically involves side-by-side visualizations or simple overlays. These methods are effective for small datasets but become impractical as the number of molecules increases. Overlays face challenges similar to visualizing conformational space, such as becoming cluttered and difficult to interpret. This issue is especially problematic when comparing dynamic molecules. Ensembles of molecular dynamics (MD) simulations have become crucial for addressing the lack of reproducibility in individual simulations [WSC21]. However, MD ensembles generate vast amounts of data, requiring new visualization techniques capable of handling this complexity without overwhelming users. This challenge was also highlighted by Belghit et al. [BSD*24]. This scalability problem is not limited to biomolecular systems; it also affects broader ensemble visualization methods [WHLS19]. Advancements in both fields could offer mutual benefits, leading to better solutions for visualizing large, complex datasets.

Nucleic Acids. Despite being the focus of numerous current research topics [WEFC21, BJZ*22, WD23], nucleic acids are underrepresented in terms of specialized UV techniques. Current visualizations primarily focus on base pair probabilities and rely on relatively simple methods. While these approaches provide some insight into nucleic acid structure, they are limited in scope and fail to capture the full complexity of these molecules. Comprehensive 3D visualizations, which could provide a more complete representation of nucleic acid structures, are largely absent. Additionally, visualizing the conformational space of nucleic acids remains a significant gap despite its importance for understanding molecular dynamics, interactions, and function.

Evaluation. UV techniques for biomolecular structures are not being evaluated rigorously enough. For example, it is crucial to determine which methods are most intuitive for viewers, how effectively they support the inference of quantitative values, and whether they accurately represent the conformational space of biomolecules. Since humans often reason in non-intuitive ways when interpreting uncertainty [PKH21], it is essential to rigorously evaluate uncertainty visualizations to ensure they effectively support accurate understanding and decision-making. Evaluating UV methods is inherently challenging [Hul16], even for simple 1D and 2D visualizations. Incorporating and evaluating uncertainty in the context of 3D biomolecular structure visualizations is even more complex. However, addressing this gap is essential. It will enable practitioners to choose the most suitable methods for specific problems, identify shortcomings in existing techniques, and drive the development of new methods that better meet the demands of UAV.

In this report, we presented UAV approaches for biomolecular structures, classifying the methods along three primary dimensions: molecule type, manifestation of uncertainty, and mapping strategy. While these methods demonstrate strategies for representing uncertainty for individual molecules, they often fall short of addressing the demands of modern biomolecular research. The constantly increasing size and complexity of biomolecular datasets require visualization techniques that scale effectively and remain interpretable. Current approaches frequently appear cluttered even for small to medium-sized molecules, making them unsuitable for larger systems such as mesoscale data or ensembles of dynamic structures. These gaps highlight the need for innovative visualization strategies that can handle the complexity and scale of modern biomolecular data. Advancing UAV methods is essential for improving how researchers interpret and interact with complex biomolecular data, enabling clearer insights and more informed decision-making. Given the parallels to other domains, such as general uncertainty and ensemble visualization, advancements in UAV for biomolecular structures are likely to have a broader impact, benefiting the entire field of UAV.

We want to thank Frank Cordes, Ute Hellmich, Philipp Schnee, Philipp Thiel, Markus Weber, and Christoph Wiedemann for the insightful discussions on UV for biomolecular structures. We would also like to thank Katarína Furmanová, Barbora Kozlíková, and Robin Skånberg for sharing images with us for this report. This work was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) — Project-ID 437702916. Open Access funding enabled and organized by Projekt DEAL.

Source journal: Computer Graphics Forum (Engineering and Technology; Computer Science: Software Engineering)

CiteScore: 5.80
Self-citation rate: 12.00%
Articles published: 175
Review time: 3-6 weeks

About the journal: Computer Graphics Forum is the official journal of Eurographics, published in cooperation with Wiley-Blackwell, and is a unique, international source of information for computer graphics professionals interested in graphics developments worldwide. It is now one of the leading journals for researchers, developers and users of computer graphics in both commercial and academic environments. The journal reports on the latest developments in the field throughout the world and covers all aspects of the theory, practice and application of computer graphics.