{"title":"Specific Nucleic Acid Detection Using a Nanoparticle Hybridization Assay","authors":"A. A. Aldakheel, C. B. Raub, H. T. Bui","doi":"arxiv-2409.03983","DOIUrl":"https://doi.org/arxiv-2409.03983","url":null,"abstract":"Simple methods to detect biomolecules including specific nucleic acid sequences have received renewed attention since the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) virus pandemic. Notably, biomolecule detection that uses some form of signal amplification will have some form of amplification-related error, which in the polymerase chain reaction involves mispriming and subsequent signal amplification in the no template control, ultimately providing a limit of detection. To demonstrate the feasibility of the detection of a DNA target sequence without molecular or chemical signal amplification that avoids amplification errors, a gold nanoparticle aggregation assay was developed and tested. Two primers bracketing a 94 base pair target sequence from SARS-CoV-2 were conjugated to 10 nm diameter gold nanoparticles by the salt aging method, with conjugation and primer-target hybridization confirmed by agarose gel electrophoresis and nanospectrophotometry. Upon mixing of both conjugated nanoparticles with target, a surface plasmon resonance shift of 6 nm was observed, along with lower electrophoretic mobility of a band containing both DNA fluorescence and gold absorption signals. This did not occur in the presence of a control DNA molecule of the same size and composition as the target but with a randomly scrambled base position. Nanoparticle tracking at 30 frames per second using a sensitive darkfield microscope revealed a lower measured diffusion coefficient of scattering objects in the target mixture than in the control mixture or with bare gold nanoparticles.","PeriodicalId":501022,"journal":{"name":"arXiv - QuanBio - Biomolecules","volume":"283 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142216117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multi-scale analysis of the CzrA transcription repressor highlights the allosteric changes induced by metal ion binding","authors":"Marta Rigoli, Raffaello Potestio, Roberto Menichetti","doi":"arxiv-2409.03584","DOIUrl":"https://doi.org/arxiv-2409.03584","url":null,"abstract":"Allosteric regulation is a widespread strategy employed by several proteins to transduce chemical signals and perform biological functions. Metal sensor proteins are exemplary in this respect, e.g., in that they selectively bind and unbind DNA depending on the state of a distal ion coordination site. In this work, we carry out an investigation of the structural and mechanical properties of the CzrA transcription repressor through the analysis of microsecond-long molecular dynamics (MD) trajectories; the latter are processed through the mapping entropy optimisation workflow (MEOW), a recently developed information-theoretical method that highlights, in an unsupervised manner, residues of particular mechanical, functional, and biological importance. This approach allows us to unveil how differences in the properties of the molecule are controlled by the state of the zinc coordination site, with particular attention to the DNA binding region. These changes correlate with a redistribution of the conformational variability of the residues throughout the molecule, in spite of an overall consistency of its architecture in the two (ion-bound and free) coordination states. The results of this work corroborate previous studies, provide novel insight into the fine details of the mechanics of CzrA, and showcase the MEOW approach as a novel instrument for the study of allosteric regulation and other processes in proteins through the analysis of plain MD simulations.","PeriodicalId":501022,"journal":{"name":"arXiv - QuanBio - Biomolecules","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142216118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiview Random Vector Functional Link Network for Predicting DNA-Binding Proteins","authors":"A. Quadir, M. Sajid, M. Tanveer","doi":"arxiv-2409.02588","DOIUrl":"https://doi.org/arxiv-2409.02588","url":null,"abstract":"The identification of DNA-binding proteins (DBPs) is a critical task due to their significant impact on various biological activities. Understanding the mechanisms underlying protein-DNA interactions is essential for elucidating various life activities. In recent years, machine learning-based models have been prominently utilized for DBP prediction. In this paper, to predict DBPs, we propose a novel framework termed a multiview random vector functional link (MvRVFL) network, which fuses neural network architecture with multiview learning. The proposed MvRVFL model combines the benefits of late and early fusion, allowing for distinct regularization parameters across different views while leveraging a closed-form solution to determine unknown parameters efficiently. The primal objective function incorporates a coupling term aimed at minimizing a composite of errors stemming from all views. From each of the three protein views of the DBP datasets, we extract five features. These features are then fused together by incorporating a hidden feature during the model training process. The performance of the proposed MvRVFL model on the DBP dataset surpasses that of baseline models, demonstrating its superior effectiveness. Furthermore, we extend our assessment to the UCI, KEEL, AwA, and Corel5k datasets, to establish the practicality of the proposed models. The consistency error bound, the generalization error bound, and empirical findings, coupled with rigorous statistical analyses, confirm the superior generalization capabilities of the MvRVFL model compared to the baseline models.","PeriodicalId":501022,"journal":{"name":"arXiv - QuanBio - Biomolecules","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142216158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational Methods to Investigate Intrinsically Disordered Proteins and their Complexes","authors":"Zi Hao Liu, Maria Tsanai, Oufan Zhang, Julie Forman-Kay, Teresa Head-Gordon","doi":"arxiv-2409.02240","DOIUrl":"https://doi.org/arxiv-2409.02240","url":null,"abstract":"In 1999 Wright and Dyson highlighted the fact that large sections of the proteome of all organisms are comprised of protein sequences that lack globular folded structures under physiological conditions. Since then the biophysics community has made significant strides in unraveling the intricate structural and dynamic characteristics of intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs). Unlike crystallographic beamlines and their role in streamlining acquisition of structures for folded proteins, an integrated experimental and computational approach aimed at IDPs/IDRs has emerged. In this Perspective we aim to provide a robust overview of current computational tools for IDPs and IDRs, and most recently their complexes and phase separated states, including statistical models, physics-based approaches, and machine learning methods that permit structural ensemble generation and validation against many solution experimental data types.","PeriodicalId":501022,"journal":{"name":"arXiv - QuanBio - Biomolecules","volume":"59 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142216159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beyond Efficiency: Molecular Data Pruning for Enhanced Generalization","authors":"Dingshuo Chen, Zhixun Li, Yuyan Ni, Guibin Zhang, Ding Wang, Qiang Liu, Shu Wu, Jeffrey Xu Yu, Liang Wang","doi":"arxiv-2409.01081","DOIUrl":"https://doi.org/arxiv-2409.01081","url":null,"abstract":"With the emergence of various molecular tasks and massive datasets, how to perform efficient training has become an urgent yet under-explored issue in the area. Data pruning (DP), as an oft-stated approach to saving training burdens, filters out less influential samples to form a coreset for training. However, the increasing reliance on pretrained models for molecular tasks renders traditional in-domain DP methods incompatible. Therefore, we propose a Molecular data Pruning framework for enhanced Generalization (MolPeg), which focuses on the source-free data pruning scenario, where data pruning is applied with pretrained models. By maintaining two models with different updating paces during training, we introduce a novel scoring function to measure the informativeness of samples based on the loss discrepancy. As a plug-and-play framework, MolPeg realizes the perception of both source and target domain and consistently outperforms existing DP methods across four downstream tasks. Remarkably, it can surpass the performance obtained from full-dataset training, even when pruning up to 60-70% of the data on the HIV and PCBA datasets. Our work suggests that the discovery of effective data-pruning metrics could provide a viable path to both enhanced efficiency and superior generalization in transfer learning.","PeriodicalId":501022,"journal":{"name":"arXiv - QuanBio - Biomolecules","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142226945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Deep Generative Model For Computational Protein Design And Optimization","authors":"Boqiao Lai","doi":"arxiv-2408.17241","DOIUrl":"https://doi.org/arxiv-2408.17241","url":null,"abstract":"Proteins are the fundamental macromolecules that play diverse and crucial roles in all living matter and have tremendous implications in healthcare, manufacturing, and biotechnology. Their functions are largely determined by the sequences of amino acids that compose them and their unique three-dimensional structures when folded. The recent surge in highly accurate computational protein structure prediction tools has equipped scientists with the means to derive preliminary structural insights without the onerous costs of experimental structure determination. These breakthroughs hold profound promise for building robust and efficient in silico protein design systems. While the prospect of designing de novo proteins with precise computational accuracy remains a grand challenge in biochemical engineering, conventional assembly-based and rational design methods often grapple with the expansive design space, resulting in suboptimal design success rates. Although recently emerged deep learning-based models have shown promise in improving the efficiency of the computational protein design process, a significant gap persists between current design paradigms and their experimental realization. This thesis will investigate the potential of deep generative models in refining protein structure and sequence design methods, aiming to develop frameworks capable of crafting novel protein sequences with predetermined structures or specific functionalities. By harnessing extensive protein databases and cutting-edge neural architectures, this research aims to enhance precision and robustness in current protein design paradigms, potentially paving the way for advancements across various scientific fields.","PeriodicalId":501022,"journal":{"name":"arXiv - QuanBio - Biomolecules","volume":"70 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142216161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative Prediction of Protein-Polyelectrolyte Binding Thermodynamics: Adsorption of Heparin-Analog Polysulfates to the SARS-CoV-2 Spike Protein RBD","authors":"Lenard Neander, Cedric Hannemann, Roland R. Netz, Anil Kumar Sahoo","doi":"arxiv-2409.00210","DOIUrl":"https://doi.org/arxiv-2409.00210","url":null,"abstract":"Interactions of polyelectrolytes (PEs) with proteins play a crucial role in numerous biological processes, such as the internalization of virus particles into host cells. Although docking, machine learning methods, and molecular dynamics (MD) simulations are utilized to estimate binding poses and binding free energies of small-molecule drugs to proteins, quantitative prediction of the binding thermodynamics of PE-based drugs presents a significant obstacle in computer-aided drug design. This is due to the sluggish dynamics of PEs caused by their size and strong charge-charge correlations. In this paper, we introduce advanced sampling methods based on a force-spectroscopy setup and theoretical modeling to overcome this barrier. We exemplify our method with explicit solvent all-atom MD simulations of interactions of anionic PEs that show antiviral properties, namely heparin and linear polyglycerol sulfate (LPGS), with the SARS-CoV-2 spike protein receptor binding domain (RBD). Our prediction for the binding free energy of LPGS to the wild-type RBD matches experimentally measured dissociation constants within thermal energy, kT, and correctly reproduces the experimental PE-length dependence. We find that LPGS binds to the Delta-variant RBD with an additional free-energy gain of 2.4 kT, compared to the wild-type RBD, in accord with electrostatic arguments. We show that the LPGS-RBD binding is solvent-dominated and enthalpy-driven, though with a large entropy-enthalpy compensation. Our method is applicable to general polymer adsorption phenomena and predicts precise binding free energies and re-configurational friction as needed for drug and drug-delivery design.","PeriodicalId":501022,"journal":{"name":"arXiv - QuanBio - Biomolecules","volume":"53 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142216160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Technical Report of HelixFold3 for Biomolecular Structure Prediction","authors":"Lihang Liu, Shanzhuo Zhang, Yang Xue, Xianbin Ye, Kunrui Zhu, Yuxin Li, Yang Liu, Xiaonan Zhang, Xiaomin Fang","doi":"arxiv-2408.16975","DOIUrl":"https://doi.org/arxiv-2408.16975","url":null,"abstract":"The AlphaFold series has transformed protein structure prediction with remarkable accuracy, often matching experimental methods. AlphaFold2, AlphaFold-Multimer, and the latest AlphaFold3 represent significant strides in predicting single protein chains, protein complexes, and biomolecular structures. While AlphaFold2 and AlphaFold-Multimer are open-sourced, facilitating rapid and reliable predictions, AlphaFold3 remains partially accessible through a limited online server and has not been open-sourced, restricting further development. To address these challenges, the PaddleHelix team is developing HelixFold3, aiming to replicate AlphaFold3's capabilities. Using insights from previous models and extensive datasets, HelixFold3 achieves an accuracy comparable to AlphaFold3 in predicting the structures of conventional ligands, nucleic acids, and proteins. The initial release of HelixFold3 is available as open source on GitHub for academic research, promising to advance biomolecular research and accelerate discoveries. We also provide an online service on the PaddleHelix website at https://paddlehelix.baidu.com/app/all/helixfold3/forecast.","PeriodicalId":501022,"journal":{"name":"arXiv - QuanBio - Biomolecules","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142216172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-Scale Multi-omic Biosequence Transformers for Modeling Peptide-Nucleotide Interactions","authors":"Sully F. Chen, Robert J. Steele, Beakal Lemeneh, Shivanand P. Lad, Eric Oermann","doi":"arxiv-2408.16245","DOIUrl":"https://doi.org/arxiv-2408.16245","url":null,"abstract":"The transformer architecture has revolutionized bioinformatics and driven progress in the understanding and prediction of the properties of biomolecules. Almost all research on large-scale biosequence transformers has focused on one domain at a time (single-omic), usually nucleotides or peptides. These models have seen incredible success in downstream tasks in each domain and have achieved particularly noteworthy breakthroughs in sequences of peptides and structural modeling. However, these single-omic models are naturally incapable of modeling multi-omic tasks, one of the most biologically critical being nucleotide-peptide interactions. We present our work training the first multi-omic nucleotide-peptide foundation models. We show that these multi-omic models (MOMs) can learn joint representations between various single-omic distributions that are emergently consistent with the Central Dogma of molecular biology, despite only being trained on unlabeled biosequences. We further demonstrate that MOMs can be fine-tuned to achieve state-of-the-art results on peptide-nucleotide interaction tasks, namely predicting the change in Gibbs free energy (ΔG) of the binding interaction between a given oligonucleotide and peptide, as well as the effect on this binding interaction due to mutations in the oligonucleotide sequence (ΔΔG). Remarkably, we show that multi-omic biosequence transformers emergently learn useful structural information without any prior structural training, allowing us to predict which peptide residues are most involved in the peptide-nucleotide binding interaction. Lastly, we provide evidence that multi-omic biosequence models are non-inferior to foundation models trained on single-omics distributions, suggesting a more generalized or foundational approach to building these models.","PeriodicalId":501022,"journal":{"name":"arXiv - QuanBio - Biomolecules","volume":"318 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142226946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"S-MolSearch: 3D Semi-supervised Contrastive Learning for Bioactive Molecule Search","authors":"Gengmo Zhou, Zhen Wang, Feng Yu, Guolin Ke, Zhewei Wei, Zhifeng Gao","doi":"arxiv-2409.07462","DOIUrl":"https://doi.org/arxiv-2409.07462","url":null,"abstract":"Virtual Screening is an essential technique in the early phases of drug discovery, aimed at identifying promising drug candidates from vast molecular libraries. Recently, ligand-based virtual screening has garnered significant attention due to its efficacy in conducting extensive database screenings without relying on specific protein-binding site information. Obtaining binding affinity data for complexes is highly expensive, resulting in a limited amount of available data that covers a relatively small chemical space. Moreover, these datasets contain a significant amount of inconsistent noise. It is challenging to identify an inductive bias that consistently maintains the integrity of molecular activity during data augmentation. To tackle these challenges, we propose S-MolSearch, to our knowledge the first framework that leverages molecular 3D information and affinity information in semi-supervised contrastive learning for ligand-based virtual screening. Drawing on the principles of inverse optimal transport, S-MolSearch efficiently processes both labeled and unlabeled data, training molecular structural encoders while generating soft labels for the unlabeled data. This design allows S-MolSearch to adaptively utilize unlabeled data within the learning process. Empirically, S-MolSearch demonstrates superior performance on widely-used benchmarks LIT-PCBA and DUD-E. It surpasses both structure-based and ligand-based virtual screening methods for enrichment factors across 0.5%, 1% and 5%.","PeriodicalId":501022,"journal":{"name":"arXiv - QuanBio - Biomolecules","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142216165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}