Bioinformatics (Oxford, England): latest articles

TissueViewer: a web-based multiplexed image viewer.
Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf246
David Gerardus Pieter van IJzendoorn, Magdalena Matusiak, Rob West, Matt van de Rijn
Motivation: Datasets generated by spatial biology techniques such as multiplex immunofluorescence staining or spatial transcriptomics profiling of histologic sections carry a tremendous wealth of information. Several commercial platforms exist that can simultaneously acquire 1-1000 distinct marker signals (e.g. MIBI, CODEX, Orion, Nanostring CosMX SMI, Vizgen). However, due to the large size of these datasets, their viewing and sharing are slow, laborious, and require extensive computational resources.
Results: To overcome these challenges, we developed TissueViewer, an easy-to-set-up and easy-to-use web-based viewer designed to deliver high-resolution images over the internet with low bandwidth requirements and at high speed.
Availability and implementation: TissueViewer is available on GitHub (https://github.com/davidvi/tissueviewer) and can be used on the TissueViewer.org platform, where readers can upload their own data with a limit of 50 GB to share with colleagues.
Citations: 0
Nonparametric IPSS: fast, flexible feature selection with false discovery control.
Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf299
Omar Melikechi, David B Dunson, Jeffrey W Miller
Motivation: Feature selection is a critical task in machine learning and statistics. However, existing feature selection methods either (i) rely on parametric methods such as linear or generalized linear models, (ii) lack theoretical false discovery control, or (iii) identify few true positives.
Results: We introduce a general feature selection method with finite-sample false discovery control based on applying integrated path stability selection (IPSS) to arbitrary feature importance scores. The method is nonparametric whenever the importance scores are nonparametric, and it estimates q-values, which are better suited to high-dimensional data than P-values. We focus on two special cases using importance scores from gradient boosting (IPSSGB) and random forests (IPSSRF). Extensive nonlinear simulations with RNA sequencing data show that both methods accurately control the false discovery rate and detect more true positives than existing methods. Both methods are also efficient, running in under 20 s when there are 500 samples and 5000 features. We apply IPSSGB and IPSSRF to detect microRNAs and genes related to cancer, finding that they yield better predictions with fewer features than existing approaches.
Availability and implementation: All code and data used in this work are available on GitHub (https://github.com/omelikechi/ipss_bioinformatics) and permanently archived on Zenodo (https://doi.org/10.5281/zenodo.15335289). A Python package for implementing IPSS is available on GitHub (https://github.com/omelikechi/ipss) and PyPI (https://pypi.org/project/ipss/). An R implementation of IPSS is also available on GitHub (https://github.com/omelikechi/ipssR).
Citations: 0
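As a rough illustration of selecting features from arbitrary importance scores by resampling, the sketch below runs plain stability selection on toy data. It is not IPSS itself: IPSS integrates over a regularization path and provides false discovery control, neither of which is reproduced here. The function names, the absolute-correlation importance score, the subsample count, and the 0.9 threshold are all assumptions made for illustration and are not taken from the ipss package.

```python
import random
from statistics import mean

def abs_correlation(xs, ys):
    """Illustrative importance score: |Pearson correlation| (0 if undefined)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def selection_frequencies(X, y, n_subsamples=50, top_k=1, seed=0):
    """Fraction of random half-samples in which each feature ranks in the top_k."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # half-sample without replacement
        ys = [y[i] for i in idx]
        scores = [abs_correlation([X[i][j] for i in idx], ys) for j in range(p)]
        ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    return [c / n_subsamples for c in counts]

# Toy data (hypothetical): feature 0 drives the response, features 1 and 2 are noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]

freqs = selection_frequencies(X, y)
selected = [j for j, f in enumerate(freqs) if f >= 0.9]  # threshold is an assumption
```

Any importance score (for example from gradient boosting or random forests, as in the abstract) could replace `abs_correlation` in this sketch; the correlation score is used only to keep the example dependency-free.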
Foldclass and Merizo-search: scalable structural similarity search for single- and multi-domain proteins using geometric learning.
Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf277
Shaun M Kandathil, Andy M Lau, Daniel W A Buchan, David T Jones
Motivation: The availability of very large numbers of protein structures from accurate computational methods poses new challenges in storing, searching and detecting relationships between these structures. In particular, the new-found abundance of multi-domain structures in the AlphaFold structure database introduces challenges for traditional structure comparison methods.
Results: We address these challenges using a fast, embedding-based structure comparison method called Foldclass which detects structural similarity between protein domains. We demonstrate the accuracy of Foldclass embeddings for homology detection. In combination with a recently developed deep learning-based automatic domain segmentation tool Merizo, we develop Merizo-search, which first segments multi-domain query structures into domains, and then searches a Foldclass embedding database to determine the top matches for each constituent domain. Combining the ability of Merizo to accurately segment complete chains into domains, and Foldclass to embed and detect similar domains, the Merizo-search tool can be used to rapidly detect per-domain similarities for complete chains, taking as little as 2 min to search all 365 million domains from the Encyclopedia of Domains. We anticipate that these tools will enable many analyses using the wealth of predicted structural data now available.
Availability and implementation: Foldclass and Merizo-search are available at https://github.com/psipred/merizo_search. The version used in this publication is archived at https://doi.org/10.5281/zenodo.15120830. Merizo-search is also available on the PSIPRED web server at http://bioinf.cs.ucl.ac.uk/psipred.
Citations: 0
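The per-domain lookup described in the abstract above (segment a query into domains, then find the top matches for each domain in an embedding database) can be sketched as a nearest-neighbour search. This is a minimal sketch under stated assumptions: the three-dimensional vectors, the identifiers, the use of cosine similarity, and the function names are hypothetical and do not reflect the actual Foldclass embeddings or the Merizo-search API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_matches(query_domains, database, k=1):
    """For each query domain embedding, return the k most similar database entries."""
    results = {}
    for name, emb in query_domains.items():
        scored = sorted(
            ((cosine(emb, db_emb), db_id) for db_id, db_emb in database.items()),
            reverse=True,
        )
        results[name] = [(db_id, round(score, 3)) for score, db_id in scored[:k]]
    return results

# Hypothetical toy database of domain embeddings.
database = {
    "dom_A": [1.0, 0.0, 0.0],
    "dom_B": [0.0, 1.0, 0.0],
    "dom_C": [0.0, 0.0, 1.0],
}
# Hypothetical embeddings for the two domains of a segmented query chain.
query_domains = {
    "query_d1": [0.9, 0.1, 0.0],
    "query_d2": [0.0, 0.2, 0.8],
}

hits = top_matches(query_domains, database, k=1)
```

A linear scan like this is only practical for small databases; searching hundreds of millions of domains, as the abstract reports, would need a far more efficient index than this sketch provides.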
Colora: a Snakemake workflow for complete chromosome-scale de novo genome assembly.
Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf175
Lia Obinu, Timothy Booth, Heleen De Weerd, Urmi Trivedi, Andrea Porceddu
Motivation: De novo assembly creates reference genomes that underpin many modern biodiversity and conservation studies. Large numbers of new genomes are being assembled by labs around the world. To avoid duplication of efforts and variable data quality, we desire a best-practice assembly process, implemented as an automated portable workflow.
Results: Here, we present Colora, a Snakemake workflow that produces chromosome-scale de novo primary or phased genome assemblies complete with organelles using Pacific Biosciences HiFi, Hi-C, and optionally Oxford Nanopore Technologies reads as input. Colora is a user-friendly, versatile, and reproducible pipeline that is ready to use by researchers looking for an automated way to obtain high-quality de novo genome assemblies.
Availability and implementation: The source code of Colora is available on GitHub (https://github.com/LiaOb21/colora) and has been deposited in Zenodo under DOI https://doi.org/10.5281/zenodo.13321576. Colora is also available at the Snakemake Workflow Catalog (https://snakemake.github.io/snakemake-workflow-catalog/?usage=LiaOb21%2Fcolora).
Volume 41, Issue 5. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12065627/pdf/
Citations: 0
In silico identification of archaeal DNA-binding proteins.
Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf169
Linus Donvil, Joëlle A J Housmans, Eveline Peeters, Wim Vranken, Gabriele Orlando
Motivation: The rapid advancement of next-generation sequencing technologies has generated an immense volume of genetic data. However, these data are unevenly distributed, with well-studied organisms being disproportionately represented, while other organisms, such as archaea, remain significantly underexplored. The study of archaea is particularly challenging due to the extreme environments they inhabit and the difficulties associated with culturing them in the laboratory. Despite these challenges, archaea likely represent a crucial evolutionary link between eukaryotic and prokaryotic organisms, and their investigation could shed light on the early stages of life on Earth. Yet, a significant portion of archaeal proteins are annotated with limited or inaccurate information. Among the various classes of archaeal proteins, DNA-binding proteins are of particular importance. While they represent a large portion of every known proteome, their identification in archaea is complicated by the substantial evolutionary divergence between archaea and other, better-studied organisms.
Results: To address the challenges of identifying DNA-binding proteins in archaea, we developed Xenusia, a neural network-based tool capable of screening entire archaeal proteomes to identify DNA-binding proteins. Xenusia has proven effective across diverse datasets, including metagenomics data, successfully identifying novel DNA-binding proteins, with experimental validation of its predictions.
Availability and implementation: Xenusia is available as a PyPI package, with source code accessible at https://github.com/grogdrinker/xenusia, and as a Google Colab web server application at xenusia.ipynb.
Volume 41, Issue 5. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12065626/pdf/
Citations: 0
MS1FA: Shiny app for the annotation of redundant features in untargeted metabolomics datasets.
Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf161
Ruibing Shi, Frank Klawonn, Mark Brönstrup, Raimo Franke
Motivation: Untargeted metabolomics, the comprehensive analysis of small molecules in biological systems, has become an invaluable tool for understanding physiology and metabolism. However, the annotation of metabolomic data is often confounded by the presence of redundant features, which can arise from, e.g., multimerization, in-source fragments (ISFs), and adducts.
Results: MS1FA uniquely integrates all major annotation approaches for redundant features within a single interactive platform. It combines correlation-based grouping with reliable ISF annotation using MS2 data and operates with MS1 data only, MS2 data only, or both. Additionally, it offers a distinctive method for grouping features based on relational criteria. As the only web-based platform with these capabilities, MS1FA provides easy access and allows users to explore and annotate the feature table interactively, with options to download the results.
Availability and implementation: MS1FA is freely accessible at https://ms1fa.helmholtz-hzi.de. The source code and data are available at https://github.com/RuibingS/MS1FA_RShiny_dashboard and are archived with the DOI 10.5281/zenodo.15118962.
Volume 41, Issue 5. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12069231/pdf/
Citations: 0
argNorm: normalization of antibiotic resistance gene annotations to the Antibiotic Resistance Ontology (ARO).
Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf173
Svetlana Ugarcina Perovic, Vedanth Ramji, Hui Chong, Yiqian Duan, Finlay Maguire, Luis Pedro Coelho
Summary: Currently available and frequently used tools for annotating antibiotic resistance genes (ARGs) in genomes and metagenomes provide results using inconsistent nomenclature. This makes the comparison of different ARG annotation outputs challenging. The comparability of ARG annotation outputs can be improved by mapping gene names and their categories to a common controlled vocabulary such as the Antibiotic Resistance Ontology (ARO). We developed argNorm, a command line tool and Python library, to normalize all detected genes across six ARG annotation tools (eight databases) to the ARO. argNorm also adds information to the outputs using the same ARG categorization so that they are comparable across tools.
Availability and implementation: argNorm is available as an open-source tool at: https://github.com/BigDataBiology/argNorm. It can also be downloaded as a PyPI package and is available on Bioconda and as an nf-core module.
Volume 41, Issue 5. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12064170/pdf/
Citations: 0
Mapping the attractor landscape of Boolean networks with biobalm.
Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf280
Van-Giang Trinh, Kyu Hyong Park, Samuel Pastva, Jordan C Rozum
Motivation: Boolean networks are popular dynamical models of cellular processes in systems biology. Their attractors model phenotypes that arise from the interplay of key regulatory subcircuits. A succession diagram (SD) describes this interplay in a discrete analog of Waddington's epigenetic attractor landscape that allows for fast identification of attractors and attractor control strategies. Efficient computational tools for studying SDs are essential for the understanding of Boolean attractor landscapes and connecting them to their biological functions.
Results: We present a new approach to SD construction for asynchronously updated Boolean networks, implemented in the biologist's Boolean attractor landscape mapper, biobalm. We compare biobalm to similar tools and find a substantial performance increase in SD construction, attractor identification, and attractor control. We perform the most comprehensive comparative analysis to date of the SD structure in experimentally-validated Boolean models of cell processes and random ensembles. We find that random models (including critical Kauffman networks) have relatively small SDs, indicating simple decision structures. In contrast, nonrandom models from the literature are enriched in extremely large SDs, indicating an abundance of decision points and suggesting the presence of complex Waddington landscapes in nature.
Availability and implementation: The tool biobalm is available online at https://github.com/jcrozum/biobalm. Further data, scripts for testing, analysis, and figure generation are available online at https://github.com/jcrozum/biobalm-analysis and in the reproducibility artefact at https://doi.org/10.5281/zenodo.13854760.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12102066/pdf/
Citations: 0
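To make the notion of an attractor in an asynchronously updated Boolean network concrete, the sketch below finds attractors by brute-force enumeration of the state space. It does not use succession diagrams and is not how biobalm works; exhaustive enumeration is only feasible for very small networks. The function names and the two toy networks are assumptions chosen for illustration.

```python
from itertools import product

def successors(state, rules):
    """Asynchronous update: each successor differs from state in exactly one variable."""
    out = []
    for i, rule in enumerate(rules):
        new_value = int(rule(state))
        if new_value != state[i]:
            out.append(state[:i] + (new_value,) + state[i + 1:])
    return out

def reachable(start, rules):
    """All states reachable from start (including start itself)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in successors(stack.pop(), rules):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def attractors(rules):
    """Return attractors as frozensets of states (terminal strongly connected sets)."""
    n = len(rules)
    reach = {s: reachable(s, rules) for s in product((0, 1), repeat=n)}
    found = set()
    for s, rs in reach.items():
        # s lies in an attractor iff every state reachable from s can reach s back.
        if all(s in reach[t] for t in rs):
            found.add(frozenset(rs))
    return found

# Toy network 1 (hypothetical): mutual inhibition, x0' = NOT x1, x1' = NOT x0.
toggle = [lambda s: not s[1], lambda s: not s[0]]
toggle_attractors = attractors(toggle)

# Toy network 2 (hypothetical): a single self-inhibiting variable, x0' = NOT x0.
oscillator = [lambda s: not s[0]]
oscillator_attractors = attractors(oscillator)
```

The mutual-inhibition network settles into one of two fixed points, (1, 0) or (0, 1), while the self-inhibiting variable never settles and forms a single two-state cyclic attractor.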
Missense variants pathogenicity annotation from homologous proteins.
Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf305
Gabriel Ruiz-Alías, Sergi Soldevila, Xavier Altafaj, Arnau Cordomí, Mireia Olivella
Motivation: High-throughput DNA sequencing has revealed millions of single nucleotide variants (SNVs) in the human genome, with a small fraction linked to disease. The effect of missense variants, which alter the protein sequence, is particularly challenging to interpret due to the scarcity of clinical annotations and experimental information. While using conservation and structural information, current prediction tools still struggle to predict variant pathogenicity. In this study, we explored the pathogenicity of homologous missense variants (variants in equivalent positions across homologous proteins), focusing on proteins involved in autosomal dominant diseases.
Results: Our analysis of 2976 pathogenic and 17 555 non-pathogenic homologous variants demonstrated that pathogenicity can be extrapolated with 95% accuracy within a family, or up to 98% for closer homologs. Remarkably, the evaluation of 27 commonly used mutation predictor methods revealed that they were not fully capturing this biological feature. To facilitate the exploration of homologous variants, we created HomolVar, a web server that computationally predicts the pathogenesis of missense variants using annotations from homologous variants, freely available at https://rarevariants.org/HomolVar. Overall, these findings and the accompanying tool offer a robust method for predicting the pathogenicity of unannotated variants, enhancing genotype-phenotype correlations, and contributing to diagnosing rare genetic disorders.
Availability and implementation: HomolVar is freely available at https://rarevariants.org/HomolVar.
Citations: 0
SimSon: simple contrastive learning of SMILES for molecular property prediction.
Bioinformatics (Oxford, England) Pub Date : 2025-05-06 DOI: 10.1093/bioinformatics/btaf275
Chae Eun Lee, Jin Sob Kim, Jin Hong Min, Sung Won Han
Motivation: Molecular property prediction with deep learning has accelerated drug discovery and retrosynthesis. However, the shortage of labeled molecular data and the challenge of generalizing across the vast chemical spaces pose significant hurdles for leveraging deep learning in molecular property prediction. This study proposes a self-supervised framework designed to acquire a Simplified Molecular Input Line Entry System (SMILES) representation, which we have dubbed Simple SMILES contrastive learning (SimSon). SimSon was pre-trained using unlabeled SMILES data through contrastive learning to grasp the SMILES representations.
Results: Our findings demonstrate that contrastive learning with randomized SMILES enriches the ability of the model to generalize and its robustness as it captures the global semantic context at the molecular level. In downstream tasks, SimSon performs competitively when compared to graph-based methods and even outperforms them on certain benchmark datasets. These results indicate that SimSon effectively captures structural information from SMILES, exhibiting remarkable generalization and robustness. The potential applications of SimSon extend to bioinformatics and cheminformatics, encompassing areas such as drug discovery and drug-drug interaction prediction.
Availability and implementation: The source code is available at https://github.com/lee00206/SimSon.
Citations: 0