Bioinformatics advances: latest articles

Identification of universal grass genes and estimates of their monocot-/commelinid-/grass-specificity.
IF 2.4
Bioinformatics advances Pub Date : 2025-04-07 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf079
Rowan A C Mitchell
Abstract
Motivation: Where experiments identify sets of grass genes of unknown function, e.g. underlying a QTL or co-expressed in a transcriptome, it is useful to know which of these genes are common to all grasses (universal) and whether they likely have monocot-/commelinid-/grass-specific function.
Results: A pipeline used data on 16 grass full genomes from Ensembl Plants to generate 13 312 highly conserved, universal groups of grass protein-coding genes. Validation steps showed that 98.8% of these groups also had gene matches in recently sequenced genomes from two major grass clades not used in the pipeline. Comparison with many non-grass genomes identified 4609 of these groups as likely of monocot-/commelinid-/grass-specific function. Both grouping of genes and specificity were defined using hidden Markov model (HMM) profiles of the groups. The HMM-based approach performed better than simple percentage identity in discriminating between test sets of known specific and non-specific genes. The results give novel insight into the nature of monocot-/commelinid-/grass-specific genes. Researchers can use the universal_grass_peps database to gain evidence for their experimentally identified grass genes being involved in monocot-/commelinid-/grass-specific traits.
Availability and implementation: The universal_grass_peps database is available for download at https://data.rothamsted.ac.uk/dataset/universal_grass_peps.
Bioinformatics advances 5(1), vbaf079. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12098945/pdf/
Citations: 0
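The abstract above uses "simple percentage identity" as the baseline against which the HMM-profile approach is compared. As a rough illustration of what such a baseline measures, the sketch below computes percentage identity over two pre-aligned protein sequences. It is not taken from the paper's pipeline: the function name `percent_identity` and the convention of skipping gapped columns are assumptions made here for illustration, and other definitions of identity exist.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap.

    Hypothetical helper for illustration; the sequences must already be aligned
    to the same length (for example by a pairwise aligner).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        # Columns containing a gap in either sequence are excluded (an assumption).
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    # Avoid division by zero when every column contains a gap.
    return 100.0 * matches / compared if compared else 0.0
```

A single identity figure treats every alignment column equally, whereas a profile HMM weights positions by how conserved they are across the group, which is one plausible reason a profile-based score could separate specific from non-specific genes more cleanly.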
rPIMS: a ShinyR package for the precision identification and modelling of livestock breeds using genomic data and machine learning approaches.
IF 2.4
Bioinformatics advances Pub Date : 2025-04-07 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf077
Yuhetian Zhao, Xuexue Liu, Benmeng Liang, Lin Jiang
Abstract
Summary: Accurate breed identification is a crucial cornerstone for the conservation and utilization of livestock and poultry genetic resources. The identification of breeds based on a variety of information sources and analytical methods has been extensively applied in the domain of animal genetics and breeding. Recently, the integration of large-scale genomic data with machine learning has become increasingly prevalent for breed identification tasks. However, such projects typically require extensive sequencing data and expertise in bioinformatics. To address this, we introduce rPIMS, a comprehensive tool designed to simplify breed identification and genetic analysis. With intuitive modules for data input, dimensionality reduction, phylogenetic tree construction, population structure analysis, and machine learning-based classification, rPIMS streamlines the analytical process for researchers. It promotes collaboration, facilitates efficient data sharing, and enhances the ability to identify and report genetic diversity and evolutionary relationships among livestock breeds. We performed a validation analysis to confirm that rPIMS achieved 100% classification accuracy in distinguishing 10 breeds using only 860 SNPs. In summary, rPIMS significantly simplifies complex model-building processes, making breed classification and genetic structure visualization accessible and intuitive to users.
Availability and implementation: rPIMS is a Shiny R application designed for breed identification in livestock using genomic data and machine learning, accessible through an intuitive graphical user interface. It is freely available under the GNU Public License on GitHub: https://github.com/Werewolfzy/rPIMS.
Bioinformatics advances 5(1), vbaf077. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12052404/pdf/
Citations: 0
Mars: simplifying bioinformatics workflows through a containerized approach to tool integration and management.
IF 2.4
Bioinformatics advances Pub Date : 2025-04-04 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf074
Fathima Nuzla Ismail, Shanika Amarasoma
Abstract
Summary: Bioinformatics is a rapidly evolving field with numerous specialized tools developed for essential genomic analysis tasks, such as read simulation, mapping, and variant calling. However, managing these tools presents significant challenges due to varied dependencies, execution steps, and output formats, complicating the installation and configuration processes. To address these issues, we introduce "Mars", a bioinformatics solution encapsulated within a Singularity container that preloads a comprehensive suite of widely used genomic tools. Mars not only simplifies the installation of these tools but also automates critical workflow functions, including sequence sample preparation, read simulation, read mapping, variant calling, and result comparison. By streamlining the execution of these workflows, Mars enables users to easily manage input-output formats and compare results across different tools, thereby enhancing reproducibility and efficiency. Furthermore, by providing a cohesive environment that integrates tool management with a flexible workflow interface, Mars empowers researchers to focus on their analyses rather than the complexities of tool configuration. This integrated solution facilitates the testing of various combinations of tools and algorithms, enabling users to evaluate performance based on different metrics and identify the optimal tools for their specific genomic analysis needs. Through Mars, we aim to enhance the accessibility and usability of bioinformatics tools, ultimately advancing research in genomic analysis.
Availability and implementation: Mars is freely available at https://github.com/GenomicAI/mars. It is implemented within a Singularity container environment and supports modular extension for additional genomic tools and custom workflows.
Bioinformatics advances 5(1), vbaf074. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12095131/pdf/
Citations: 0
CellRomeR: an R package for clustering cell migration phenotypes from microscopy data.
IF 2.4
Bioinformatics advances Pub Date : 2025-04-04 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf069
Iivari Kleino, Mats Perk, António G G Sousa, Markus Linden, Julia Mathlin, Daniel Giesel, Paulina Frolovaite, Sami Pietilä, Sini Junttila, Tomi Suomi, Laura L Elo
Abstract
Motivation: The analysis of cell migration using time-lapse microscopy typically focuses on track characteristics for classification and statistical evaluation of migration behaviour. However, considerable heterogeneity can be seen in cell morphology and microscope signal intensity features within the migrating cell populations.
Results: To utilize this information in cell migration analysis, we introduce here an R package CellRomeR, designed for the phenotypic clustering of cells based on their morphological and motility features from microscopy images. Utilizing machine learning techniques and building on an iterative clustering projection method, CellRomeR offers a new approach to identify heterogeneity in cell populations. The clustering of cells along the migration tracks allows association of distinct cellular phenotypes with different cell migration types and detection of migration patterns associated with stable and unstable cell phenotypes. The user-friendly interface of CellRomeR and multiple visualization options facilitate an in-depth understanding of cellular behaviour, addressing previous challenges in clustering cell trajectories using microscope cell tracking data.
Availability and implementation: CellRomeR is available as an R package from https://github.com/elolab/CellRomeR.
Bioinformatics advances 5(1), vbaf069. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12052403/pdf/
Citations: 0
Enhanced metabolomic predictions using concept drift analysis: identification and correction of confounding factors.
IF 2.4
Bioinformatics advances Pub Date : 2025-04-04 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf073
Jana Schwarzerova, Dominika Olesova, Katerina Jureckova, Ales Kvasnicka, Ales Kostoval, David Friedecky, Jiri Sekora, Jitka Pomenkova, Valentyna Provaznik, Lubos Popelinsky, Wolfram Weckwerth
Abstract
Motivation: The increasing use of big data and optimized prediction methods in metabolomics requires techniques aligned with biological assumptions to improve early symptom diagnosis. One major challenge in predictive data analysis is handling confounding factors, that is, variables that influence predictions but are not directly included in the analysis.
Results: Detecting and correcting confounding factors enhances prediction accuracy, reducing false negatives that contribute to diagnostic errors. This study reviews concept drift detection methods in metabolomic predictions and selects the most appropriate ones. We introduce a new implementation of concept drift analysis in predictive classifiers using metabolomics data. Known confounding factors were confirmed, validating our approach and aligning it with conventional methods. Additionally, we identified potential confounding factors that may influence biomarker analysis, which could introduce bias and impact model performance.
Availability and implementation: Based on biological assumptions supported by detected concept drift, these confounding factors were incorporated into correction of prediction algorithms to enhance their accuracy. The proposed methodology has been implemented in Semi-Automated Pipeline using Concept Drift Analysis for improving Metabolomic Predictions (SAPCDAMP), an open-source workflow available at https://github.com/JanaSchwarzerova/SAPCDAMP.
Bioinformatics advances 5(1), vbaf073. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12037104/pdf/
Citations: 0
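The abstract above does not say which concept drift detectors were selected, so the following is only a generic illustration of the idea: monitoring a stream of per-sample prediction errors and flagging the point where their average shifts. It uses the Page-Hinkley test, a standard change detector chosen here for brevity and not claimed to be what SAPCDAMP implements. The function name and the default values of `delta` and `threshold` are assumptions.

```python
from typing import Iterable, Optional


def page_hinkley(errors: Iterable[float],
                 delta: float = 0.005,
                 threshold: float = 5.0) -> Optional[int]:
    """Return the zero-based index at which an upward drift is flagged, or None.

    Illustrative Page-Hinkley detector: `delta` is the tolerated magnitude of
    change and `threshold` controls sensitivity (both are assumed defaults).
    """
    running_mean = 0.0   # incremental mean of the values seen so far
    cumulative = 0.0     # cumulative deviation from the running mean
    minimum = 0.0        # smallest cumulative deviation seen so far

    for count, value in enumerate(errors, start=1):
        running_mean += (value - running_mean) / count
        cumulative += value - running_mean - delta
        minimum = min(minimum, cumulative)
        # Drift is flagged when the statistic rises far above its minimum.
        if cumulative - minimum > threshold:
            return count - 1
    return None
```

In a setting like the one described, a flagged index would then be inspected against sample metadata to see whether the shift coincides with a candidate confounding factor; that interpretation step is where the biological assumptions mentioned in the abstract come in.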
HuTAge: a comprehensive human tissue- and cell-specific ageing signature atlas.
IF 2.4
Bioinformatics advances Pub Date : 2025-04-03 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf072
Koichi Himori, Zhang Bingyuan, Kazuki Hatta, Yusuke Matsui
Abstract
Summary: Ageing is a complex process that involves interorgan and intercellular interactions. To obtain a clear understanding of ageing, cross-tissue single-cell data resources are required. However, a complete resource for humans is not available. To bridge this gap, we developed HuTAge, a comprehensive resource that integrates cross-tissue age-related information from The Genotype-Tissue Expression project with cross-tissue single-cell information from Tabula Sapiens to provide human tissue- and cell-specific ageing molecular information.
Availability and implementation: HuTAge is implemented within an R Shiny application and can be freely accessed at https://igcore.cloud/GerOmics/HuTAge/home. The source code is available at https://github.com/matsui-lab/HuTAge.
Bioinformatics advances 5(1), vbaf072. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12005899/pdf/
Citations: 0
mirrorCheck: an R package facilitating informed use of DESeq2's lfcShrink() function for differential gene expression analysis of clinical samples.
IF 2.4
Bioinformatics advances Pub Date : 2025-04-02 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf070
Katherine Elise Scull, Kiarash Behrouzfar, Daniella Brasacchio, Enid Yi Ni Lam, Dineika Chandrananda, Paul Yeh
Abstract
Motivation: The sophisticated lfcShrink() function implemented in the DESeq2 package for differential gene expression analysis aims to reduce noise from low read count and/or highly variable genes in bulk RNA-sequencing data, thus circumventing the need for arbitrary filtering thresholds. However, difficulties can arise when analysing clinical data with multiple biologically relevant groupings. In particular, changing the reference group can dramatically alter the ranking of differentially expressed genes, instead of merely 'mirroring' the up- and down-regulated genes in reciprocal comparisons.
Results: Here, we present mirrorCheck, an R package to automate methodical lfcShrink() usage and data visualization for quality control and data-driven decision-making during analysis.
Availability and implementation: The source code, including documentation, is available on GitHub at https://github.com/kescull/mirrorCheck.
Bioinformatics advances 5(1), vbaf070. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089695/pdf/
Citations: 0
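The 'mirroring' expectation in the abstract above can be stated concretely: if group A is compared against reference B and then B against reference A, each gene's log fold change should simply flip sign. The sketch below checks that property on two tables of log fold changes that are assumed to have been exported from the two reciprocal analyses. It is written in Python purely for illustration and does not reproduce mirrorCheck's R interface; the function names and the tolerance value are hypothetical.

```python
from typing import Dict, List


def mirror_deviation(lfc_a_vs_b: Dict[str, float],
                     lfc_b_vs_a: Dict[str, float]) -> Dict[str, float]:
    """Absolute departure from perfect mirroring for genes present in both tables.

    For a perfectly mirrored gene, lfc(A vs B) == -lfc(B vs A), so the sum is zero.
    """
    shared_genes = lfc_a_vs_b.keys() & lfc_b_vs_a.keys()
    return {gene: abs(lfc_a_vs_b[gene] + lfc_b_vs_a[gene]) for gene in shared_genes}


def non_mirrored_genes(lfc_a_vs_b: Dict[str, float],
                       lfc_b_vs_a: Dict[str, float],
                       tolerance: float = 0.1) -> List[str]:
    """Genes whose reciprocal log fold changes differ by more than `tolerance`."""
    deviation = mirror_deviation(lfc_a_vs_b, lfc_b_vs_a)
    return sorted(gene for gene, value in deviation.items() if value > tolerance)
```

Genes returned by `non_mirrored_genes` are the ones whose shrunken estimates depend on the choice of reference group, which is the situation the abstract warns can reorder a ranked gene list.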
Transfer learning improves performance in volumetric electron microscopy organelle segmentation across tissues.
IF 2.4
Bioinformatics advances Pub Date : 2025-04-02 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf021
Ronald Xie, Ben Mulcahy, Ali Darbandi, Sagar Marwah, Fez Ali, Yuna Lee, Gunes Parlakgul, Gokhan S Hotamisligil, Bo Wang, Sonya MacParland, Mei Zhen, Gary D Bader
Abstract
Motivation: Volumetric electron microscopy (VEM) enables nanoscale resolution three-dimensional imaging of biological samples. Identification and labeling of organelles, cells, and other structures in the image volume is required for image interpretation, but manual labeling is extremely time-consuming. This can be automated using deep learning segmentation algorithms, but these traditionally require substantial manual annotation for training and typically these labeled datasets are unavailable for new samples.
Results: We show that transfer learning can help address this challenge. By pretraining on VEM data from multiple mammalian tissues and organelle types and then fine-tuning on a target dataset, we segment multiple organelles at high performance, yet require a relatively small amount of new training data. We benchmark our method on three published VEM datasets and a new rat liver dataset we imaged over a 56×56×11 μm volume measuring 7000×7000×219 px using serial block face scanning electron microscopy with corresponding manually labeled mitochondria and endoplasmic reticulum structures. We further benchmark our approach against the Segment Anything Model 2 and MitoNet in zero-shot, prompted, and fine-tuned settings.
Availability and implementation: Our rat liver dataset's raw image volume, manual ground truth annotation, and model predictions are freely shared at github.com/Xrioen/cross-tissue-transfer-learning-in-VEM.
Bioinformatics advances 5(1), vbaf021. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11974384/pdf/
Citations: 0
cOmicsArt - a customizable Omics Analysis and reporting tool.
IF 2.4
Bioinformatics advances Pub Date : 2025-04-01 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf067
Lea Seep, Paul Jonas Jost, Clivia Lisowski, Hao Huang, Stephan Grein, Hildigunnur Hermannsdottir, Katharina Kuellmer, Tobias Fromme, Martin Klingenspor, Elvira Mass, Christian Kurts, Jan Hasenauer
Abstract
Motivation: The availability of bulk-omic data is steadily increasing, necessitating collaborative efforts between experimental and computational researchers. While software tools with graphical user interfaces (GUIs) enable rapid and interactive data assessment, they are limited to pre-implemented methods, often requiring transitions to custom code for further adjustments. However, most available tools lack GUI-independent reproducibility such as direct integration with R, resulting in very limited support for transition.
Results: We introduce the customizable Omics Analysis and reporting tool, cOmicsArt. cOmicsArt aims to enhance collaboration through integration of GUI-based analysis with R. The GUI allows researchers to perform user-friendly exploratory and statistical analyses with interactive visualizations and automatic documentation. Downloadable R scripts and results ensure reproducibility and seamless integration with R, supporting both novice and experienced programmers by enabling easy customizations and serving as a foundation for more advanced analyses. This versatility also allows for usage in educational settings, guiding students from GUI-based analysis to R code.
Availability and implementation: cOmicsArt is freely available at https://shiny.iaas.uni-bonn.de/cOmicsArt/. User documentation is available at https://icb-dcm.github.io/cOmicsArt/. Source code is available at https://github.com/ICB-DCM/cOmicsArt. A Docker image is available from https://hub.docker.com/r/pauljonasjost/comicsart/tags. A snapshot taken upon publication is available from https://zenodo.org/records/14907620. A screen recording of cOmicsArt is available at: https://www.youtube.com/watch?v=pTGjtIYQOakp.
Bioinformatics advances 5(1), vbaf067. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12085238/pdf/
Citations: 0
Evaluation of search-enabled pretrained Large Language Models on retrieval tasks for the PubChem database.
IF 2.4
Bioinformatics advances Pub Date : 2025-03-24 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf064
Ash Sze, Soha Hassoun
Abstract
Motivation: Databases are indispensable in biological and biomedical research, hosting vast amounts of structured and unstructured data, facilitating the organization, retrieval, and analysis of complex data. Database access, however, remains a manual, tedious, and sometimes overwhelming task. The availability of Large Language Models (LLMs) has the potential to play a transformative role in accessing databases.
Results: We investigate in this study the current state of using a pretrained, search-enabled LLM (ChatGPT-4o) for data retrieval from PubChem, a flagship database that plays a critical role in biological and biomedical research. We evaluate eight PubChem access protocols that were previously documented. We develop a methodology for adopting the protocols into an LLM prompt, where we supplement the prompt with additional context through iterative prompt refinement as needed. To further evaluate the LLM capabilities, we instruct the LLM to perform the retrieval. We quantitatively and qualitatively show that instructing ChatGPT-4o to generate programmatic access is more likely to yield the correct answers. We provide insightful future directions in developing LLMs for database access.
Availability and implementation: All text used to prompt ChatGPT-4o is provided in the manuscript.
Bioinformatics advances 5(1), vbaf064. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12073969/pdf/
Citations: 0
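For readers unfamiliar with what "programmatic access" to PubChem looks like, the sketch below builds and issues a request against PubChem's PUG REST interface to look up compound properties by name. It is an illustration only and is not taken from the paper or its prompts: the helper names `property_url` and `fetch_properties` are hypothetical, and the example assumes the standard library is sufficient and that the service is reachable.

```python
import json
from typing import Dict, List
from urllib.parse import quote
from urllib.request import urlopen

# Base address of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(compound_name: str, properties: List[str]) -> str:
    """Build a PUG REST URL that requests the given properties for a compound name."""
    # Percent-encode the name so that spaces and other characters are URL-safe.
    encoded_name = quote(compound_name)
    return f"{PUG_REST}/compound/name/{encoded_name}/property/{','.join(properties)}/JSON"


def fetch_properties(compound_name: str,
                     properties: List[str],
                     timeout: float = 30.0) -> List[Dict]:
    """Fetch the requested properties; requires network access to PubChem."""
    with urlopen(property_url(compound_name, properties), timeout=timeout) as response:
        payload = json.load(response)
    # PUG REST wraps property results in a PropertyTable object.
    return payload["PropertyTable"]["Properties"]
```

A call such as `fetch_properties("aspirin", ["MolecularFormula", "MolecularWeight"])` would return one record per matching compound. Generating a request like this leaves the retrieval to the database itself rather than to the model's recall, which may help explain the abstract's finding that programmatic access yields more correct answers.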