BMC Bioinformatics Latest Articles (Page 10)

Universal multilayer network embedding reveals a causal link between GABA neurotransmitter and cancer.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics | Pub Date: 2025-06-02 | DOI: 10.1186/s12859-025-06158-5

Léo Pio-Lopez, Michael Levin

Background: The volume and complexity of biological data, often represented as network models, continue to increase at a rapid pace. However, drug discovery in the context of complex phenotypes is hampered by the difficulties inherent in producing machine learning algorithms that can integrate molecular-genetic, biochemical, physiological, and other diverse datasets. Recent developments have expanded network analysis techniques, such as network embedding, to effectively explore multilayer network structures. Multilayer networks, which incorporate various nodes and connections in formats such as multiplex, heterogeneous, and bipartite networks, provide an effective framework for merging diverse and multi-scale biological data sources. However, current network embedding methods face challenges and limitations in addressing the heterogeneity and diversity of these networks. Therefore, there is an essential need for the development of new network embedding methods to manage the complexity and diversity of multi-omics biological information effectively.

Results: Here, we report MultiXVERSE, a universal multilayer network embedding method that is, to the best of our knowledge, the first capable of handling any kind of multilayer network. We applied it to a molecular-drug-disease multiplex-heterogeneous network. Our model made new predictions about a link between GABA and cancer that we verified experimentally in the Xenopus laevis model.

Conclusions: The development of MultiXVERSE represents a significant advancement in the integration and analysis of multilayer networks for biological research. By providing a universal, scalable framework for multilayer network embedding, MultiXVERSE enables the systematic exploration of molecular and phenotypic interactions across diverse biological contexts. Our experimental validation of the predicted link between GABA and cancer using Xenopus laevis underscores its capability to generate biologically meaningful hypotheses and accelerate breakthroughs in multi-omics research. Future directions include applying MultiXVERSE to additional multi-omics datasets and integrating it with high-throughput experimental pipelines for systematic hypothesis generation and validation, particularly in drug discovery. Beyond its biological applications, MultiXVERSE is a versatile tool that can be utilized for analyzing multilayer networks in a wide range of fields, including social sciences and other complex systems. By offering a universal framework, MultiXVERSE paves the way for novel insights and interdisciplinary collaborations in multilayer network research.

BMC Bioinformatics 26(1): 149 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12131449/pdf/

Citations: 0

Blastn2dotplots: multiple dot-plot visualizer for genome comparisons.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics | Pub Date: 2025-06-02 | DOI: 10.1186/s12859-025-06175-4

Miki Okuno, Takeshi Yamamoto, Yoshitoshi Ogura

Background: Dot-plots, along with linear comparisons, are fundamental visualization methods in genome comparisons, widely used for analyzing structural variations, repeat regions, and sequence similarities. However, existing tools often have limited visualization flexibility, particularly because they require multiple sequences to be concatenated into a single continuous axis. This constraint can make it difficult to apply highlights or user-defined grid lines effectively, reducing interpretability in comparative genomic analyses.

Results: We developed blastn2dotplots, a Python 3-based tool that utilizes the Matplotlib library to generate customizable dot-plots from local blastn results. Unlike traditional approaches, blastn2dotplots treats each alignment as a separate subplot, allowing for independent axis labeling, adjustable spacing between plots, and enhanced visualization flexibility. Users can highlight specific regions of interest, apply custom grid lines, and tailor the display to suit different genomic analyses. This tool is particularly useful for chromosomal structure analyses, detection of horizontal gene transfer events, and visualization of repetitive elements, offering an intuitive and adaptable framework for sequence comparison.

Conclusions: By addressing key limitations of existing dot-plot visualization tools, blastn2dotplots enhances the clarity and flexibility of comparative genomic analyses. Its ability to handle multiple alignments separately while preserving independent axis control and customization options makes it a valuable resource for a wide range of genomic studies. This tool provides a novel and effective solution for researchers needing precise and adaptable visualization of sequence alignments, thereby maximizing the potential of dot-plots in bioinformatics.

BMC Bioinformatics 26(1): 146 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12131419/pdf/

Citations: 0
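The blastn2dotplots entry describes drawing each query-subject alignment in its own panel, rather than placing all sequences on one concatenated axis. The sketch below is a minimal Python example of the data-preparation step that makes this possible. It assumes standard `blastn -outfmt 6` tabular input. The names `group_hits` and `Segment` are hypothetical and are not taken from blastn2dotplots itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

# Default column order of `blastn -outfmt 6`.
OUTFMT6_FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore")


class Segment(NamedTuple):
    """One high-scoring pair, drawn as a line segment in a dot-plot panel."""
    qstart: int
    qend: int
    sstart: int
    send: int
    pident: float

    @property
    def reverse(self) -> bool:
        # blastn reports minus-strand hits with sstart > send.
        return self.sstart > self.send


def group_hits(lines: Iterable[str], min_length: int = 0) -> Dict[Tuple[str, str], List[Segment]]:
    """Group tabular blastn hits by (query, subject) so each pair gets its own panel."""
    panels: Dict[Tuple[str, str], List[Segment]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        row = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        if int(row["length"]) < min_length:
            continue  # optionally drop short hits
        panels[(row["qseqid"], row["sseqid"])].append(Segment(
            int(row["qstart"]), int(row["qend"]),
            int(row["sstart"]), int(row["send"]), float(row["pident"])))
    return dict(panels)
```

You can then draw each panel's segments on its own Matplotlib axes, for example with `ax.plot([s.qstart, s.qend], [s.sstart, s.send])`. This keeps axis labels, highlights, and grid lines independent for each sequence pair.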

Class-balanced negative training sets for improving classifier model predictions of enhancer-promoter interactions.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics | Pub Date: 2025-06-02 | DOI: 10.1186/s12859-025-06171-8

Osamu Maruyama, Tsukasa Koga

Background: Enhancers regulate gene expression by forming DNA loops, thereby bringing themselves in close proximity to the target gene promoter. The human genome contains hundreds of thousands of enhancers, vastly outnumbering its 20,000-25,000 protein-coding genes, highlighting the importance of enhancer-promoter interactions (EPIs) in gene regulation. Supervised learning models have been developed to predict EPIs, often using experimentally validated interacting enhancer-promoter pairs and artificially generated negative samples. However, the lack of reliable negative samples presents a challenge. Current methods randomly select pairs from unlabeled data, leading to class imbalance and reduced predictive performance. This imbalance, where enhancers and promoters are unevenly distributed between the positive and negative sets, hinders classifiers from learning meaningful patterns. Therefore, constructing more reliable negative samples is crucial for improving the accuracy of EPI predictions.

Results: We developed two methods to generate class-balanced negative training sets for EPI classifiers: one based on maximum flow and the other on Gibbs sampling. We evaluated these methods with the TargetFinder and TransEPI classifiers across five and six cell lines, respectively. The trained models were tested using a common negative test set. Our negative training sets significantly improved the prediction performance across several metrics, including precision, recall, and area under the receiver operating characteristic curve.

Conclusions: Our findings demonstrate that carefully designed negative samples can enhance the performance of EPI classifiers. More advanced methods for generating negative EPIs should further improve prediction accuracy. The source code is available at https://github.com/maruyama-lab-design/CBOEP2

BMC Bioinformatics 26(1): 145 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12131720/pdf/

Citations: 0
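One reading of "class-balanced" in the EPI entry is that each enhancer and each promoter should appear about as often among the negatives as among the positives. The sketch below encodes that reading as a maximum-flow problem and solves it with a plain Edmonds-Karp implementation. This is an illustrative assumption, not the authors' formulation; their code is at https://github.com/maruyama-lab-design/CBOEP2. The node labels (`"SRC"`, `"SNK"`, `("E", ...)`, `("P", ...)`) and function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, List, Set, Tuple

Pair = Tuple[str, str]
Graph = Dict[Hashable, Dict[Hashable, int]]


def max_flow(cap: Graph, source: Hashable, sink: Hashable) -> Graph:
    """Edmonds-Karp maximum flow; returns the flow on every edge of `cap`.
    Assumes no pair of antiparallel edges, which holds for the graph built below."""
    residual: Graph = defaultdict(lambda: defaultdict(int))
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            residual[u][v] += c
            residual[v][u] += 0  # reverse edge, initially empty
    while True:
        parent = {source: None}  # BFS finds a shortest augmenting path
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left: the flow is maximum
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
    return {u: {v: c - residual[u][v] for v, c in nbrs.items()} for u, nbrs in cap.items()}


def balanced_negatives(positives: Set[Pair], candidates: List[Pair]) -> List[Pair]:
    """Select negative (enhancer, promoter) pairs so that no element appears more
    often among the negatives than among the positives (hypothetical formulation)."""
    pos_e: Dict[str, int] = defaultdict(int)
    pos_p: Dict[str, int] = defaultdict(int)
    for e, p in positives:
        pos_e[e] += 1
        pos_p[p] += 1
    cap: Graph = defaultdict(dict)
    for e, p in candidates:
        if (e, p) in positives or e not in pos_e or p not in pos_p:
            continue  # keep only unlabeled pairs of elements seen in the positives
        cap["SRC"][("E", e)] = pos_e[e]  # enhancer quota
        cap[("E", e)][("P", p)] = 1      # each candidate pair used at most once
        cap[("P", p)]["SNK"] = pos_p[p]  # promoter quota
    flow = max_flow(cap, "SRC", "SNK")
    selected = []
    for u, nbrs in flow.items():
        if isinstance(u, tuple) and u[0] == "E":
            selected.extend((u[1], v[1]) for v, f in nbrs.items() if f > 0)
    return sorted(selected)
```

If the maximum flow equals the number of positive pairs, every enhancer and every promoter receives exactly as many negatives as positives. Otherwise, the result is the largest selection that over-represents no element.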

Efficient construction of Markov state models for stochastic gene regulatory networks by domain decomposition.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics | Pub Date: 2025-06-02 | DOI: 10.1186/s12859-025-06174-5

Maryam Yousefian, Anna-Simone Frank, Marcus Weber, Susanna Röblitz

Background: The dynamics of many gene regulatory networks (GRNs) are characterized by the occurrence of metastable phenotypes and stochastic phenotype switches. The chemical master equation (CME) is the most accurate description to model such stochastic dynamics, whereby the long-time dynamics of the system is encoded in the spectral properties of the CME operator. Markov State Models (MSMs) provide a general framework for analyzing and visualizing stochastic multistability and state transitions based on these spectral properties. Until now, however, this approach has either been limited to low-dimensional systems or required high-performance computing facilities, thus limiting its usability.

Results: We present a domain decomposition approach (DDA) that approximates the CME by a stochastic rate matrix on a discretized state space and projects the multistable dynamics to a lower-dimensional MSM. To approximate the CME, we decompose the state space via a Voronoi tessellation and estimate transition probabilities by using adaptive sampling strategies. We apply the robust Perron cluster analysis (PCCA+) to construct the final MSM. Measures for uncertainty quantification are incorporated. As a proof of concept, we run the algorithm on a single PC and apply it to two GRN models, one for the genetic toggle switch and one describing macrophage polarization. By comparing the results with reference solutions, we demonstrate that our approach correctly identifies the number and location of metastable phenotypes with adequate accuracy and uncertainty bounds. We show that accuracy mainly depends on the total number of Voronoi cells, whereas uncertainty is determined by the number of sampling points.

Conclusions: A DDA enables the efficient computation of MSMs with quantified uncertainty. Since the algorithm is trivially parallelizable, it can be applied to larger systems, which will inevitably lead to new insights into cell-regulatory dynamics.

BMC Bioinformatics 26(1): 147 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12131593/pdf/

Citations: 0
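The MSM entry discretizes the state space with a Voronoi tessellation and then estimates transition probabilities between cells. The sketch below is a generic illustration of that discretization step only. It builds a count-based transition matrix from a sampled trajectory by assigning each point to its nearest generator. It does not reproduce the paper's method, which estimates a rate matrix with adaptive sampling and then applies PCCA+; neither of those steps is shown here. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def voronoi_cell(x: Point, centers: List[Point]) -> int:
    """Index of the nearest generator, i.e. the Voronoi cell that contains x."""
    return min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))


def transition_matrix(trajectory: List[Point], centers: List[Point],
                      lag: int = 1) -> List[List[float]]:
    """Row-stochastic estimate of cell-to-cell transition probabilities at `lag` steps."""
    n = len(centers)
    labels = [voronoi_cell(x, centers) for x in trajectory]
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(labels, labels[lag:]):
        counts[a][b] += 1  # one observed jump from cell a to cell b
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # Unvisited cell: fall back to a self-loop so rows still sum to 1.
            matrix.append([1.0 if j == i else 0.0 for j in range(n)])
    return matrix
```

The abstract reports that accuracy depends mainly on the number of Voronoi cells. In this sketch, that number corresponds to `len(centers)`.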

SC2Spa: a deep learning based approach to map transcriptome to spatial origins at cellular resolution.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics | Pub Date: 2025-06-02 | DOI: 10.1186/s12859-025-06173-6

Linbu Liao, Esha Madan, António M Palma, Hyobin Kim, Amit Kumar, Praveen Bhoopathi, Robert Winn, Jose Trevino, Paul Fisher, Cord Herbert Brakebusch, Gahyun Kim, Junil Kim, Rajan Gogna, Kyoung Jae Won

Background: Understanding cellular heterogeneity within tissues hinges on knowledge of their spatial context. However, it is still challenging to accurately map cells to their spatial coordinates.

Results: We present SC2Spa, a deep learning-based approach that learns intricate spatial relationships from spatial transcriptomics (ST) data. Benchmarking tests show that SC2Spa outperformed other predictors and accurately detected tissue architecture from transcriptome data. SC2Spa successfully mapped single-cell RNA sequencing (scRNA-seq) data to the Visium assay, providing an approach to enhance the resolution of low-resolution ST data. Our tests showed that SC2Spa performs well across various ST technologies and is robust to spatial resolution. In addition, SC2Spa can suggest spatially variable genes that cannot be identified by previous approaches.

Conclusions: SC2Spa is a robust and accurate approach to provide single cells with their spatial location and identify spatially meaningful genes.

BMC Bioinformatics 26(1): 148 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12131412/pdf/

Citations: 0

MultiVis.js: a software tool for the visualization of multiway chromatin interactions and SPRITE data.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics | Pub Date: 2025-06-02 | DOI: 10.1186/s12859-025-06176-3

Jes Hui Min Kwek, Melissa Jane Fullwood

Citations: 0

One-sample missing DNA-methylation value imputation.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics | Pub Date: 2025-05-31 | DOI: 10.1186/s12859-025-06154-9

Christelle Kemda Ngueda, Julia Palm, Flavia Remo, André Scherag, Lutz Leistritz

Background: Currently, the most popular methods for missing DNA-methylation value imputation rely on exploiting methylation patterns across multiple samples from the same population. However, if there is significant variability between individuals or limited data available, these methods might produce biased results. This situation has prompted researchers to seek alternative approaches for handling single-sample data, particularly in the context of personalized medicine. Accordingly, we propose One-Sample Methyl Imputation (OSMI), an imputation method that can also be used in single-sample applications.

Results: In single-subject cases, the proposed method yielded an average imputation accuracy of RMSE = 0.2713 (95% CI: 0.2696 to 0.2730) in β-value units (range: 0-1), based on real 450K BeadChip data sets of 3,402 individuals. The affiliation of individual CpGs to CpG islands can be taken into account during the imputation of missing methylation values, which improves imputation accuracy. In addition, imputation accuracy generally depends on the density of CpG sites on DNA-methylation microarrays and increases as CpG site density increases. OSMI has low memory and computational requirements.

Conclusions: OSMI uses a single methylome to impute missing values quickly under very low memory constraints. Its imputation accuracy is inferior to that of other methods if multiple samples are available and these samples are reasonably similar, but OSMI represents a useful addition to the imputation toolbox for single-sample applications.

BMC Bioinformatics 26(1): 143 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12126866/pdf/

Citations: 0
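The OSMI entry reports accuracy as RMSE in β-value units on held-out values. The sketch below shows that evaluation metric together with a deliberately simple single-sample imputer, which averages the k nearest observed CpGs by genomic position. The imputer is a hypothetical baseline for illustration only and is **not** the OSMI algorithm. It also assumes that all CpGs lie on a single chromosome.

```python
import math
from typing import List, Optional


def impute_nearest(positions: List[int], betas: List[Optional[float]], k: int = 2) -> List[float]:
    """Hypothetical single-sample baseline (not OSMI): replace each missing beta value
    with the mean of the k observed CpGs closest to it in genomic position.
    Assumes every CpG lies on the same chromosome."""
    observed = [(pos, b) for pos, b in zip(positions, betas) if b is not None]
    if not observed:
        raise ValueError("at least one observed beta value is required")
    filled = []
    for pos, b in zip(positions, betas):
        if b is not None:
            filled.append(b)
            continue
        nearest = sorted(observed, key=lambda ob: abs(ob[0] - pos))[:k]
        filled.append(sum(v for _, v in nearest) / len(nearest))
    return filled


def rmse(truth: List[float], predicted: List[float]) -> float:
    """Root-mean-square error in beta-value units (betas lie in [0, 1])."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth))
```

To evaluate an imputer, mask some observed values, impute them, and compare the results against the hidden truth. For example, suppose a true value of 0.25 at position 200 is hidden, and its observed neighbors at positions 100 and 300 have values 0.2 and 0.4. With k = 2, the baseline imputes 0.3, which gives an RMSE of about 0.05.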

MAFcounter: an efficient tool for counting the occurrences of k-mers in MAF files.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics | Pub Date: 2025-05-30 | DOI: 10.1186/s12859-025-06172-7

Michail Patsakis, Kimonas Provatas, Aris Karatzikos, Charalampos Koilakos, Ioannis Mouratidis, Ilias Georgakopoulos-Soares

Motivation: With the rapid expansion of large-scale biological datasets, DNA and protein sequence alignments have become essential for comparative genomics and proteomics. These alignments facilitate the exploration of sequence similarity patterns, providing valuable insights into sequence conservation and evolutionary relationships and supporting functional analyses. Typically, sequence alignments are stored in formats such as the Multiple Alignment Format (MAF). Counting k-mer occurrences is a crucial task in many computational biology applications, but currently there is no algorithm designed for k-mer counting in alignment files.

Results: We have developed MAFcounter, the first k-mer counter dedicated to alignment files. MAFcounter is multithreaded, fast, and memory efficient, enabling k-mer counting in DNA and protein sequence alignment files with a wide variety of features for k-mer analysis.

Availability: MAFcounter is released under the GPL license as a suite of binary C++ applications and is available at https://github.com/Georgakopoulos-Soares-lab/MAFcounter

BMC Bioinformatics 26(1): 142 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12125892/pdf/

Citations: 0
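MAFcounter itself is a multithreaded C++ suite. The Python sketch below only illustrates the basic task it addresses: counting k-mers in the aligned sequences (`s` lines) of a MAF file. Two choices in this sketch are assumptions, not documented MAFcounter behavior: upper-casing soft-masked bases, and skipping any window that spans a gap. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_maf_kmers(lines: Iterable[str], k: int) -> Counter:
    """Count k-mers across all aligned sequences ('s' lines) of a MAF file.

    Sketch assumptions: bases are upper-cased (soft-masking ignored) and any
    window containing a gap character '-' is skipped."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        # Sequence lines: s <src> <start> <size> <strand> <srcSize> <text>
        if len(fields) != 7 or fields[0] != "s":
            continue  # skip headers, 'a' block lines, and 'i'/'e'/'q' lines
        seq = fields[6].upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "-" not in kmer:
                counts[kmer] += 1
    return counts
```

Because `lines` can be any iterable, an open file handle can be passed directly. This streams the file instead of loading it into memory, which fits the low-memory goal the abstract emphasizes.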

RSCUcaller: an R package for analyzing differences in relative synonymous codon usage (RSCU).

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics | Pub Date: 2025-05-29 | DOI: 10.1186/s12859-025-06166-5

Mateusz Maździarz, Sebastian Zając, Łukasz Paukszto, Jakub Sawicki

Background: Synonymous codon usage bias, a significant factor in gene expression and genome evolution, has been extensively studied in genomics and molecular biology. Although the genetic code is universal, significant variations in synonymous codon usage have been observed among and within organisms. This bias has been linked to various factors, including gene expression levels, tRNA abundance, protein structure, and environmental adaptation. Relative Synonymous Codon Usage (RSCU), a normalized measure, is used to quantify this bias. By analyzing RSCU values, researchers have uncovered patterns and trends related to the underlying mechanisms driving codon usage bias.

Results: We present an R package named RSCUcaller designed for the analysis of coding nucleotide sequences at the level of relative synonymous codon usage (RSCU). The package enables both visualization of data and advanced statistical analyses. RSCUcaller accepts as input a multi-FASTA file containing coding sequences (CDS) and an accompanying description table. Alternatively, the user may provide separate FASTA files for each sequence along with the corresponding table. The program merges the provided sequences and calculates RSCU values for each. Implemented visualization features include heatmaps and dendrograms based on these heatmaps. Furthermore, the package can present data as histograms. The calculated RSCU values can also be used to create matrices for further analysis by the user. RSCUcaller offers correlation analysis between any two organisms. Additionally, statistical tests have been implemented to compare the frequency of amino acid occurrence between different groups of sequences.

Conclusions: RSCUcaller enables comparative RSCU analysis between coding sequences of different organisms or of individuals of the same species. It facilitates visualization and statistical analysis among codons and user-defined groups. The RSCUcaller package is available at https://github.com/Mordziarz/RSCUcaller under the GPL-3 license.

BMC Bioinformatics 26(1): 141 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12121200/pdf/

Citations: 0
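RSCU, the quantity RSCUcaller computes, is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally: RSCU_ij = X_ij / ((1/n_i) Σ_j X_ij). Here X_ij is the count of codon j for amino acid i, and n_i is the number of synonymous codons for amino acid i. The sketch below is a minimal Python version based on the standard genetic code. It is not RSCUcaller's R implementation. Treating the three stop codons as their own synonymous group is a choice made in this sketch.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

# Standard genetic code (NCBI table 1); codons enumerated as TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families, e.g. "L" -> six leucine codons, "*" -> three stop codons.
SYNONYMS: Dict[str, List[str]] = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def rscu(cds: str) -> Dict[str, float]:
    """RSCU for every codon whose amino acid occurs in the CDS.

    RSCU_ij = X_ij / ((1 / n_i) * sum_j X_ij), where X_ij is the count of codon j
    for amino acid i and n_i is the number of synonymous codons for amino acid i."""
    seq = cds.upper().replace("U", "T")
    # In-frame codons; a trailing partial codon is ignored.
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values: Dict[str, float] = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined for its codons
        expected = total / len(codons)  # count under uniform synonymous usage
        for c in codons:
            values[c] = counts[c] / expected
    return values
```

An RSCU of 1.0 means a codon is used exactly as often as uniform synonymous usage predicts. Values above 1.0 indicate preference, and values below 1.0 indicate avoidance. Within each amino acid, the RSCU values always sum to n_i.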

Valsci: an open-source, self-hostable literature review utility for automated large-batch scientific claim verification using large language models.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics | Pub Date: 2025-05-28 | DOI: 10.1186/s12859-025-06159-4

Brice Edelman, Jeffrey Skolnick

Background: The exponential growth of scientific publications poses a formidable challenge for researchers seeking to validate emerging hypotheses or synthesize existing evidence. In this paper, we introduce Valsci, an open-source, self-hostable utility that automates large-batch scientific claim verification using any OpenAI-compatible large language model. Valsci unites retrieval-augmented generation with structured bibliometric scoring and chain-of-thought prompting, enabling users to efficiently search, evaluate, and summarize evidence from the Semantic Scholar database and other academic sources. Unlike conventional standalone LLMs, which often suffer from hallucinations and unreliable citations, Valsci grounds its analyses in verifiable published findings. A guided prompt-flow approach is employed to generate query expansions, retrieve relevant excerpts, and synthesize coherent, evidence-based reports.

Results: Preliminary evaluations across claims from the SciFact benchmark dataset reveal that Valsci significantly outperforms base GPT-4o outputs in citation hallucination rate while maintaining a low misclassification rate. The system is highly scalable, processing hundreds of claims per hour through asynchronous parallelization.

Conclusions: By providing an open and transparent platform for large-batch literature verification, Valsci substantially lowers the barrier to comprehensive evidence-based reviews and fosters a more reproducible research ecosystem.

BMC Bioinformatics 26(1): 140 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12121171/pdf/

Citations: 0
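The Valsci entry attributes its throughput to asynchronous parallelization. A common asyncio pattern for this kind of I/O-bound workload is to cap the number of in-flight requests with a semaphore. The sketch below shows that generic pattern; it is not Valsci's code. The names `verify_all` and `fake_verify` are hypothetical, and `fake_verify` simply stands in for a retrieval and LLM round trip.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def verify_all(claims: List[str],
                     verify: Callable[[str], Awaitable[T]],
                     max_concurrency: int = 8) -> List[T]:
    """Run verify(claim) for every claim with at most `max_concurrency` calls in flight.
    Results are returned in the same order as `claims`."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(claim: str) -> T:
        async with semaphore:  # wait for a free slot before starting this claim
            return await verify(claim)

    return list(await asyncio.gather(*(bounded(c) for c in claims)))


async def fake_verify(claim: str) -> str:
    """Hypothetical stand-in for a retrieval + LLM round trip."""
    await asyncio.sleep(0.01)
    return f"checked: {claim}"


if __name__ == "__main__":
    print(asyncio.run(verify_all(["claim A", "claim B", "claim C"], fake_verify, max_concurrency=2)))
```

The concurrency cap exists because hosted LLM endpoints usually enforce rate limits. Unbounded `gather` calls would trigger errors or throttling rather than add throughput.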