Zeynab Bayat, Faezeh Mahdian-Khoo, Lida Samie, Amir Taherkhani
{"title":"A bioinformatics pipeline for the design of a SART3-targeted cancer vaccine with enhanced immunogenicity.","authors":"Zeynab Bayat, Faezeh Mahdian-Khoo, Lida Samie, Amir Taherkhani","doi":"10.1186/s44342-026-00068-5","DOIUrl":"https://doi.org/10.1186/s44342-026-00068-5","url":null,"abstract":"<p><strong>Background and objectives: </strong>Squamous cell carcinoma antigen recognized by T-cells 3 (SART3) has emerged as a promising target for cancer immunotherapy, given its overexpression in various malignancies and low or absent expression in non-tumorous tissues. This study aimed to rationally design and evaluate in silico a multi-epitope T-cell vaccine targeting SART3, incorporating a TLR4 agonist adjuvant. The vaccine's predicted immunogenicity, physicochemical properties, structural stability, and interaction with TLR4 were comprehensively assessed. Additional assessments of cytokine-inducing potential, B-cell epitopes, and disulfide engineering opportunities were also performed.</p><p><strong>Methods: </strong>Potential T-cell epitopes from SART3 were identified using IEDB and screened for antigenicity (VaxiJen), toxicity (ToxinPred), and MHC-I/II binding affinity. Cytokine-inducing epitopes were evaluated using IL4pred, IL-10Pred, and IFNepitope servers. B-cell epitopes were predicted using ElliPro. The vaccine underwent comprehensive physicochemical, structural (I-TASSER/GalaxyRefine), molecular docking (HDOCK), molecular dynamics simulation, and disulfide engineering (Disulfide by Design 2.0) analyses.</p><p><strong>Results: </strong>The optimized 344-residue vaccine demonstrated non-allergenicity, high stability (instability index 17.16), antigenicity (VaxiJen 0.67), and solubility (SOLpro 0.96). HDOCK predicted favorable vaccine-TLR4 binding (ΔG = -265.61 kcal/mol, confidence 91%). MD simulations confirmed complex stability. Cytokine analysis revealed the potential to induce IL-4 and IL-10. 
The Val80-Ala123 pair exhibited the lowest bond energy (1.16 kcal/mol), indicating the optimal geometry for disulfide bond formation. The in silico immune simulations demonstrated a robust immune response following vaccine administration.</p><p><strong>Conclusion: </strong>This rationally designed SART3-targeted multi-epitope vaccine exhibits promising in silico characteristics across immunogenicity, physicochemical, cytokine-inducing, B-cell epitope, structural, and disulfide engineering profiles, warranting experimental validation for cancer immunotherapy development.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13135266/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147825309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring the gap: correlating synthetic-to-real drift with PHI de-identification performance.","authors":"Joseph Cornelius, Fabio Rinaldi","doi":"10.1186/s44342-026-00072-9","DOIUrl":"https://doi.org/10.1186/s44342-026-00072-9","url":null,"abstract":"<p><p>Clinical text de-identification enables the use of electronic health records while protecting patient privacy, but public training data remain scarce and often have mismatched documentation styles. Recent works have proposed using large language models (LLMs) to generate synthetic clinical notes, but it remains unclear if they reflect distributions of real clinical notes. We examine how lexical and semantic drift across training and evaluation corpora affects protected health information (PHI) tagger performance. We generated synthetic notes from scratch for four categories using five generator LLMs and one judge LLM. Next, we fine-tuned small de-identification models on real, synthetic, and mixed corpora, and evaluated them on three external benchmarks under a harmonized label schema. Models trained on broad, clinically oriented sources transfer better than those on legal or narrowly synthetic data. These results suggest that although synthetic data lacks some real-world distributional properties, it remains useful in low-resource settings. 
We found that compact distributional and embedding-based drift measures moderately correlate with out-of-distribution F1 score, a practically important result because drift estimation can improve synthetic-data quality control and alignment.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13130710/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147825296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction: Towards a transparent and reproducible AI-assisted research paper writing.","authors":"Jeongbin Park","doi":"10.1186/s44342-025-00062-3","DOIUrl":"10.1186/s44342-025-00062-3","url":null,"abstract":"","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13091237/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147719138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tae-Ho Kang, KyoungSoo Ha, WonHyun Kyung, Yong-Jin Jeon, Geun-Hyoung Jo, KangMin Park, KwangHee Lee, Tae-Eun Jin
{"title":"BioOne: a national-scale platform for integrated discovery and utilization of diverse biological resources in South Korea.","authors":"Tae-Ho Kang, KyoungSoo Ha, WonHyun Kyung, Yong-Jin Jeon, Geun-Hyoung Jo, KangMin Park, KwangHee Lee, Tae-Eun Jin","doi":"10.1186/s44342-026-00070-x","DOIUrl":"10.1186/s44342-026-00070-x","url":null,"abstract":"<p><p>The rapid growth of biological resources and associated research outputs has increased the complexity of resource discovery, access, and reuse across heterogeneous repositories. However, fragmented metadata schemas, limited interoperability, and siloed access mechanisms continue to hinder the integrated exploration of biological resources and their associated knowledge. Herein, we present BioOne (Biological resources One-Stop service platform), a unified informatics framework designed to address the fragmentation of national biological resources by integrating heterogeneous metadata from 14 distinct clusters and establishing a seamless resource-to-knowledge pipeline to enhance the discoverability and practical utilization of biological assets. BioOne is a national-scale web-based discovery platform that harmonizes biological resource metadata across 14 domain-specific biological resource clusters in Korea and systematically links these resources with external knowledge objects, including research papers, patents, biological datasets, and disease-drug-target information, through a unified discovery interface. To achieve this, we adopted a standard-aligned metadata integration framework, interoperable identifier mapping, and a modular system architecture to support scalable indexing, cross-domain search, and association-based navigation. 
By extending conventional catalog-based biological resource databases with an integrated discovery and access layer connecting distributed biorepositories to evidence-oriented knowledge resources, BioOne provides an informatics infrastructure for data-driven discovery, translational research, and coordinated utilization of biological resources at the national scale. BioOne also offers a transferable implementation model for the large-scale integration of distributed biological resource systems.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13130625/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147641147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLMs struggle to extract definitions of structural segments from MSA images: toward the design of a human-in-the-loop annotation pipeline for the PDB-Descriptome project.","authors":"Koya Sakuma","doi":"10.1186/s44342-025-00061-4","DOIUrl":"10.1186/s44342-025-00061-4","url":null,"abstract":"<p><p>We investigate the ability of vision-language models (VLMs) to extract structural region definitions from rasterized multiple sequence alignment (MSA) images, as part of the PDB-Descriptome project. Using synthetic MSAs with annotated structural biological entities (SBEs) and structural biological Referring Expressions (SBREs), we evaluate two VLMs, gemini-2.5-flash and gemini-2.5-pro, under naïve and strict prompts. While VLMs perform well in SBRE extraction, they show poor accuracy in defining SBE boundaries. In contrast, a human annotator achieves high boundary precision with slightly lower textual accuracy. These results support a human-in-the-loop pipeline for reliable structure-text annotation.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13091262/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147523510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"scCAPReSE: detection of large-scale genomic rearrangements from single-cell Hi-C based on few-shot learning.","authors":"Kyukwang Kim, Chul-Hwan Lee, Inkyung Jung","doi":"10.1186/s44342-026-00069-4","DOIUrl":"10.1186/s44342-026-00069-4","url":null,"abstract":"<p><p>Large-scale genomic rearrangements are prevalent in cancer genomes and can profoundly rewire three-dimensional (3D) genome architecture, leading to aberrant oncogene activation through enhancer hijacking. The rewired 3D organization generates unique chromatin contact signatures, which can be detected using deep learning-based approaches. However, extending such analyses to single-cell resolution, which is critical to delineate clonal heterogeneity in cancer, remains a major challenge, due to the limited number of training sets as single-cell Hi-C techniques are not standardized and only limited datasets are available across different methods. Here, we introduce scCAPReSE, a few-shot learning-based framework that adopts representations from a pre-trained image foundation model, CLIP, to enable robust classification of structural variation (SV) patterns in single-cell Hi-C data. By extracting and fine-tuning base weights from the foundation model, scCAPReSE enables effective training of deep learning classifiers using only a few hundred large-scale SV examples derived from a single cancer cell line while adapting classification tasks to heterogeneous single-cell Hi-C libraries. scCAPReSE achieved over 90% classification accuracy when evaluated on sci-Hi-C datasets. When further applied to scNanoHi-C data from the K562 chronic myeloid leukemia cell line, scCAPReSE correctly identified the Philadelphia chromosome translocation but also revealed substantial cell-to-cell variability in the contribution of SV-mediated chromatin interactions, highlighting previously inaccessible heterogeneity in cancer 3D genome organization. 
In summary, scCAPReSE provides a broadly applicable and data-efficient framework for detecting SV-driven 3D genome reorganization at single-cell resolution, enabling quantitative dissection of cancer-specific chromatin architecture and clonal heterogeneity. The developed method is freely available at https://github.com/kaistcbfg/CAPReSE.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13104498/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147470681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xinzhi Yao, Claire Nédellec, Jingbo Xia, Robert Bossy
{"title":"Consistency-accuracy correlation in hard-prompted LLMs for entity and relation extraction: empirical findings from plant-health data.","authors":"Xinzhi Yao, Claire Nédellec, Jingbo Xia, Robert Bossy","doi":"10.1186/s44342-025-00063-2","DOIUrl":"10.1186/s44342-025-00063-2","url":null,"abstract":"<p><p>As large language models (LLMs) become increasingly popular for information extraction (IE), concerns persist regarding the stability and reliability of their outputs. While accuracy has traditionally been the main evaluation metric, consistency (defined as the stability of model outputs across repeated runs) has recently been proposed as a complementary signal of reliability. In this work, we examine the relationship between accuracy and consistency in hard-prompted generative LLMs applied to entity and relation extraction. We conduct a systematic evaluation using four LLMs (GPT, DeepSeek, Qwen, Kimi) on the EPOP corpus, a plant-health dataset with rich entity types, long-range relations, overlapping relations, and strong argument constraints. To refine the interpretation of consistency, we distinguish between recoverable output variations (those that preserve the meaning of the extracted information) and critical ones that result in semantic errors. Our results show that while some positive correlation between accuracy and consistency exists, it is model-dependent and varies with task complexity. In structured prediction tasks, we show that consistency should be measured at the semantic level, ignoring superficial variations in format or wording. 
These insights have important implications for using self-consistency as a confidence filter and for designing reliable generative IE pipelines in specialized domains.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"24 1","pages":"3"},"PeriodicalIF":0.0,"publicationDate":"2026-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12888769/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146159931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ali Ghulam, Mujeebu Rehman, Huma Fida, Pei-Yu Zhao, Ramsha Noroze, Ye-Chen Qi, Xiao-Long Yu
{"title":"AMP-CapsNet: a multi-view feature fusion approach for antimicrobial peptide prediction using capsule networks.","authors":"Ali Ghulam, Mujeebu Rehman, Huma Fida, Pei-Yu Zhao, Ramsha Noroze, Ye-Chen Qi, Xiao-Long Yu","doi":"10.1186/s44342-026-00067-6","DOIUrl":"10.1186/s44342-026-00067-6","url":null,"abstract":"<p><p>Antimicrobial peptides (AMPs) are universally found in both intracellular and extracellular settings and are significant as antibiotic-resistant bacteria become a bigger problem. In medical laboratories, they have shown notable antibacterial effectiveness in treating diabetic foot infections and related issues. New medication development frequently targets AMPs, which are naturally occurring components of the adaptive immune system. This research employs deep learning to identify antibiotic activity. Numerous computational methods have been established to detect antimicrobial peptides via deep learning algorithms. We introduced a novel deep learning approach, AMP-CapsNet (antimicrobial peptide prediction using a capsule neural network), to predict AMPs precisely and evaluated its efficacy against deep learning and baseline models. AMP-CapsNet uses capsule neural networks, a type of next-generation neural network, to build prediction models. Additionally, we used amino acid composition (AAC) and dipeptide composition (DPC) as feature encoding methods. Every model underwent independent cross-validation and external testing. The findings indicate that the enhanced AMP-CapsNet deep learning model surpassed its counterparts, achieving an accuracy of 97.29% and an AUC score of 98.91% on the test set with DPC features. On the test set, the proposed AMP-CapsNet achieved an accuracy of 97.29% with DPC and 84.42% with AAC. 
Consequently, the technique we advocate is anticipated to enhance the accuracy of antimicrobial peptide predictions in the future. By producing powerful peptides for medication development and application, this study advances deep learning-based AMP drug discovery approaches. This finding has important ramifications for how biological data is processed and how pharmacology is calculated.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12977703/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146138210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mitochondrial transfer in cancer: mechanisms, immune evasion, and therapeutic opportunities.","authors":"Hye In Ka, Hyun Goo Woo","doi":"10.1186/s44342-025-00064-1","DOIUrl":"10.1186/s44342-025-00064-1","url":null,"abstract":"<p><p>Intercellular mitochondrial transfer (MT) is emerging as a transformative communication axis in cancer biology. Intact mitochondria or mitochondrial components can be exchanged between tumor cells, stromal elements, and immune cells via tunneling nanotubes, extracellular vesicles, cell fusion, or phagocytic uptake. This organelle exchange enables metabolic adaptation by restoring oxidative phosphorylation (OXPHOS), increasing ATP production, and enhancing survival in hostile environments. Conversely, tumor cells also hijack mitochondria from cytotoxic lymphocytes, thereby undermining immune function and contributing to immune escape and tumor progression. These converging metabolic exchanges fuel immune evasion, metastatic potential, and resistance to chemotherapy, radiation, and immunotherapy. Cutting-edge tracing tools, including mitochondrial reporter proteins and single-cell mitochondrial genome lineage mapping, have uncovered MT events both in vitro and in vivo. Therapeutic strategies designed to block mitochondrial trafficking, inhibit nanotube formation or vesicle uptake, or enhance immune cell mitochondrial resilience hold promise for tumor sensitization and restoration of antitumor immunity. 
A deeper understanding of MT provides novel insight into cancer metabolism and intercellular communication, offering a foundation for future therapeutic innovation and potential clinical application as both a biomarker and a therapeutic target.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":" ","pages":"2"},"PeriodicalIF":0.0,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12888364/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145968325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}