Jinkyeong Lee, Jeong-Ih Shin, Woo Young Cho, Kun Taek Park, Yeun-Jun Chung, Seung-Hyun Jung
{"title":"Genomic characteristics of Vibrio vulnificus strains isolated from clinical and environmental sources.","authors":"Jinkyeong Lee, Jeong-Ih Shin, Woo Young Cho, Kun Taek Park, Yeun-Jun Chung, Seung-Hyun Jung","doi":"10.1186/s44342-024-00029-w","DOIUrl":"10.1186/s44342-024-00029-w","url":null,"abstract":"<p><p>Vibrio vulnificus, a gram-negative pathogenic bacterium transmitted via undercooked seafood or contaminated seawater, causes septicemia and wound infections. In this study, we analyzed 15 clinical and 11 environmental isolates. In total, 20 sequence types (STs), including eight novel STs, were identified. Antibiotic resistance gene analysis commonly detected the cyclic AMP receptor protein (CRP) in both the clinical and environmental isolates. Interestingly, clinical and environmental isolates were non-susceptible to third-generation cephalosporins, such as ceftazidime and cefotaxime, complicating the treatment of V. vulnificus infection. Multiple antibiotic resistance (MAR) index ranged from 0.1 to 0.5, with clinical isolates showing a higher mean MAR index than the environmental isolates, indicating their broader spectrum of resistance. Notably, no quantitative (124.3 vs. 126.5) or qualitative (adherence, antiphagocytosis, and chemotaxis/motility) differences in virulence factors were observed between the environmental and clinical strains. The molecular characteristics identified in this study provide insights into the virulence of V. vulnificus strains in South Korea, highlighting the need for continuous surveillance of antibiotic resistance in emerging V. vulnificus strains.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"22 1","pages":"26"},"PeriodicalIF":0.0,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11603906/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142741946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neuromuscular diseases: genomics-driven advances.","authors":"Anna Cho","doi":"10.1186/s44342-024-00027-y","DOIUrl":"10.1186/s44342-024-00027-y","url":null,"abstract":"<p><p>Neuromuscular diseases (NMDs) are a group of rare disorders characterized by significant genetic and clinical complexity. Advances in genomics have revolutionized both the diagnosis and treatment of NMDs. While fewer than 30 NMDs had known genetic causes before the 1990s, more than 600 have now been identified, largely due to the adoption of next-generation sequencing (NGS) technologies such as whole-exome sequencing (WES) and whole-genome sequencing (WGS). These technologies have enabled more precise and earlier diagnoses, although the genetic complexity of many NMDs continues to pose challenges. Gene therapy has been a transformative breakthrough in the treatment of NMDs. In spinal muscular atrophy (SMA), therapies like nusinersen, onasemnogene abeparvovec, and risdiplam have dramatically improved patient outcomes. Similarly, Duchenne muscular dystrophy (DMD) has seen significant progress, most notably with the FDA approval of delandistrogene moxeparvovec, the first micro-dystrophin gene therapy. Despite these advancements, challenges remain, including the rarity of many NMDs, genetic heterogeneity, and the high costs associated with genomic technologies and therapies. Continued progress in gene therapy, RNA-based therapeutics, and personalized medicine holds promise for further breakthroughs in the management of these debilitating diseases.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"22 1","pages":"24"},"PeriodicalIF":0.0,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11600827/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142735453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Examining HPO by organ and system to facilitate practical use by clinicians.","authors":"Eisuke Dohi, Terue Takatsuki, Yuka Tateisi, Toyofumi Fujiwara, Yasunori Yamamoto","doi":"10.1186/s44342-024-00024-1","DOIUrl":"10.1186/s44342-024-00024-1","url":null,"abstract":"<p><p>The Human Phenotype Ontology (HPO) is widely used for annotating clinical text data, and sufficient annotation is crucial for the effective utilization of clinical texts. It was known that LLMs can successfully extract symptoms and findings but cannot annotate them with the HPO. We hypothesized that one potential cause of this is the lack of appropriate terms in the HPO. Therefore, during the Biomedical Linked Annotation Hackathon 8 (BLAH8), we attempted the following two tasks in order to grasp the overall picture of the HPO: (1) extract all HPO terms for each of the 23 HPO subclasses (defined as categories) directly under the HPO \"Phenotypic abnormality\" and then (2) search for major attributes in each of the 23 categories. We employed an LLM for these two tasks related to examining the HPO and, at the same time, found that the LLM did not work well without ingenuity for tasks that lacked sentences and context. A manual search for terms within each category revealed that the HPO contains a mix of terms with four major attributes: (1) Disease Name, (2) Condition, (3) Test Data, and (4) Symptoms and Findings. Manual curation showed that the ratio of symptoms and findings varied from 0 to 93.1% across categories. For clinicians, who are end-users of medical terminology including the HPO, it is difficult to understand ontologies. However, a good-quality ontology is also important for good-quality data, and a clinician's help is essential. It is also important to make the overall picture and limitations of ontologies easy to understand in order to bring out the explanatory power of LLMs and artificial intelligence.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"22 1","pages":"23"},"PeriodicalIF":0.0,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11559069/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142635517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Customizing GPT for natural language dialogue interface in database access.","authors":"Jin-Dong Kim, Kousaku Okubo","doi":"10.1186/s44342-024-00020-5","DOIUrl":"10.1186/s44342-024-00020-5","url":null,"abstract":"<p><p>The paper presents Anatomy3DExplorer, a customized ChatGPT designed as a natural language dialogue interface for exploring 3D models of anatomical structures. It illustrates the significant potential of large language models (LLMs) as user-friendly interfaces for database access. Furthermore, it showcases the seamless integration of LLMs and database APIs, within the GPTS framework, offering a promising and straightforward approach.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"22 1","pages":"22"},"PeriodicalIF":0.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11531191/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142565407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards automated phenotype definition extraction using large language models.","authors":"Ramya Tekumalla, Juan M Banda","doi":"10.1186/s44342-024-00023-2","DOIUrl":"10.1186/s44342-024-00023-2","url":null,"abstract":"<p><p>Electronic phenotyping involves a detailed analysis of both structured and unstructured data, employing rule-based methods, machine learning, natural language processing, and hybrid approaches. Currently, the development of accurate phenotype definitions demands extensive literature reviews and clinical experts, rendering the process time-consuming and inherently unscalable. Large language models offer a promising avenue for automating phenotype definition extraction but come with significant drawbacks, including reliability issues, the tendency to generate non-factual data (\"hallucinations\"), misleading results, and potential harm. To address these challenges, our study embarked on two key objectives: (1) defining a standard evaluation set to ensure large language model outputs are both useful and reliable and (2) evaluating various prompting approaches to extract phenotype definitions from large language models, assessing them with our established evaluation task. Our findings reveal promising results that still require human evaluation and validation for this task. However, enhanced phenotype extraction is possible, reducing the amount of time spent in literature review and evaluation.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"22 1","pages":"21"},"PeriodicalIF":0.0,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529293/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142559986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bioregulatory event extraction using large language models: a case study of rice literature.","authors":"Xinzhi Yao, Zhihan He, Jingbo Xia","doi":"10.1186/s44342-024-00022-3","DOIUrl":"10.1186/s44342-024-00022-3","url":null,"abstract":"<p><p>The extraction of biological regulation events has been a key focus in the field of biomedical natural language processing (BioNLP). However, existing methods often encounter challenges such as cascading errors in text mining pipelines and limitations in topic coverage from the selected corpus. Fortunately, the emergence of large language models (LLMs) presents a potential solution due to their robust semantic understanding and extensive knowledge base. To explore this potential, our project at the Biomedical Linked Annotation Hackathon 8 (BLAH 8) investigates the feasibility of using LLMs to extract biological regulation events. Our findings, based on the analysis of rice literature, demonstrate the promising performance of LLMs in this task, while also highlighting several concerns that must be addressed in future LLM-based applications in low-resource topics.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"22 1","pages":"20"},"PeriodicalIF":0.0,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529424/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142560352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and accurate short-read alignment with hybrid hash-tree data structure.","authors":"Junichiro Makino, Toshikazu Ebisuzaki, Ryutaro Himeno, Yoshihide Hayashizaki","doi":"10.1186/s44342-024-00012-5","DOIUrl":"10.1186/s44342-024-00012-5","url":null,"abstract":"<p><p>The rapidly increasing amount of short-read data generated by NGSs (new-generation sequencers) calls for the development of fast and accurate read alignment programs. Programs based on the hash table (BLAST) and the Burrows-Wheeler transform (bwa-mem) are used, and the latter is known to give superior performance. We here present a new algorithm, a hybrid of hash table and suffix tree, which we designed to speed up the alignment of short reads against large reference sequences such as the human genome. The total turnaround time for processing one human genome sample (read depth of 30) is just 31 min with our system, whereas it was more than 25 h with bwa-mem/gatk. The time for the aligner alone is 28 min for our system but around 2 h for bwa-mem. Our new algorithm is 4.4 times faster than bwa-mem while achieving similar accuracy. Variant calling and other downstream analyses after the alignment can be done with open-source tools such as SAMtools and Genome Analysis Toolkit (gatk) packages, as well as our own fast variant caller, which is well parallelized and much faster than gatk.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"22 1","pages":"19"},"PeriodicalIF":0.0,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520436/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142549935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight technology stacks for assistive linked annotations.","authors":"Nishad Thalhath","doi":"10.1186/s44342-024-00021-4","DOIUrl":"10.1186/s44342-024-00021-4","url":null,"abstract":"<p><p>This report presents the findings of a project from the 8th Biomedical Linked Annotation Hackathon (BLAH) to explore lightweight technology stacks to enhance assistive linked annotations. Using modern JavaScript frameworks and edge functions, in-browser Named Entity Recognition (NER), serverless embedding and vector search within web interfaces, and efficient serverless full-text search were implemented. Through this experimental approach, a proof of concept was built to demonstrate the feasibility and performance of these technologies. The results show that lightweight stacks can significantly improve the efficiency and cost-effectiveness of annotation tools and provide a local-first, privacy-oriented, and secure alternative to traditional server-based solutions in various use cases. This work emphasizes the potential of developing annotation interfaces that are more responsive, scalable, and user-friendly, which would benefit bioinformatics researchers, practitioners, and software developers.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"22 1","pages":"17"},"PeriodicalIF":0.0,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11468380/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142402555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Molecular diagnostic approach to rare neurological diseases from a clinician viewpoint.","authors":"Jin Sook Lee","doi":"10.1186/s44342-024-00025-0","DOIUrl":"10.1186/s44342-024-00025-0","url":null,"abstract":"<p><p>Advancements in sequencing technology have significantly enhanced diagnostic capabilities for rare neurological diseases. This progress in molecular diagnostics can greatly impact clinical management and facilitate the development of personalized treatments for patients with rare neurological diseases. Neurologists with expertise should raise clinical awareness, as phenotyping remains crucial for making a clinical diagnosis, even in the genomics era. They should prioritize different types of genomic tests, considering both the benefits and the limitations inherent to each test. Notably, long-read sequencing is being utilized in cases suspected to involve repeat expansion disorders or complex structural variants. Repeat expansion disorders are highly prevalent in neurological diseases, particularly within the ataxia group. Significant efforts, including periodic reanalysis, data sharing, or integration of genomics with multi-omics studies, should be directed toward cases that remain undiagnosed after standard next-generation sequencing.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"22 1","pages":"18"},"PeriodicalIF":0.0,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11468364/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142402556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compression rates of microbial genomes are associated with genome size and base composition.","authors":"Jon Bohlin, John H-O Pettersson","doi":"10.1186/s44342-024-00018-z","DOIUrl":"10.1186/s44342-024-00018-z","url":null,"abstract":"<p><strong>Background: </strong>To what degree a string of symbols can be compressed reveals important details about its complexity. For instance, strings that are not compressible are random and carry a low information potential while the opposite is true for highly compressible strings. We explore to what extent microbial genomes are amenable to compression as they vary considerably both with respect to size and base composition. For instance, microbial genome sizes vary from less than 100,000 base pairs in symbionts to more than 10 million in soil-dwellers. Genomic base composition, often summarized as genomic AT or GC content due to the similar frequencies of adenine and thymine on one hand and cytosine and guanine on the other, also varies substantially; the most extreme microbes can have genomes with AT content below 25% or above 85%. Base composition determines the frequency of DNA words, consisting of multiple nucleotides or oligonucleotides, and may therefore also influence compressibility. Using 4,713 RefSeq genomes, we examined the association between compressibility, using both a DNA-based (MBGC) and a general-purpose (ZPAQ) compression algorithm, and genome size, AT content as well as genomic oligonucleotide usage variance (OUV) using generalized additive models.</p><p><strong>Results: </strong>We find that genome size (p < 0.001) and OUV (p < 0.001) are both strongly associated with genome redundancy for both types of file compressors. The DNA-based MBGC compressor managed to improve compression by approximately 3% on average with respect to ZPAQ. Moreover, MBGC detected a significant (p < 0.001) compression ratio difference between AT poor and AT rich genomes which was not detected with ZPAQ.</p><p><strong>Conclusion: </strong>As lack of compressibility is equivalent to randomness, our findings suggest that smaller and AT rich genomes may have accumulated more random mutations on average than larger and AT poor genomes which, in turn, were significantly more redundant. Moreover, we find that OUV is a strong proxy for genome compressibility in microbial genomes. The ZPAQ compressor was found to agree with the MBGC compressor, albeit with a poorer performance, except for the compressibility of AT-rich and AT-poor/GC-rich genomes.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"22 1","pages":"16"},"PeriodicalIF":0.0,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11468749/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142402554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}