Pierre Larmande, Yusha Liu, Xinzhi Yao, Jingbo Xia
{"title":"OryzaGP 2021 update: a rice gene and protein dataset for named-entity recognition.","authors":"Pierre Larmande, Yusha Liu, Xinzhi Yao, Jingbo Xia","doi":"10.5808/gi.21015","DOIUrl":"https://doi.org/10.5808/gi.21015","url":null,"abstract":"<p><p>Due to the rapid evolution of high-throughput technologies, a tremendous amount of data is being produced in the biological domain, which poses a challenging task for information extraction and natural language understanding. Biological named entity recognition (NER) and named entity normalisation (NEN) are two common tasks aiming at identifying and linking biologically important entities such as genes or gene products mentioned in the literature to biological databases. In this paper, we present an updated version of OryzaGP, a gene and protein dataset for rice species created to help natural language processing (NLP) tools in processing NER and NEN tasks. To create the dataset, we selected more than 15,000 abstracts associated with articles previously curated for rice genes. We developed four dictionaries of gene and protein names associated with database identifiers. We used these dictionaries to annotate the dataset. We also annotated the dataset using pre-trained NLP models. Finally, we analysed the annotation results and discussed how to improve OryzaGP.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e27"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510865/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39509819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Márcia Barros, Pedro Ruas, Diana Sousa, Ali Haider Bangash, Francisco M Couto
{"title":"COVID-19 recommender system based on an annotated multilingual corpus.","authors":"Márcia Barros, Pedro Ruas, Diana Sousa, Ali Haider Bangash, Francisco M Couto","doi":"10.5808/gi.21008","DOIUrl":"https://doi.org/10.5808/gi.21008","url":null,"abstract":"<p><p>Tracking the most recent advances in Coronavirus disease 2019 (COVID-19)-related research is essential, given the disease's novelty and its impact on society. However, with the publication pace speeding up, researchers and clinicians require automatic approaches to keep up with the incoming information regarding this disease. A solution to this problem requires the development of text mining pipelines; the efficiency of which strongly depends on the availability of curated corpora. However, there is a lack of COVID-19-related corpora, even more, if considering other languages besides English. This project's main contribution was the annotation of a multilingual parallel corpus and the generation of a recommendation dataset (EN-PT and EN-ES) regarding relevant entities, their relations, and recommendation, providing this resource to the community to improve the text mining research on COVID-19-related literature. This work was developed during the 7th Biomedical Linked Annotation Hackathon (BLAH7).</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e24"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510867/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39511294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative genome characterization of Leptospira interrogans from mild and severe leptospirosis patients.","authors":"Songtham Anuntakarun, Vorthon Sawaswong, Rungrat Jitvaropas, Kesmanee Praianantathavorn, Witthaya Poomipak, Yupin Suputtamongkol, Chintana Chirathaworn, Sunchai Payungporn","doi":"10.5808/gi.21037","DOIUrl":"https://doi.org/10.5808/gi.21037","url":null,"abstract":"<p><p>Leptospirosis is a zoonotic disease caused by spirochetes from the genus Leptospira. In Thailand, Leptospira interrogans is a major cause of leptospirosis. Leptospirosis patients present with a wide range of clinical manifestations from asymptomatic, mild infections to severe illness involving organ failure. To better understand the difference between Leptospira isolates causing mild and severe leptospirosis, Illumina sequencing was used to sequence genomic DNA in both serotypes. DNA of Leptospira isolated from two patients, one with mild and another with severe symptoms, was included in this study. Adapters were removed from the paired-end reads, which were then trimmed at a Q30 quality score using Trimmomatic. Trimmed reads were assembled into contigs and scaffolds using SPAdes. Cross-contamination of scaffolds was evaluated by ContEst16s. The Prokka tool for bacterial annotation was used to annotate sequences from both Leptospira isolates. Predicted amino acid sequences from Prokka were searched against the EggNOG and DAVID gene ontology databases to characterize gene ontology. In addition, Leptospira from mild and severe patients that passed the criterion of e-value < 10e-5 from blastP against the virulence factor database were analyzed with a Venn diagram. From this study, we found 13 and 12 genes that were unique in the isolates from mild and severe patients, respectively. The 12 genes in the severe isolate might be virulence factor genes that affect disease severity. However, these genes should be validated in further study.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e31"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510873/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39509823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yedukondalu Kollati, Radha Rama Devi Akella, Shaik Mohammad Naushad, Rajesh K Patel, G Bhanuprakash Reddy, Vijaya R Dirisala
{"title":"Molecular insights into the role of genetic determinants of congenital hypothyroidism.","authors":"Yedukondalu Kollati, Radha Rama Devi Akella, Shaik Mohammad Naushad, Rajesh K Patel, G Bhanuprakash Reddy, Vijaya R Dirisala","doi":"10.5808/gi.21034","DOIUrl":"https://doi.org/10.5808/gi.21034","url":null,"abstract":"<p><p>In our previous studies, we have demonstrated the association of certain variants of the thyroid-stimulating hormone receptor (TSHR), thyroid peroxidase (TPO), and thyroglobulin (TG) genes with congenital hypothyroidism. Herein, we explored the mechanistic basis for this association using different in silico tools. The mRNA 3'-untranslated region (3'-UTR) plays key roles in gene expression at the post-transcriptional level. In TSHR variants (rs2268477, rs7144481, and rs17630128), the binding affinity of microRNAs (miRs) (hsa-miR-154-5p, hsa-miR-376a-2-5p, hsa-miR-3935, hsa-miR-4280, and hsa-miR-6858-3p) to the 3'-UTR is disrupted, affecting post-transcriptional gene regulation. TPO and TG are the two key proteins necessary for the biosynthesis of thyroid hormones in the presence of iodide and H2O2. Reduced stability of these proteins leads to aberrant biosynthesis of thyroid hormones. Compared to the wild-type TPO protein, the p.S398T variant was found to exhibit less stability and significant rearrangements of intra-atomic bonds affecting the stoichiometry and substrate binding (binding energies, ΔG of wild-type vs. mutant: ‒15 vs. ‒13.8 kcal/mol; and dissociation constant, Kd of wild-type vs. mutant: 7.2E-12 vs. 7.0E-11 M). The missense mutations p.G653D and p.R1999W on the TG protein showed altered ΔG (0.24 kcal/mol and 0.79 kcal/mol, respectively). In conclusion, an in silico analysis of TSHR genetic variants in the 3'-UTR showed that they alter the binding affinities of different miRs. The TPO protein structure and mutant protein complex (p.S398T) are less stable, with potentially deleterious effects. A structural and energy analysis showed that TG mutations (p.G653D and p.R1999W) reduce the stability of the TG protein and affect its structure-functional relationship.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e29"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510868/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39509821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Oscar Lithgow-Serrano, Joseph Cornelius, Vani Kanjirangat, Carlos-Francisco Méndez-Cruz, Fabio Rinaldi
{"title":"Improving classification of low-resource COVID-19 literature by using Named Entity Recognition.","authors":"Oscar Lithgow-Serrano, Joseph Cornelius, Vani Kanjirangat, Carlos-Francisco Méndez-Cruz, Fabio Rinaldi","doi":"10.5808/gi.21018","DOIUrl":"https://doi.org/10.5808/gi.21018","url":null,"abstract":"<p><p>Automatic document classification for highly interrelated classes is a demanding task that becomes more challenging when there is little labeled data for training. Such is the case of the coronavirus disease 2019 (COVID-19) Clinical repository, a repository of classified and translated academic articles related to COVID-19 and relevant to the clinical practice, where a 3-way classification scheme is being applied to COVID-19 literature. During the 7th Biomedical Linked Annotation Hackathon (BLAH7), we performed experiments to explore the use of named-entity-recognition (NER) to improve the classification. We processed the literature with OntoGene's Biomedical Entity Recogniser (OGER) and used the resulting identified Named Entities (NE) and their links to major biological databases as extra input features for the classifier. We compared the results with a baseline model without the OGER extracted features. In these proof-of-concept experiments, we observed a clear gain on COVID-19 literature classification. In particular, NE's origin was useful to classify document types and NE's type for clinical specialties. Due to the limitations of the small dataset, we can only conclude that our results suggest that NER would benefit this classification task. In order to accurately estimate this benefit, further experiments with a larger dataset would be needed.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e22"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510872/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39511292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A M U B Mahfuz, A M Zubair-Bin-Mahfuj, Dibya Joti Podder
{"title":"A network-biology approach for identification of key genes and pathways involved in malignant peritoneal mesothelioma.","authors":"A M U B Mahfuz, A M Zubair-Bin-Mahfuj, Dibya Joti Podder","doi":"10.5808/gi.21019","DOIUrl":"https://doi.org/10.5808/gi.21019","url":null,"abstract":"<p><p>Even in the current age of advanced medicine, the prognosis of malignant peritoneal mesothelioma (MPM) remains abysmal. Molecular mechanisms responsible for the initiation and progression of MPM are still largely not understood. Adopting an integrated bioinformatics approach, this study aims to identify the key genes and pathways responsible for MPM. Genes that are differentially expressed in MPM in comparison with the peritoneum of healthy controls have been identified by analyzing a microarray gene expression dataset. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analyses of these differentially expressed genes (DEG) were conducted to gain a better insight. A protein-protein interaction (PPI) network of the proteins encoded by the DEGs was constructed using STRING and hub genes were detected by analyzing this network. Next, the transcription factors and miRNAs that have possible regulatory roles on the hub genes were detected. Finally, survival analyses based on the hub genes were conducted using the GEPIA2 web server. Six hundred six genes were found to be differentially expressed in MPM; 133 are upregulated and 473 are downregulated. By analyzing the STRING-generated PPI network, six dense modules and 12 hub genes were identified. Fifteen transcription factors and 10 miRNAs were identified to have the most extensive regulatory functions on the DEGs. Through bioinformatics analyses, this work provides an insight into the potential genes and pathways involved in MPM.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 2","pages":"e16"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8261271/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39183233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neha Samir Roy, Yong-Wook Ban, Hana Yoo, Rahul Vasudeo Ramekar, Eun Ju Cheong, Nam-Il Park, Jong Kuk Na, Kyong-Cheul Park, Ik-Young Choi
{"title":"Analysis of genome variants in dwarf soybean lines obtained in F6 derived from cross of normal parents (cultivated and wild soybean).","authors":"Neha Samir Roy, Yong-Wook Ban, Hana Yoo, Rahul Vasudeo Ramekar, Eun Ju Cheong, Nam-Il Park, Jong Kuk Na, Kyong-Cheul Park, Ik-Young Choi","doi":"10.5808/gi.21024","DOIUrl":"https://doi.org/10.5808/gi.21024","url":null,"abstract":"<p><p>Plant height is an important component of plant architecture and significantly affects crop breeding practices and yield. We studied DNA variations derived from F5 recombinant inbred lines (RILs) with 96.8% homozygous genotypes. Here, we report DNA variations between the normal and dwarf members of four lines harvested from a single seed parent in an F6 RIL population derived from a cross between Glycine max var. Peking and Glycine soja IT182936. Whole genome sequencing was carried out, and the DNA variations in the whole genome were compared between the normal and dwarf samples. We found a large number of DNA variations in both the dwarf and semi-dwarf lines, with one single nucleotide polymorphism (SNP) per at least 3.68 kb in the dwarf lines and 1 SNP per 11.13 kb of the whole genome. This value is 2.18 times higher than the expected DNA variation in the F6 population. A total of 186 SNPs and 241 SNPs were discovered in the coding regions of the dwarf lines 1282 and 1303, respectively, and we discovered 33 homogeneous nonsynonymous SNPs that occurred at the same loci in each set of dwarf and normal soybean. Of them, five SNPs were in the same positions between lines 1282 and 1303. Our results provide important information for improving our understanding of the genetics of soybean plant height and crop breeding. These polymorphisms could be useful genetic resources for plant breeders, geneticists, and biologists for future molecular biology and breeding projects.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 2","pages":"e19"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8261272/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39183605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A protein interactions map of multiple organ systems associated with COVID-19 disease.","authors":"Dhammapal Bharne","doi":"10.5808/gi.20078","DOIUrl":"https://doi.org/10.5808/gi.20078","url":null,"abstract":"<p><p>Coronavirus disease 2019 (COVID-19) is an on-going pandemic disease infecting millions of people across the globe. Recent reports of reduction in antibody levels and the re-emergence of the disease in recovered patients necessitated the understanding of the pandemic at the core level. The cases of multiple organ failures emphasized the consideration of different organ systems while managing the disease. The present study employed RNA sequencing data to determine the disease associated differentially regulated genes and their related protein interactions in several organ systems. It signified the importance of early diagnosis and treatment of the disease. A map of protein interactions of multiple organ systems was built and uncovered CAV1 and CTNNB1 as the top degree nodes. A core interactions sub-network was analyzed to identify different modules of functional significance. AR, CTNNB1, CAV1, and PIK3R1 proteins were unfolded as bridging nodes interconnecting different modules for the information flow across several pathways. The present study also highlighted some of the druggable targets to analyze in drug re-purposing strategies against the COVID-19 pandemic. Therefore, the protein interactions map and the modular interactions of the differentially regulated genes in the multiple organ systems would incline scientists and researchers to investigate novel therapeutics for the COVID-19 pandemic expeditiously.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 2","pages":"e14"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8261268/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39183231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editor's introduction to this issue (G&I 19:2, 2021).","authors":"Taesung Park","doi":"10.5808/gi.19.2.e1","DOIUrl":"https://doi.org/10.5808/gi.19.2.e1","url":null,"abstract":"In this issue, there are six original articles and one mini review. The first article by Sohag et al. (Jagannath University, Bangladesh) provides a short review on omics approaches to cardiovascular diseases (CVDs). The author summarizes the genomics, proteomics, transcriptomics, and metabolomics in CVDs with a well-organized prospect. The first original article is about a protein interactions map of multiple organ systems associated with coronavirus disease 2019 (COVID-19) disease by Dr. Bharne (University of Hyderabad, India). This study appears to be motivated by reports that reduced antibody levels and disease recurrence in recovered COVID-19 patients require understanding of the epidemic at a key level. Multiple organ failure cases in patients with COVID-19 have highlighted consideration for other organ systems. This study used RNA sequencing data to determine disease-associated differentially regulated genes and related protein interactions in multiple organ systems, which implies the importance of early diagnosis and treatment of the disease. RNA sequencing data were obtained from autopsy specimens of lung, heart, jejunum, liver, kidney, intestine, bone marrow, adipose, placenta, and skin from 24 patients who died of COVID-19 infection. The total number of samples in the sequencing data was 88, including five negative control samples. Using significantly expressed genes in different organ systems, protein interactions of multiple organ systems were then mapped, revealing CAV1 and CTNNB1 as top nodes. A core interactions sub-network was analyzed to identify several functionally important modules such as AR, CTNNB1, CAV1 and PIK3R1 proteins. In addition, this study highlighted some of the druggable targets to analyze in drug re-purposing strategies against the COVID-19 pandemic. I think the protein interaction maps and modular interactions of differentially regulated genes in multi-organ systems would provide the clues to researchers to rapidly investigate novel therapeutics for the COVID-19 pandemic. The second article by Sohpal (Beant College of Engineering & Technology, India) performed a comparative study of coronaviruses including severe acute respiratory syndrome coronavirus 2, severe acute respiratory syndrome coronavirus, and Middle East respiratory syndrome coronavirus focusing on non-synonymous and synonymous substitutions. Through simulation studies, nucleotide sequence of closely related strains of respiratory syndrome viruses, codon-by-codon with maximum likelihood analysis, z selection and the divergence time were investigated. The third article by Mahfuz et al. (University of Development Alternative, Bangladesh) presented a network-biology approach for identification of key genes and pathways involved in malignant peritoneal mesothelioma (MPM). To understand the molecular mechanisms responsible for the initiation and progression of MPM, this study aims to identify the key genes and pathways responsible for MPM. Several bioin","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 2","pages":"e12"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8261267/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39145806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implications of the simple chemical structure of the odorant molecules interacting with the olfactory receptor 1A1.","authors":"S June Oh","doi":"10.5808/gi.21033","DOIUrl":"https://doi.org/10.5808/gi.21033","url":null,"abstract":"<p><p>G protein-coupled receptors (GPCRs), including olfactory receptors, account for the largest group of genes in the human genome and occupy a very important position in signaling systems. Although olfactory receptors, which belong to the broader category of GPCRs, play an important role in monitoring the organism's surroundings, their actual three-dimensional structure has not yet been determined. Therefore, the specific details of the molecular interactions between the receptor and the ligand remain unclear. In this report, the interactions between human olfactory receptor 1A1 and its odorant molecules were simulated using computational methods, and we explored how the chemically simple odorant molecules activate the olfactory receptor.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 2","pages":"e18"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8261270/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39145807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}