Gunhwan Ko, Jae Ho Lee, Young Mi Sim, Wangho Song, Byung-Ha Yoon, Iksu Byeon, Bang Hyuck Lee, Sang-Ok Kim, Jinhyuk Choi, Insoo Jang, Hyerin Kim, Jin Ok Yang, Kiwon Jang, Sora Kim, Jong-Hwan Kim, Jongbum Jeon, Jaeeun Jung, Seungwoo Hwang, Ji-Hwan Park, Pan-Gyu Kim, Seon-Young Kim, Byungwook Lee
{"title":"KoNA: Korean Nucleotide Archive as A New Data Repository for Nucleotide Sequence Data.","authors":"Gunhwan Ko, Jae Ho Lee, Young Mi Sim, Wangho Song, Byung-Ha Yoon, Iksu Byeon, Bang Hyuck Lee, Sang-Ok Kim, Jinhyuk Choi, Insoo Jang, Hyerin Kim, Jin Ok Yang, Kiwon Jang, Sora Kim, Jong-Hwan Kim, Jongbum Jeon, Jaeeun Jung, Seungwoo Hwang, Ji-Hwan Park, Pan-Gyu Kim, Seon-Young Kim, Byungwook Lee","doi":"10.1093/gpbjnl/qzae017","DOIUrl":"10.1093/gpbjnl/qzae017","url":null,"abstract":"<p><p>During the last decade, the generation and accumulation of petabase-scale high-throughput sequencing data have resulted in great challenges, including access to human data, as well as transfer, storage, and sharing of enormous amounts of data. To promote data-driven biological research, the Korean government announced that all biological data generated from government-funded research projects should be deposited at the Korea BioData Station (K-BDS), which consists of multiple databases for individual data types. Here, we introduce the Korean Nucleotide Archive (KoNA), a repository of nucleotide sequence data. As of July 2022, the Korean Read Archive in KoNA has collected over 477 TB of raw next-generation sequencing data from national genome projects. To ensure data quality and prepare for international alignment, a standard operating procedure was adopted, which is similar to that of the International Nucleotide Sequence Database Collaboration. The standard operating procedure includes quality control processes for submitted data and metadata using an automated pipeline, followed by manual examination. To ensure fast and stable data transfer, a high-speed transmission system called GBox is used in KoNA. Furthermore, the data uploaded to or downloaded from KoNA through GBox can be readily processed using a cloud computing service called Bio-Express. This seamless coupling of KoNA, GBox, and Bio-Express enhances the data experience, including submission, access, and analysis of raw nucleotide sequences. KoNA not only satisfies the unmet needs for a national sequence repository in Korea but also provides datasets to researchers globally and contributes to advances in genomics. The KoNA is available at https://www.kobic.re.kr/kona/.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141307671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: dbDEMC 3.0: Functional Exploration of Differentially Expressed miRNAs in Cancers of Human and Model Organisms.","authors":"","doi":"10.1093/gpbjnl/qzae037","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae037","url":null,"abstract":"","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141307670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EryDB: A Transcriptomic Profile Database for Erythropoiesis and Erythroid-related Diseases.","authors":"Guangmin Zheng, Song Wu, Zhaojun Zhang, Zijuan Xin, Lijuan Zhang, Siqi Zhao, Jing Wu, Yanxia Liu, Meng Li, Xiuyan Ruan, Nan Qiao, Yiming Bao, Hongzhu Qu, Xiangdong Fang","doi":"10.1093/gpbjnl/qzae029","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae029","url":null,"abstract":"<p><p>Erythropoiesis is a finely regulated and complex process that involves multiple transformations from hematopoietic stem cells to mature red blood cells at hematopoietic sites from the embryonic to the adult stages. Investigations into its molecular mechanisms have generated a wealth of expression data, including bulk and single-cell RNA sequencing data. A comprehensively integrated and well-curated erythropoiesis-specific database will greatly facilitate the mining of gene expression data and enable large-scale research of erythropoiesis and erythroid-related diseases. Here, we present EryDB, an open-access and comprehensive database dedicated to the collection, integration, analysis, and visualization of transcriptomic data for erythropoiesis and erythroid-related diseases. Currently, the database includes expertly curated quality-assured data of 3803 samples and 1,187,119 single cells derived from 107 public studies of three species (Homo sapiens, Mus musculus, and Danio rerio), nine tissue types, and five diseases. EryDB provides users with the ability to not only browse the molecular features of erythropoiesis between tissues and species, but also perform computational analyses of single-cell and bulk RNA sequencing data, thus serving as a convenient platform for customized queries and analyses. EryDB v1.0 is freely accessible at https://ngdc.cncb.ac.cn/EryDB/home.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142484013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"T2T-YAO Reference Genome of Han Chinese - New Step in Advancing Precision Medicine in China.","authors":"Xue Zhang","doi":"10.1016/j.gpb.2023.09.001","DOIUrl":"10.1016/j.gpb.2023.09.001","url":null,"abstract":"","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11082255/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41109240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review of Machine Learning and Algorithmic Methods for Protein Phosphorylation Site Prediction.","authors":"Farzaneh Esmaili, Mahdi Pourmirzaei, Shahin Ramazi, Seyedehsamaneh Shojaeilangari, Elham Yavari","doi":"10.1016/j.gpb.2023.03.007","DOIUrl":"10.1016/j.gpb.2023.03.007","url":null,"abstract":"<p><p>Post-translational modifications (PTMs) have key roles in extending the functional diversity of proteins and, as a result, regulating diverse cellular processes in prokaryotic and eukaryotic organisms. Phosphorylation modification is a vital PTM that occurs in most proteins and plays a significant role in many biological processes. Disorders in the phosphorylation process lead to multiple diseases, including neurological disorders and cancers. The purpose of this review is to organize this body of knowledge associated with phosphorylation site (p-site) prediction to facilitate future research in this field. At first, we comprehensively review all related databases and introduce all steps regarding dataset creation, data preprocessing, and method evaluation in p-site prediction. Next, we investigate p-site prediction methods, which are divided into two computational groups: algorithmic and machine learning (ML). Additionally, it is shown that there are basically two main approaches for p-site prediction by ML: conventional and end-to-end deep learning methods, both of which are given an overview. Moreover, this review introduces the most important feature extraction techniques, which have mostly been used in p-site prediction. Finally, we create three test sets from new proteins related to the released version of the database of protein post-translational modifications (dbPTM) in 2022 based on general and human species. Evaluating online p-site prediction tools on newly added proteins introduced in the dbPTM 2022 release, distinct from those in the dbPTM 2019 release, reveals their limitations. In other words, the actual performance of these online p-site prediction tools on unseen proteins is notably lower than the results reported in their respective research papers.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11082408/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49686555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HPC-Atlas: Computationally Constructing A Comprehensive Atlas of Human Protein Complexes.","authors":"Yuliang Pan, Ruiyi Li, Wengen Li, Liuzhenghao Lv, Jihong Guan, Shuigeng Zhou","doi":"10.1016/j.gpb.2023.05.001","DOIUrl":"10.1016/j.gpb.2023.05.001","url":null,"abstract":"<p><p>A fundamental principle of biology is that proteins tend to form complexes to play important roles in the core functions of cells. For a complete understanding of human cellular functions, it is crucial to have a comprehensive atlas of human protein complexes. Unfortunately, we still lack such a comprehensive atlas of experimentally validated protein complexes, which prevents us from gaining a complete understanding of the compositions and functions of human protein complexes, as well as the underlying biological mechanisms. To fill this gap, we built Human Protein Complexes Atlas (HPC-Atlas), as far as we know, the most accurate and comprehensive atlas of human protein complexes available to date. We integrated two latest protein interaction networks, and developed a novel computational method to identify nearly 9000 protein complexes, including many previously uncharacterized complexes. Compared with the existing methods, our method achieved outstanding performance on both testing and independent datasets. Furthermore, with HPC-Atlas we identified 751 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-affected human protein complexes, and 456 multifunctional proteins that contain many potential moonlighting proteins. These results suggest that HPC-Atlas can serve as not only a computing framework to effectively identify biologically meaningful protein complexes by integrating multiple protein data sources, but also a valuable resource for exploring new biological findings. The HPC-Atlas webserver is freely available at http://www.yulpan.top/HPC-Atlas.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10928439/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41124619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OBIA: An Open Biomedical Imaging Archive.","authors":"Enhui Jin, Dongli Zhao, Gangao Wu, Junwei Zhu, Zhonghuang Wang, Zhiyao Wei, Sisi Zhang, Anke Wang, Bixia Tang, Xu Chen, Yanling Sun, Zhe Zhang, Wenming Zhao, Yuanguang Meng","doi":"10.1016/j.gpb.2023.09.003","DOIUrl":"10.1016/j.gpb.2023.09.003","url":null,"abstract":"<p><p>With the development of artificial intelligence (AI) technologies, biomedical imaging data play an important role in scientific research and clinical application, but the available resources are limited. Here we present Open Biomedical Imaging Archive (OBIA), a repository for archiving biomedical imaging and related clinical data. OBIA adopts five data objects (Collection, Individual, Study, Series, and Image) for data organization, and accepts the submission of biomedical images of multiple modalities, organs, and diseases. In order to protect personal privacy, OBIA has formulated a unified de-identification and quality control process. In addition, OBIA provides friendly and intuitive web interfaces for data submission, browsing, and retrieval, as well as image retrieval. As of September 2023, OBIA has housed data for a total of 937 individuals, 4136 studies, 24,701 series, and 1,938,309 images covering 9 modalities and 30 anatomical sites. Collectively, OBIA provides a reliable platform for biomedical imaging data management and offers free open access to all publicly available data to support research activities throughout the world. OBIA can be accessed at https://ngdc.cncb.ac.cn/obia.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10928373/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41147344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From BIG Data Center to China National Center for Bioinformation.","authors":"Yiming Bao, Yongbiao Xue","doi":"10.1016/j.gpb.2023.10.001","DOIUrl":"10.1016/j.gpb.2023.10.001","url":null,"abstract":"","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10928365/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41223972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward A New Paradigm of Genomics Research","authors":"Zhang Zhang, Songnian Hu, Jun Yu","doi":"10.1016/j.gpb.2023.10.005","DOIUrl":"https://doi.org/10.1016/j.gpb.2023.10.005","url":null,"abstract":"Twenty years after the completion and forty years after the proposal of the Human Genome Project (HGP), genomics, together with its twin field – bioinformatics, has entered a new paradigm, where its bioscience-related, discipline-centric applications have been creating many new research frontiers. Beijing Institute of Genomics (BIG) and now also known as China National Center for Bioinformation (CNCB), will play key roles in supporting and participating in these frontier research activities. On the 20th anniversary of the establishment of BIG, we provide a brief retrospective of its historic events and ascertain strategic research directions with a broader vision for future genomics, where digital genome, digital medicine, and digital health are so structured to meet the needs of human life and healthcare, as well as their related metaverses.","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136128324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}