GigaScience Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae079
Niklas Birth, Nicolina Leppich, Julia Schirmacher, Nina Andreae, Rasmus Steinkamp, Matthias Blanke, Peter Meinicke
{"title":"CoCoPyE: feature engineering for learning and prediction of genome quality indices.","authors":"Niklas Birth, Nicolina Leppich, Julia Schirmacher, Nina Andreae, Rasmus Steinkamp, Matthias Blanke, Peter Meinicke","doi":"10.1093/gigascience/giae079","DOIUrl":"https://doi.org/10.1093/gigascience/giae079","url":null,"abstract":"<p><strong>Background: </strong>The exploration of the microbial world has been greatly advanced by the reconstruction of genomes from metagenomic sequence data. However, the rapidly increasing number of metagenome-assembled genomes has also resulted in a wide variation in data quality. It is therefore essential to quantify the achieved completeness and possible contamination of a reconstructed genome before it is used in subsequent analyses. The classical approach for the estimation of quality indices solely relies on a relatively small number of universal single-copy genes. Recent tools try to extend the genomic coverage of estimates for an increased accuracy.</p><p><strong>Results: </strong>We developed CoCoPyE, a fast tool based on a novel 2-stage feature extraction and transformation scheme. First, it identifies genomic markers and then refines the marker-based estimates with a machine learning approach. In our simulation studies, CoCoPyE showed a more accurate prediction of quality indices than the existing tools. While the CoCoPyE web server offers an easy way to try out the tool, the freely available Python implementation enables integration into existing genome reconstruction pipelines.</p><p><strong>Conclusions: </strong>CoCoPyE provides a new approach to assess the quality of genome data. 
It complements and improves existing tools and may help researchers to better distinguish between low-quality draft and high-quality genome assemblies in metagenome sequencing projects.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11503480/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142498590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data processing solutions to render metabolomics more quantitative: case studies in food and clinical metabolomics using Metabox 2.0.","authors":"Kwanjeera Wanichthanarak, Ammarin In-On, Sili Fan, Oliver Fiehn, Arporn Wangwiwatsin, Sakda Khoomrung","doi":"10.1093/gigascience/giae005","DOIUrl":"10.1093/gigascience/giae005","url":null,"abstract":"<p><p>In classic semiquantitative metabolomics, metabolite intensities are affected by biological factors and other unwanted variations. A systematic evaluation of the data processing methods is crucial to identify adequate processing procedures for a given experimental setup. Current comparative studies are mostly focused on peak area data but not on absolute concentrations. In this study, we evaluated data processing methods to produce outputs that were most similar to the corresponding absolute quantified data. We examined the data distribution characteristics, fold difference patterns between 2 metabolites, and sample variance. We used 2 metabolomic datasets from a retail milk study and a lupus nephritis cohort as test cases. When studying the impact of data normalization, transformation, scaling, and combinations of these methods, we found that the cross-contribution compensating multiple standard normalization (ccmn) method, followed by square root data transformation, was most appropriate for a well-controlled study such as the milk study dataset. Regarding the lupus nephritis cohort study, only ccmn normalization could slightly improve the data quality of the noisy cohort. Since the assessment accounted for the resemblance between processed data and the corresponding absolute quantified data, our results denote a helpful guideline for processing metabolomic datasets within a similar context (food and clinical metabolomics). Finally, we introduce Metabox 2.0, which enables thorough analysis of metabolomic data, including data processing, biomarker analysis, integrative analysis, and data interpretation. 
It was successfully used to process and analyze the data in this study. An online web version is available at http://metsysbio.com/metabox.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 ","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10941642/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140131178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae024
Justin Chu, Jiazhen Rong, Xiaowen Feng, Heng Li
{"title":"ntsm: an alignment-free, ultra-low-coverage, sequencing technology agnostic, intraspecies sample comparison tool for sample swap detection.","authors":"Justin Chu, Jiazhen Rong, Xiaowen Feng, Heng Li","doi":"10.1093/gigascience/giae024","DOIUrl":"10.1093/gigascience/giae024","url":null,"abstract":"<p><strong>Background: </strong>Due to human error, sample swapping in large cohort studies with heterogeneous data types (e.g., mix of Oxford Nanopore Technologies, Pacific Bioscience, Illumina data, etc.) remains a common issue plaguing large-scale studies. At present, all sample swapping detection methods require costly and unnecessary (e.g., if data are only used for genome assembly) alignment, positional sorting, and indexing of the data in order to compare similarly. As studies include more samples and new sequencing data types, robust quality control tools will become increasingly important.</p><p><strong>Findings: </strong>The similarity between samples can be determined using indexed k-mer sequence variants. To increase statistical power, we use coverage information on variant sites, calculating similarity using a likelihood ratio-based test. Per sample error rate, and coverage bias (i.e., missing sites) can also be estimated with this information, which can be used to determine if a spatially indexed principal component analysis (PCA)-based prescreening method can be used, which can greatly speed up analysis by preventing exhaustive all-to-all comparisons.</p><p><strong>Conclusions: </strong>Because this tool processes raw data, is faster than alignment, and can be used on very low-coverage data, it can save an immense degree of computational resources in standard quality control (QC) pipelines. It is robust enough to be used on different sequencing data types, important in studies that leverage the strengths of different sequencing technologies. 
In addition to its primary use case of sample swap detection, this method also provides information useful in QC, such as error rate and coverage bias, as well as population-level PCA ancestry analysis visualization.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 ","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11148594/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141237337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae085
Carolina Heloisa Souza-Borges, Ricardo Utsunomia, Alessandro M Varani, Marcela Uliano-Silva, Lieschen Valeria G Lira, Arno J Butzge, John F Gomez Agudelo, Shisley Manso, Milena V Freitas, Raquel B Ariede, Vito A Mastrochirico-Filho, Carolina Penaloza, Agustín Barria, Fábio Porto-Foresti, Fausto Foresti, Ricardo Hattori, Yann Guiguen, Ross D Houston, Diogo Teruo Hashimoto
{"title":"De novo assembly and characterization of a highly degenerated ZW sex chromosome in the fish Megaleporinus macrocephalus.","authors":"Carolina Heloisa Souza-Borges, Ricardo Utsunomia, Alessandro M Varani, Marcela Uliano-Silva, Lieschen Valeria G Lira, Arno J Butzge, John F Gomez Agudelo, Shisley Manso, Milena V Freitas, Raquel B Ariede, Vito A Mastrochirico-Filho, Carolina Penaloza, Agustín Barria, Fábio Porto-Foresti, Fausto Foresti, Ricardo Hattori, Yann Guiguen, Ross D Houston, Diogo Teruo Hashimoto","doi":"10.1093/gigascience/giae085","DOIUrl":"10.1093/gigascience/giae085","url":null,"abstract":"<p><strong>Background: </strong>Megaleporinus macrocephalus (piauçu) is a Neotropical fish within Characoidei that presents a well-established heteromorphic ZZ/ZW sex determination system and thus constitutes a good model for studying W and Z chromosomes in fishes. We used PacBio reads and Hi-C to assemble a chromosome-level reference genome for M. macrocephalus. We generated family segregation information to construct a genetic map, pool sequencing of males and females to characterize its sex system, and RNA sequencing to highlight candidate genes of M. macrocephalus sex determination.</p><p><strong>Results: </strong>The reference genome of M. macrocephalus is 1,282,030,339 bp in length and has a contig and scaffold N50 of 5.0 Mb and 45.03 Mb, respectively. In the sex chromosome, based on patterns of recombination suppression, coverage, FST, and sex-specific SNPs, we distinguished a putative W-specific region that is highly differentiated, a region where Z and W still share some similarities and is undergoing degeneration, and the PAR. 
The sex chromosome gene repertoire includes genes from the TGF-β family (amhr2, bmp7) and the Wnt/β-catenin pathway (wnt4, wnt7a), some of which are differentially expressed.</p><p><strong>Conclusions: </strong>The chromosome-level genome of piauçu exhibits high quality, establishing a valuable resource for advancing research within the group. Our discoveries offer insights into the evolutionary dynamics of Z and W sex chromosomes in fish, emphasizing ongoing degenerative processes and indicating complex interactions between Z and W sequences in specific genomic regions. Notably, amhr2 and bmp7 are potential candidate genes for sex determination in M. macrocephalus.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11590113/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142715761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae092
Mohamed Salem, Rafet Al-Tobasei, Ali Ali, Liqi An, Ying Wang, Xuechen Bai, Ye Bi, Huaijun Zhou
{"title":"Functional annotation of regulatory elements in rainbow trout uncovers roles of the epigenome in genetic selection and genome evolution.","authors":"Mohamed Salem, Rafet Al-Tobasei, Ali Ali, Liqi An, Ying Wang, Xuechen Bai, Ye Bi, Huaijun Zhou","doi":"10.1093/gigascience/giae092","DOIUrl":"https://doi.org/10.1093/gigascience/giae092","url":null,"abstract":"<p><p>Rainbow trout (RBT) has gained widespread attention as a biological model across various fields and has been rapidly adopted for aquaculture and recreational purposes on 6 continents. Despite significant efforts to develop genome sequences for RBT, the functional genomic basis of RBT's environmental, phenotypic, and evolutionary variations still requires epigenome reference annotations. This study has produced a comprehensive catalog and epigenome annotation tracks of RBT, detecting gene regulatory elements, including chromatin histone modifications, chromatin accessibility, and DNA methylation. By integrating chromatin immunoprecipitation sequencing, ATAC sequencing, Methyl Mini-seq, and RNA sequencing data, this new regulatory element catalog has helped to characterize the epigenome dynamics and its correlation with gene expression. The study has also identified potential causal variants and transcription factors regulating complex domestication phenotypic traits. This research also provides valuable insights into the epigenome's role in gene evolution and the mechanism of duplicate gene retention 100 million years after RBT whole-genome duplication and during re-diploidization. 
The newly developed epigenome annotation maps are among the first in fish and are expected to enhance the accuracy and efficiency of genomic studies and applications, including genome-wide association studies, causative variation identification, and genomic selection in RBT and fish comparative genomics.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142828078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae087
Aishwarya Venkataramanan, Michael Kloster, Andrea Burfeid-Castellanos, Mimoza Dani, Ntambwe A S Mayombo, Danijela Vidakovic, Daniel Langenkämper, Mingkun Tan, Cedric Pradalier, Tim Nattkemper, Martin Laviale, Bánk Beszteri
{"title":"\"UDE DIATOMS in the Wild 2024\": a new image dataset of freshwater diatoms for training deep learning models.","authors":"Aishwarya Venkataramanan, Michael Kloster, Andrea Burfeid-Castellanos, Mimoza Dani, Ntambwe A S Mayombo, Danijela Vidakovic, Daniel Langenkämper, Mingkun Tan, Cedric Pradalier, Tim Nattkemper, Martin Laviale, Bánk Beszteri","doi":"10.1093/gigascience/giae087","DOIUrl":"10.1093/gigascience/giae087","url":null,"abstract":"<p><strong>Background: </strong>Diatoms are microalgae with finely ornamented microscopic silica shells. Their taxonomic identification by light microscopy is routinely used as part of community ecological research as well as ecological status assessment of aquatic ecosystems, and a need for digitalization of these methods has long been recognized. Alongside their high taxonomic and morphological diversity, several other factors make diatoms highly challenging for deep learning-based identification using light microscopy images. These include (i) an unusually high intraclass variability combined with small between-class differences, (ii) a rather different visual appearance of specimens depending on their orientation on the microscope slide, and (iii) the limited availability of diatom experts for accurate taxonomic annotation.</p><p><strong>Findings: </strong>We present the largest diatom image dataset thus far, aimed at facilitating the application and benchmarking of innovative deep learning methods to the diatom identification problem on realistic research data, \"UDE DIATOMS in the Wild 2024.\" The dataset contains 83,570 images of 611 diatom taxa, 101 of which are represented by at least 100 examples and 144 by at least 50 examples each. 
We showcase this dataset in 2 innovative analyses that address individual aspects of the above challenges using subclustering to deal with visually heterogeneous classes, out-of-distribution sample detection, and semi-supervised learning.</p><p><strong>Conclusions: </strong>The problem of image-based identification of diatoms is both important for environmental research and challenging from the machine learning perspective. By making available the so far largest image dataset, accompanied by innovative analyses, this contribution will facilitate addressing these points by the scientific community.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11604061/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142750299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae099
Rina Su, Hao Zhou, Wenhao Yang, Sorgog Moqir, Xiji Ritu, Lei Liu, Ying Shi, Ai Dong, Menghe Bayier, Yibu Letu, Xin Manxi, Hasi Chulu, Narenhua Nasenochir, He Meng, Muren Herrid
{"title":"Near telomere-to-telomere genome assembly of Mongolian cattle: implications for population genetic variation and beef quality.","authors":"Rina Su, Hao Zhou, Wenhao Yang, Sorgog Moqir, Xiji Ritu, Lei Liu, Ying Shi, Ai Dong, Menghe Bayier, Yibu Letu, Xin Manxi, Hasi Chulu, Narenhua Nasenochir, He Meng, Muren Herrid","doi":"10.1093/gigascience/giae099","DOIUrl":"https://doi.org/10.1093/gigascience/giae099","url":null,"abstract":"<p><strong>Background: </strong>Mongolian cattle, a unique breed indigenous to China, represent valuable genetic resources and serve as important sources of meat and milk. However, there is a lack of high-quality genomes in cattle, which limits biological research and breeding improvement.</p><p><strong>Findings: </strong>In this study, we conducted whole-genome sequencing on a Mongolian bull. This effort yielded a 3.1 Gb Mongolian cattle genome sequence, with a BUSCO integrity assessment of 95.9%. The assembly achieved both contig N50 and scaffold N50 values of 110.9 Mb, with only 3 gaps identified across the entire genome. Additionally, we successfully assembled the Y chromosome among the 31 chromosomes. Notably, 3 chromosomes were identified as having telomeres at both ends. The annotation data include 54.31% repetitive sequences and 29,794 coding genes. Furthermore, a population genetic variation analysis was conducted on 332 individuals from 56 breeds, through which we identified variant loci and potentially discovered genes associated with the formation of marbling patterns in beef, predominantly located on chromosome 12.</p><p><strong>Conclusions: </strong>This study produced a genome with high continuity, completeness, and accuracy, marking the first assembly and annotation of a near telomere-to-telomere genome in cattle. Based on this, we generated a variant database comprising 332 individuals. 
The assembly of the genome and the analysis of population variants provide significant insights into cattle evolution and enhance our understanding of breeding selection.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142853779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae004
Ye Xu, Ling Ma, Shanlin Liu, Yanxin Liang, Qiaoqiao Liu, Zhixin He, Li Tian, Yuange Duan, Wanzhi Cai, Hu Li, Fan Song
{"title":"Chromosome-level genome of the poultry shaft louse Menopon gallinae provides insight into the host-switching and adaptive evolution of parasitic lice.","authors":"Ye Xu, Ling Ma, Shanlin Liu, Yanxin Liang, Qiaoqiao Liu, Zhixin He, Li Tian, Yuange Duan, Wanzhi Cai, Hu Li, Fan Song","doi":"10.1093/gigascience/giae004","DOIUrl":"10.1093/gigascience/giae004","url":null,"abstract":"<p><strong>Background: </strong>Lice (Psocodea: Phthiraptera) are one important group of parasites that infects birds and mammals. It is believed that the ancestor of parasitic lice originated on the ancient avian host, and ancient mammals acquired these parasites via host-switching from birds. Here we present the first chromosome-level genome of Menopon gallinae in Amblycera (earliest diverging lineage of parasitic lice). We explore the transition of louse host-switching from birds to mammals at the genomic level by identifying numerous idiosyncratic genomic variations.</p><p><strong>Results: </strong>The assembled genome is 155 Mb in length, with a contig N50 of 27.42 Mb. Hi-C scaffolding assigned 97% of the bases to 5 chromosomes. The genome of M. gallinae retains a basal insect repertoire of 11,950 protein-coding genes. By comparing the genomes of lice to those of multiple representative insects in other orders, we discovered that gene families of digestion, detoxification, and immunity-related are generally conserved between bird lice and mammal lice, while mammal lice have undergone a significant reduction in genes related to chemosensory systems and temperature. This suggests that mammal lice have lost some of these genes through the adaption to environment and temperatures after host-switching. Furthermore, 7 genes related to hematophagy were positively selected in mammal lice, suggesting their involvement in the hematophagous behavior.</p><p><strong>Conclusions: </strong>Our high-quality genome of M. 
gallinae provides a valuable resource for comparative genomic research in Phthiraptera and facilitates further studies on adaptive evolution of host-switching within parasitic lice.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10904027/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139899653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae007
Filipi Miranda Soares, Luís Ferreira Pires, Maria Carolina Garcia, Yamine Bouzembrak, Lidio Coradin, Natalia Pirani Ghilardi-Lopes, Rubens Rangel Silva, Aline Martins de Carvalho, Benildes Coura Moreira Dos Santos Maculan, Sheina Koffler, Uiara Bandineli Montedo, Debora Pignatari Drucker, Raquel Santiago, Anand Gavai, Maria Clara Peres de Carvalho, Ana Carolina da Silva Lima, Hillary Dandara Elias Gabriel, Stephanie Gabriele Mendonça de França, Karoline Reis de Almeida, Bárbara Junqueira Dos Santos, Antonio Mauro Saraiva
{"title":"Leveraging citizen science for monitoring urban forageable plants.","authors":"Filipi Miranda Soares, Luís Ferreira Pires, Maria Carolina Garcia, Yamine Bouzembrak, Lidio Coradin, Natalia Pirani Ghilardi-Lopes, Rubens Rangel Silva, Aline Martins de Carvalho, Benildes Coura Moreira Dos Santos Maculan, Sheina Koffler, Uiara Bandineli Montedo, Debora Pignatari Drucker, Raquel Santiago, Anand Gavai, Maria Clara Peres de Carvalho, Ana Carolina da Silva Lima, Hillary Dandara Elias Gabriel, Stephanie Gabriele Mendonça de França, Karoline Reis de Almeida, Bárbara Junqueira Dos Santos, Antonio Mauro Saraiva","doi":"10.1093/gigascience/giae007","DOIUrl":"10.1093/gigascience/giae007","url":null,"abstract":"<p><p>Urbanization brings forth social challenges in emerging countries such as Brazil, encompassing food scarcity, health deterioration, air pollution, and biodiversity loss. Despite this, urban areas like the city of São Paulo still boast ample green spaces, offering opportunities for nature appreciation and conservation, enhancing city resilience and livability. Citizen science is a collaborative endeavor between professional scientists and nonprofessional scientists in scientific research that may help to understand the dynamics of urban ecosystems. 
We believe citizen science has the potential to promote human and nature connection in urban areas and provide useful data on urban biodiversity.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 ","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10914215/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140039095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae013
Anish M S Shrestha, Mark Edward M Gonzales, Phoebe Clare L Ong, Pierre Larmande, Hyun-Sook Lee, Ji-Ung Jeung, Ajay Kohli, Dmytro Chebotarov, Ramil P Mauleon, Jae-Sung Lee, Kenneth L McNally
{"title":"RicePilaf: a post-GWAS/QTL dashboard to integrate pangenomic, coexpression, regulatory, epigenomic, ontology, pathway, and text-mining information to provide functional insights into rice QTLs and GWAS loci.","authors":"Anish M S Shrestha, Mark Edward M Gonzales, Phoebe Clare L Ong, Pierre Larmande, Hyun-Sook Lee, Ji-Ung Jeung, Ajay Kohli, Dmytro Chebotarov, Ramil P Mauleon, Jae-Sung Lee, Kenneth L McNally","doi":"10.1093/gigascience/giae013","DOIUrl":"10.1093/gigascience/giae013","url":null,"abstract":"<p><strong>Background: </strong>As the number of genome-wide association study (GWAS) and quantitative trait locus (QTL) mappings in rice continues to grow, so does the already long list of genomic loci associated with important agronomic traits. Typically, loci implicated by GWAS/QTL analysis contain tens to hundreds to thousands of single-nucleotide polymorphisms (SNPs)/genes, not all of which are causal and many of which are in noncoding regions. Unraveling the biological mechanisms that tie the GWAS regions and QTLs to the trait of interest is challenging, especially since it requires collating functional genomics information about the loci from multiple, disparate data sources.</p><p><strong>Results: </strong>We present RicePilaf, a web app for post-GWAS/QTL analysis, that performs a slew of novel bioinformatics analyses to cross-reference GWAS results and QTL mappings with a host of publicly available rice databases. In particular, it integrates (i) pangenomic information from high-quality genome builds of multiple rice varieties, (ii) coexpression information from genome-scale coexpression networks, (iii) ontology and pathway information, (iv) regulatory information from rice transcription factor databases, (v) epigenomic information from multiple high-throughput epigenetic experiments, and (vi) text-mining information extracted from scientific abstracts linking genes and traits. 
We demonstrate the utility of RicePilaf by applying it to analyze GWAS peaks of preharvest sprouting and genes underlying yield-under-drought QTLs.</p><p><strong>Conclusions: </strong>RicePilaf enables rice scientists and breeders to shed functional light on their GWAS regions and QTLs, and it provides them with a means to prioritize SNPs/genes for further experiments. The source code, a Docker image, and a demo version of RicePilaf are publicly available at https://github.com/bioinfodlsu/rice-pilaf.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11148593/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141237423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}