Optimal Dual RNA-Seq Mapping for Accurate Pathogen Detection in Complex Eukaryotic Hosts
Authors: Infanta Saleth Teresa Eden M, Umashankar Vetrivel
Journal: Bio-protocol, 15(3), e5182
Published: 2025-02-05
DOI: https://doi.org/10.21769/BioProtoc.5182
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11825298/pdf/
Citations: 0
Abstract
Dual RNA-Seq has significantly advanced the study of biological interactions between two organisms by enabling parallel transcriptomic analysis of both. Existing analysis methods process dual RNA-Seq data with various combinations of open-source bioinformatics tools. After reviewing these methods, we examine the key criteria for selecting standard tools and methods, focusing on the critical steps of read trimming and mapping to the reference genome. To validate the different tool combinations, we benchmarked top-ranking tools on a publicly available dual RNA-Seq dataset from the Sequence Read Archive (SRA). An important observation from evaluating the mapping approaches is that when adapter-trimmed reads are mapped to the pathogen genome first, more reads align to the pathogen genome than when only the reads left unmapped by the traditional host-first approach are aligned to it. Mapping to the pathogen first prevents the misalignment of pathogen reads to the host genome that arises from their shorter length. In this way, pathogen reads, which make up only a small proportion of a complex eukaryotic dataset, are recovered precisely. This protocol presents a comprehensive comparison of the possible approaches, resulting in a robust, unified standard methodology.

Key features
• Benchmarking of top-ranking software for quality control, adapter trimming, and read mapping.
• Emphasizes the importance of read mapping criteria for dual RNA-Seq datasets: (i) a high count of reads uniquely mapped to the host, (ii) a low count of host multi-mapped reads, and (iii) a high count of unmapped reads belonging to the pathogen.
• Elaborates the best mapping approach for precisely extracting pathogen reads, which are captured in comparatively low numbers in dual RNA-Seq datasets.
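The effect of mapping order described in the abstract can be illustrated with a toy sketch. This is not the protocol's pipeline: a mismatch-tolerant substring search stands in for a real aligner, and the genomes, read names, and the one-mismatch threshold are all invented for illustration. The names `maps_to` and `sequential_map` are hypothetical helpers, not functions from any tool.

```python
from collections import Counter


def maps_to(read, genome, max_mismatches=1):
    """Return True if `read` matches some window of `genome` with at most
    `max_mismatches` substitutions (toy stand-in for a real aligner)."""
    n = len(read)
    for start in range(len(genome) - n + 1):
        window = genome[start:start + n]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            return True
    return False


def sequential_map(reads, references):
    """Assign each read to the first reference (in order) it maps to.

    `references` is an ordered list of (label, genome) pairs; reads that
    map to none of them are labelled "unmapped".
    """
    assignment = {}
    for name, seq in reads.items():
        assignment[name] = "unmapped"
        for label, genome in references:
            if maps_to(seq, genome):
                assignment[name] = label
                break
    return assignment


# Invented toy sequences (assumption): the host contains a region that
# differs from the pathogen by a single base.
HOST = "ACGTACGTTTGACCAGTAGGCTAACGGATCCA"
PATHOGEN = "TTGACCAGTCGGCTAAGGCCTTAA"

READS = {
    "host_read": "ACGTACGTTTGA",        # exact match in the host only
    "pathogen_read_1": "GACCAGTCGGCT",  # exact in pathogen, 1 mismatch in host
    "pathogen_read_2": "CTAAGGCCTTAA",  # exact match in the pathogen only
    "unrelated_read": "GGGGGGGGGGGG",   # matches neither genome
}

host_first = sequential_map(READS, [("host", HOST), ("pathogen", PATHOGEN)])
pathogen_first = sequential_map(READS, [("pathogen", PATHOGEN), ("host", HOST)])

print("host-first:    ", dict(Counter(host_first.values())))
print("pathogen-first:", dict(Counter(pathogen_first.values())))
```

In this toy setup, `pathogen_read_1` is absorbed by the host when the host is mapped first, because the tolerant match accepts the near-identical host region. Mapping to the pathogen first assigns it to the pathogen, so the pathogen read count rises from one to two while the unrelated read stays unmapped in both orders.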