Project description: A critical task in high-throughput sequencing is aligning millions of short reads to a reference genome. Alignment is especially complicated for RNA sequencing (RNA-Seq) because of RNA splicing. A number of RNA-Seq alignment algorithms are available, and they claim to align reads with high accuracy and efficiency while detecting splice junctions. RNA-Seq data is discrete in nature; therefore, with reasonable gene models and comparative metrics, RNA-Seq data can be simulated to sufficient accuracy to enable meaningful benchmarking of alignment algorithms. A rigorous comparison of all viable published RNA-Seq algorithms has not previously been performed. RESULTS: We developed an RNA-Seq simulator that models the main impediments to RNA alignment, including alternative splicing, insertions, deletions, substitutions, sequencing errors, and intron signal. We used this simulator to measure the accuracy and robustness of available algorithms at the base and junction levels. Additionally, we used RT-PCR and Sanger sequencing to validate the ability of the algorithms to detect novel transcript features such as novel exons and alternative splicing in RNA-Seq data from mouse retina. A pipeline based on BLAT was developed to explore the performance of established tools for this problem and to compare it to the recently developed methods. This pipeline, the RNA-Seq Unified Mapper (RUM), performs comparably to the best current aligners and provides an advantageous combination of accuracy, speed and usability. RNA-Seq of mouse retinal RNA, as described.
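The base-level and junction-level measurements mentioned in the description above can be illustrated with a minimal sketch. This is not the scoring code used in the study: the data layout (each read mapped to the set of genome positions its bases cover), the type aliases, and the function names `base_level_accuracy` and `junction_precision_recall` are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical layout: read ID -> set of (chromosome, coordinate) pairs
# covered by the read's aligned bases. Unaligned reads are simply absent.
Alignment = Dict[str, Set[Tuple[str, int]]]
# Hypothetical junction key: (chromosome, intron start, intron end).
Junction = Tuple[str, int, int]


def base_level_accuracy(truth: Alignment, predicted: Alignment) -> float:
    """Fraction of simulated bases placed at their true genome position."""
    total = sum(len(positions) for positions in truth.values())
    if total == 0:
        return 0.0
    correct = sum(
        len(positions & predicted.get(read_id, set()))
        for read_id, positions in truth.items()
    )
    return correct / total


def junction_precision_recall(
    truth: Set[Junction], predicted: Set[Junction]
) -> Tuple[float, float]:
    """Precision and recall of called splice junctions against the simulated set."""
    true_positives = len(truth & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data: read r1 has one misplaced base, read r2 is left unaligned.
truth_alignments = {
    "r1": {("chr1", 100), ("chr1", 101), ("chr1", 102), ("chr1", 103)},
    "r2": {("chr2", 50), ("chr2", 51)},
}
predicted_alignments = {
    "r1": {("chr1", 100), ("chr1", 101), ("chr1", 102), ("chr1", 200)},
}
truth_junctions = {("chr1", 103, 199), ("chr2", 51, 300)}
predicted_junctions = {("chr1", 103, 199), ("chr3", 10, 20)}

print(base_level_accuracy(truth_alignments, predicted_alignments))
print(junction_precision_recall(truth_junctions, predicted_junctions))
```

Scoring bases and junctions separately is useful because an aligner can place most bases correctly while still missing or miscalling splice junctions, which simulated ground truth makes directly measurable.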
Project description: To assess the performance of read mapping software for RNA-seq, 26 spliced alignment protocols based on 11 programs and pipelines (BAGET, GEM, GSNAP, GSTRUCT, MapSplice, PALMapper, PASS, ReadsMap, SMALT, STAR and TopHat) were applied to four human and mouse transcriptome data sets.
Project description: The source of most errors in RNA sequencing (RNA-seq) read alignment is the repetitive structure of the genome, not the alignment algorithm. Genetic variation away from the reference sequence exacerbates this problem, causing reads to be assigned to the wrong location. We developed a method, implemented as the software package Seqnature, to construct the imputed genomes of individuals (individualized genomes) of experimental model organisms, including inbred mouse strains and genetically unique outbred animals. Alignment to individualized genomes increases read mapping accuracy and improves transcript abundance estimates. In an application to expression QTL mapping, this approach corrected erroneous linkages and unmasked thousands of hidden associations. Individualized genomes accounting for genetic variation will be useful for human short-read sequencing and other sequencing applications, including ChIP-seq. Illumina 100 bp single-end liver RNA-seq from 277 male and female Diversity Outbred (DO) 26-week-old mice raised on standard chow or high fat diet. In addition, Illumina 100 bp single-end liver RNA-seq from 128 male 26-week-old mice (20 weeks for the NZO strain) across the DO founder strains, raised on standard chow or high fat diet (8 males per strain by diet group). Each sample was sequenced in 2-4x technical replicates across multiple flowcells. Samples were randomly assigned lanes and multiplexed at 12-24x.
Project description: Intestinal protists are emerging as key modulators of host immunity and microbial ecology, yet their roles remain poorly defined. Here, we investigated two distinct protists, the amoeba Entamoeba muris and the parabasalid Tritrichomonas, to determine how they shape gut immunity in vivo, individually and together. Unlike Tritrichomonas, a well-characterized inducer of type 2 immunity that activates the tuft cell–IL-25–ILC2 circuit in the small intestine, E. muris failed to elicit robust immune responses in the intestine or colon. However, introduction of E. muris into mice naturally colonized by Tritrichomonas spp., or co-infection with E. muris and Tritrichomonas spp., suppressed the Tritrichomonas-induced type 2 response in the small intestine. Fecal and cecal qPCR suggest that E. muris may outcompete Tritrichomonas spp., with reduced protist loads in the cecum and possibly diminished succinate-driven tuft cell activation. We also identified sex-specific differences in the intestinal response to primary Tritrichomonas spp. colonization, which have not previously been described. These findings reveal that E. muris can dampen existing type 2 immune circuits without triggering overt inflammation, underscoring its role as an immunomodulatory agent. This work provides a framework for understanding how commensal protists interact within the gut ecosystem and shape mucosal immunity in the absence of pathogenicity.