Project description: RNA sequencing (RNA-seq) is a widely used high-throughput method for characterizing transcriptomic dynamics across space and time. However, typical RNA-seq data analysis pipelines depend on either a sequenced genome or reference transcripts. This constraint makes RNA-seq difficult to apply to species that lack both a sequenced genome and reference transcripts. To solve this problem, we developed CRSP, an RNA-seq pipeline that integrates a multiple comparative species strategy and does not depend on a specific sequenced genome or reference transcripts. Benchmarking suggests that CRSP can quantify gene expression levels with high accuracy.
Project description: Here we present the first whole-genome assemblies of Arabidopsis thaliana strains since the release of the 125 Mb reference genome sequence a decade ago. We demonstrate their practical relevance for studying expression differences of polymorphic genes and show how analysis of sRNA sequencing data can lead to erroneous conclusions if the reads are aligned against the reference genome alone.
Project description: This dataset addresses two phenomena affected by reference strain bias in model organism research, specifically in the nematode C. elegans. I) C. elegans is the leading system for research into RNA interference (RNAi), yet this research has been conducted exclusively in the reference strain, even though sensitivity to RNAi is remarkably diverse across wild-type strains. Here, we used RNA sequencing to evaluate the transcriptional response of the reference strain and four other strains to RNAi by profiling each strain in three conditions: 1) exogenous RNAi targeting the germline-expressed gene par-1, 2) exogenous RNAi targeting the germline-expressed gene pos-1, and 3) a control condition. II) Gene expression quantification in non-reference strains relies on successful alignment of sequencing reads to the reference genome, but high sequence divergence can lead to mapping failure. Here, we used this RNA-seq dataset to characterize the extent to which poor genome assembly limits inferences drawn from expression quantification.
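Why high sequence divergence leads to mapping failure can be illustrated with a toy ungapped aligner: each read is placed at its best-matching position in the reference, but the hit is rejected when the mismatch count exceeds a cutoff. This is only a sketch; real aligners also handle gaps, splicing and base qualities, and the function names and the `max_mismatches` cutoff of 2 are hypothetical.

```python
from typing import Optional, Tuple


def count_mismatches(read: str, ref_window: str) -> int:
    """Number of positions where the read and an equal-length reference window differ."""
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have equal length")
    return sum(1 for r, g in zip(read, ref_window) if r != g)


def best_hit(read: str, reference: str, max_mismatches: int = 2) -> Optional[Tuple[int, int]]:
    """Return (position, mismatches) of the best ungapped placement, or None if unmapped.

    max_mismatches is a hypothetical cutoff: a read whose best placement exceeds it
    is reported as unmapped, which is how reads from divergent strains drop out.
    """
    best: Optional[Tuple[int, int]] = None
    for pos in range(len(reference) - len(read) + 1):
        mm = count_mismatches(read, reference[pos:pos + len(read)])
        if best is None or mm < best[1]:
            best = (pos, mm)
    if best is None or best[1] > max_mismatches:
        return None
    return best


reference = "AAAACCCCGGGGTTTT"
print(best_hit("CCCCGGGG", reference))  # → (4, 0): read identical to the reference
print(best_hit("CTCCGAGT", reference))  # → None: same locus with 3 substitutions
```

The second read comes from the same locus but carries three substitutions, so it is lost even though its true origin is known; its expression signal would simply be missing from the counts.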
Project description: <p>The study of antimicrobial resistance (AMR) in infectious diarrhea has generally been limited to cultivation, antimicrobial susceptibility testing and targeted PCR assays. When individual strains of significance are identified, whole genome shotgun (WGS) sequencing of important clones and clades is performed. Genes that encode resistance to antibiotics have been detected in environmental, insect, human and animal metagenomes; collectively, such genes are known as the "resistome". While metagenomic datasets have been mined to characterize the healthy human gut resistome in the Human Microbiome Project and MetaHIT and in a Yanomami Amerindian cohort, directed metagenomic sequencing has not been used to examine the epidemiology of AMR. Especially in developing countries where sanitation is poor, diarrhea and enteric pathogens likely serve to disseminate antibiotic resistance elements of clinical significance. Unregulated use of antibiotics further exacerbates the problem by selecting for the acquisition of resistance. This is exemplified by recent reports of multiple antibiotic resistance in Shigella strains in India, in Escherichia coli in India and Pakistan, and in nontyphoidal Salmonella (NTS) in South-East Asia. We propose to use deep metagenomic sequencing and genome-level assembly to study the epidemiology of AMR in stools of children suffering from diarrhea. Here the epidemiology component will be surveillance and analysis of the microbial composition (to the bacterial species/strain level where possible) and its constituent antimicrobial resistance genetic elements (such as plasmids, integrons, transposons and other mobile genetic elements, or MGEs) in samples from a cohort where diarrhea is prevalent and antibiotic exposure is endemic.
The goal will be to assess whether consortia of specific mobile antimicrobial resistance elements associate with particular species/strains and whether their presence is enhanced or amplified in diarrheal microbiomes and in the presence of antibiotic exposure. This work could potentially identify clonal complexes of organisms and MGEs with enhanced resistance and the potential to transfer this resistance to other enteric pathogens.</p> <p>We have performed WGS, metagenomic assembly and gene/protein mapping to examine and characterize the types of AMR genes and transfer elements (transposons, integrons, bacteriophage, plasmids) and their distribution in bacterial species and strains assembled from DNA isolated from diarrheal and non-diarrheal stools. The samples were acquired from a cohort of pediatric patients and controls from Colombia, South America, where antibiotic use is prevalent. As a control, the distribution and abundance of AMR genes can be compared with published studies in which resistome gene lists were compiled from sequences of healthy cohorts. Our approach is more epidemiologic in nature: we plan to identify and catalogue antimicrobial resistance elements on MGEs capable of spreading through a local population and, where possible, to link mobile antimicrobial resistance elements with specific strains within the population.</p>
Project description: Advances in sequencing and assembly technology have led to the creation of genome assemblies for a wide variety of non-model organisms. The rapid production and proliferation of updated and novel assembly versions can create vexing problems when multiple genome assembly versions are available at once, requiring researchers to work with more than one reference genome. Multiple genome assemblies are especially problematic for researchers studying the genetic makeup of individual cells, because single cell RNA sequencing (scRNAseq) requires sequenced reads to be mapped and aligned to a single reference genome. Using Astyanax mexicanus, this study highlights how the interpretation of a single cell dataset from the same sample changes when it is aligned to each of the two available genome assemblies. We found that the number of cells and expressed genes detected differed drastically between the two assemblies. When each genome assembly was used in isolation with its respective annotation, cell type identification was confounded: some classic cell type markers were assembly-specific, whilst other genes showed different patterns of expression between the two assemblies. To overcome the problems posed by multiple genome assemblies, we propose that researchers align to each available assembly and then integrate the resulting datasets into a final dataset in which all genome alignments can be used simultaneously. We found that this approach increased the accuracy of cell type identification and maximised the amount of data that could be extracted from our single cell sample by capturing all possible cells and transcripts. As scRNAseq becomes more widely available, it is imperative that the single cell community is aware of how the choice of genome assembly can alter single cell data and its interpretation, especially when reviewing studies on non-model organisms.
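The align-to-each-then-integrate strategy proposed above can be sketched as an outer join on cell barcodes, assuming each alignment has already been reduced to a per-cell count table. The function name `integrate_alignments`, the nested-dictionary layout, and the choice to prefix gene IDs with a hypothetical assembly label (so assembly-specific genes remain separate features instead of being matched across assemblies) are all assumptions for illustration; the study does not state how its datasets were integrated.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical layout: assembly label -> cell barcode -> gene ID -> UMI count.
Counts = Dict[str, Dict[str, Dict[str, int]]]


def integrate_alignments(per_assembly: Counts) -> Dict[str, Dict[str, int]]:
    """Outer-join per-assembly count tables on cell barcode.

    Every barcode seen in any alignment is kept, and gene IDs are prefixed with
    their assembly label, so genes detected in only one assembly are retained.
    """
    merged: Dict[str, Dict[str, int]] = defaultdict(dict)
    for assembly, cells in per_assembly.items():
        for barcode, counts in cells.items():
            for gene, n in counts.items():
                merged[barcode][f"{assembly}:{gene}"] = n
    return dict(merged)


example = {
    "asm1": {"AAAC": {"geneA": 3}},
    "asm2": {"AAAC": {"geneB": 1}, "TTTG": {"geneA": 2}},
}
print(integrate_alignments(example))
# → {'AAAC': {'asm1:geneA': 3, 'asm2:geneB': 1}, 'TTTG': {'asm2:geneA': 2}}
```

Cell `TTTG`, detected only in the second alignment, is still retained, which mirrors the goal of capturing all possible cells and transcripts. Whether to keep features separate per assembly or collapse matched genes is a design choice this sketch leaves deliberately simple.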
Project description: The source of most errors in RNA sequencing (RNA-seq) read alignment lies in the repetitive structure of the genome, not in the alignment algorithm. Genetic variation away from the reference sequence exacerbates this problem, causing reads to be assigned to the wrong location. We developed a method, implemented as the software package Seqnature, to construct the imputed genomes of individuals (individualized genomes) of experimental model organisms, including inbred mouse strains and genetically unique outbred animals. Alignment to individualized genomes increases read mapping accuracy and improves transcript abundance estimates. In an application to expression QTL mapping, this approach corrected erroneous linkages and unmasked thousands of hidden associations. Individualized genomes that account for genetic variation will be useful for human short-read sequencing and other sequencing applications, including ChIP-seq. Illumina 100 bp single-end liver RNA-seq from 277 male and female 26-week-old Diversity Outbred (DO) mice raised on standard chow or a high-fat diet. In addition, Illumina 100 bp single-end liver RNA-seq from 128 male 26-week-old mice (20 weeks for the NZO strain) from each of the DO founder strains raised on standard chow or a high-fat diet (8 males per strain-by-diet group). Each sample was sequenced in 2-4 technical replicates across multiple flowcells. Samples were randomly assigned to lanes and multiplexed 12-24x.
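The general idea of an individualized genome can be sketched as substituting an individual's variants into the reference sequence while recording how coordinates shift, so that reference annotations can be lifted onto the new sequence. This is a minimal illustration, not Seqnature's implementation: the variant tuple format, the function names `apply_variants` and `lift`, and the restriction to non-overlapping variants are assumptions made here.

```python
from typing import List, Tuple

# Hypothetical variant format: (0-based reference position, reference allele, alternate allele).
Variant = Tuple[int, str, str]


def apply_variants(ref: str, variants: List[Variant]) -> Tuple[str, List[Tuple[int, int]]]:
    """Substitute non-overlapping SNPs/indels into ref.

    Returns the individualized sequence and a list of (ref_position, cumulative_shift)
    breakpoints: from ref_position onward, coordinates are shifted by cumulative_shift.
    """
    pieces: List[str] = []
    breakpoints: List[Tuple[int, int]] = []
    cursor = 0  # next reference base not yet copied
    shift = 0   # net length change from variants applied so far
    for pos, ref_allele, alt_allele in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants are not supported in this sketch")
        if ref[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"reference allele mismatch at position {pos}")
        pieces.append(ref[cursor:pos])
        pieces.append(alt_allele)
        cursor = pos + len(ref_allele)
        shift += len(alt_allele) - len(ref_allele)
        breakpoints.append((cursor, shift))
    pieces.append(ref[cursor:])
    return "".join(pieces), breakpoints


def lift(ref_pos: int, breakpoints: List[Tuple[int, int]]) -> int:
    """Map a reference coordinate lying outside any variant onto the individualized genome."""
    shift = 0
    for start, cumulative in breakpoints:
        if ref_pos < start:
            break
        shift = cumulative
    return ref_pos + shift


seq, bps = apply_variants("ACGTACGTAC", [(1, "C", "T"), (4, "A", "AGG"), (7, "TA", "T")])
print(seq)           # → ATGTAGGCGTC
print(lift(9, bps))  # → 10
```

The example applies a SNP, a 2 bp insertion and a 1 bp deletion, so the individualized sequence is one base longer than the reference and the last reference base moves from position 9 to position 10.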
Project description: The source of most errors in RNA sequencing (RNA-seq) read alignment lies in the repetitive structure of the genome, not in the alignment algorithm. Genetic variation away from the reference sequence exacerbates this problem, causing reads to be assigned to the wrong location. We developed a method, implemented as the software package Seqnature, to construct the imputed genomes of individuals (individualized genomes) of experimental model organisms, including inbred mouse strains and genetically unique outbred animals. Alignment to individualized genomes increases read mapping accuracy and improves transcript abundance estimates. In an application to expression QTL mapping, this approach corrected erroneous linkages and unmasked thousands of hidden associations. Individualized genomes that account for genetic variation will be useful for human short-read sequencing and other sequencing applications, including ChIP-seq.
Project description: PacBio whole-genome sequencing of laboratory mouse strains. See http://www.sanger.ac.uk/resources/mouse/genomes/ for more details. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/
2014-10-15 | E-ERAD-328 | biostudies-arrayexpress
Project description: Genome sequencing and assembly of Sinorhizobium species type strains