Project description: The source of most errors in RNA sequencing (RNA-seq) read alignment is in the repetitive structure of the genome and not with the alignment algorithm. Genetic variation away from the reference sequence exacerbates this problem, causing reads to be assigned to the wrong location. We developed a method, implemented as the software package Seqnature, to construct the imputed genomes of individuals (individualized genomes) of experimental model organisms including inbred mouse strains and genetically unique outbred animals. Alignment to individualized genomes increases read mapping accuracy and improves transcript abundance estimates. In an application to expression QTL mapping, this approach corrected erroneous linkages and unmasked thousands of hidden associations. Individualized genomes accounting for genetic variation will be useful for human short-read sequencing and other sequencing applications including ChIP-seq.
Project description: The source of most errors in RNA sequencing (RNA-seq) read alignment is in the repetitive structure of the genome and not with the alignment algorithm. Genetic variation away from the reference sequence exacerbates this problem, causing reads to be assigned to the wrong location. We developed a method, implemented as the software package Seqnature, to construct the imputed genomes of individuals (individualized genomes) of experimental model organisms including inbred mouse strains and genetically unique outbred animals. Alignment to individualized genomes increases read mapping accuracy and improves transcript abundance estimates. In an application to expression QTL mapping, this approach corrected erroneous linkages and unmasked thousands of hidden associations. Individualized genomes accounting for genetic variation will be useful for human short-read sequencing and other sequencing applications including ChIP-seq. The dataset comprises Illumina 100 bp single-end liver RNA-seq from 277 male and female 26-week-old Diversity Outbred (DO) mice raised on standard chow or a high-fat diet. In addition, it includes Illumina 100 bp single-end liver RNA-seq from 128 male 26-week-old mice (20 weeks for the NZO strain) drawn from the DO founder strains and raised on standard chow or a high-fat diet (8 males per strain-by-diet group). Each sample was sequenced as 2-4 technical replicates across multiple flowcells. Samples were randomly assigned to lanes and multiplexed at 12-24x.
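The idea of an individualized genome described above can be illustrated with a minimal sketch. It assumes the simplest case, in which an individual's known variants are single-base substitutions applied to a reference sequence. The function name `apply_snps`, the toy sequence, and the variant table are hypothetical and are not part of Seqnature, whose actual interface and variant handling are not described here.

```python
def apply_snps(reference: str, snps: dict[int, tuple[str, str]]) -> str:
    """Return a copy of `reference` with each SNP's alternate base substituted.

    `snps` maps a 0-based position to (expected_reference_base, alternate_base).
    This is a hypothetical helper for illustration only.
    """
    bases = list(reference)
    for pos, (ref_base, alt_base) in snps.items():
        # Guard against variant calls that do not match the reference used.
        if bases[pos] != ref_base:
            raise ValueError(
                f"reference mismatch at {pos}: found {bases[pos]}, expected {ref_base}"
            )
        bases[pos] = alt_base
    return "".join(bases)


# Toy reference and variant set (made up for the example).
reference = "ACGTACGTAC"
strain_snps = {2: ("G", "T"), 7: ("T", "C")}

individualized = apply_snps(reference, strain_snps)
print(individualized)
```

Because only substitutions are applied, coordinates stay identical between the reference and the individualized sequence. Insertions and deletions would shift coordinates and require a mapping back to reference positions, which this sketch does not attempt.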
Project description: We collected (Illumina) RNA-seq data (polyadenylated RNA fraction) for a number of tissue samples from common marmoset and elephant. We developed a subtraction approach based on male/female RNA-seq data, Illumina genomic data and available genomes to identify and assemble Y transcripts. For marmoset samples, we added Y coding genes and noncoding sequences to the reference genomes in order to assess their expression levels. We then mapped all RNA-seq reads with TopHat 1.4.0 and used Cufflinks 2.0.0 (all mapped reads, embedded multi-read and fragment bias correction) to calculate the FPKM (Fragments Per Kilobase of transcript per Million mapped reads) values for all genes in the genomes with our refined annotations.
Project description: Genome graphs, including the recently released draft human pangenome graph, can represent the breadth of genetic diversity and thus transcend the limits of traditional linear reference genomes. However, there are no genome-graph-compatible tools for analyzing whole genome bisulfite sequencing (WGBS) data. To close this gap, we introduce methylGrapher, a tool tailored for accurate DNA methylation analysis by mapping WGBS data to a genome graph. Notably, methylGrapher can reconstruct methylation patterns along haplotype paths precisely and efficiently. To demonstrate the utility of methylGrapher, we analyzed the WGBS data derived from five individuals whose genomes were included in the first Human Pangenome draft as well as WGBS data from ENCODE (EN-TEx). Along with standard performance benchmarking, we show that methylGrapher fully recapitulates DNA methylation patterns defined by classic linear genome analysis approaches. Importantly, methylGrapher captures a substantial number of CpG sites that are missed by linear methods, and improves overall genome coverage while reducing alignment reference bias. Thus, methylGrapher is a first step towards unlocking the full potential of Human Pangenome graphs in genomic DNA methylation analysis.
Project description: We collected (Illumina) RNA-seq data (polyadenylated RNA fraction) for a number of tissue samples from common marmoset and elephant. We developed a subtraction approach based on male/female RNA-seq data, Illumina genomic data and available genomes to identify and assemble Y transcripts. For marmoset samples, we added Y coding genes and noncoding sequences to the reference genomes in order to assess their expression levels. We then mapped all RNA-seq reads with TopHat 1.4.0 and used Cufflinks 2.0.0 (all mapped reads, embedded multi-read and fragment bias correction) to calculate the FPKM (Fragments Per Kilobase of transcript per Million mapped reads) values for all genes in the genomes with our refined annotations. Sequence and expression levels of reconstructed Y-linked genes.
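The FPKM unit named in the marmoset and elephant description can be made concrete with a short worked example. This is a sketch of the basic definition only: Cufflinks, when run with multi-read and fragment bias correction, estimates abundances with a more elaborate statistical model, so its reported values will generally differ from this simple ratio. The function name `fpkm` and the input numbers are illustrative assumptions, not values from the dataset.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Basic FPKM: fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)
    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Illustrative numbers: 500 fragments on a 2,000 bp transcript
# in a library with 25 million mapped fragments.
example_value = fpkm(500, 2_000, 25_000_000)
print(example_value)
```

Dividing by transcript length makes values comparable between long and short transcripts, and dividing by library size makes them comparable between samples sequenced to different depths.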