Project description:The goal of this study is to use next-generation sequencing (NGS) to detect bacterial mRNA profiles of E. coli K-12 LE392, P. putida KT2440 and Acinetobacter baylyi ADP1 in response to various antidepressant concentrations for 2 h, in triplicate, using the Illumina HiSeq 2500. The NGS QC Toolkit (version 2.3.3) was used to trim residual 3’-end adaptors and primers from the raw sequence reads and to remove ambiguous characters. Reads were then progressively trimmed at the 3’ end until a quality value ≥ 20 was reached, and those consisting of at least 85% bases were kept. Downstream analyses were performed using the resulting clean reads of no shorter than 75 bp. The clean reads of each sample were aligned to the E. coli reference genome (NC_000913), the Pseudomonas putida KT2440 genome (NCBI:txid160488) and the Acinetobacter baylyi ADP1 genome (NCBI:txid62977) using SeqAlto (version 0.5). Cufflinks (version 2.2.1) was used to calculate the strand-specific coverage for each gene and to analyze differential expression in triplicate bacterial cell cultures. Statistical analyses and visualization were conducted using the CummeRbund package in R (http://compbio.mit.edu/cummeRbund/). Gene expression was calculated as fragments per kilobase of a gene per million mapped reads (FPKM), a normalized value generated from the frequency of detection and the length of a given gene.
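
The 3’-end quality trimming and the 75 bp length filter described in this record can be illustrated with a minimal Python sketch. It is not the NGS QC Toolkit implementation: the function names `trim_3prime` and `long_enough` are hypothetical, and Phred+33 quality encoding is an assumption. Only the thresholds (quality 20, length 75) come from the description.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Drop bases from the 3' end until the last base has quality >= min_q.

    Assumes Phred+offset ASCII quality encoding (offset=33 is an assumption).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def long_enough(seq, min_len=75):
    """Keep only trimmed reads that are at least min_len bases long."""
    return len(seq) >= min_len


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33,
# so the last two bases are trimmed away.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIIII##")
print(trimmed_seq, trimmed_qual)
```

A read would pass to alignment only if `long_enough(trimmed_seq)` is true after trimming.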
Project description:Here, we performed deep transcriptome sequencing of the aerial tissues and the roots of S. japonica, generating over 2 billion raw reads with an average length of 101 nt using Illumina paired-end sequencing on the HiSeq 2000 platform. Using a combined approach of three popular assemblers, a de novo transcriptome assembly for S. japonica was obtained, yielding 81,729 unigenes with an average length of 884 bp and an N50 of 1,452 bp; 46,963 unigenes were annotated based on sequence similarity against the NCBI-nr protein database. Transcriptome profiling of the aerial-tissues and the roots of Swertia japonica
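
For readers unfamiliar with the N50 statistic reported for this assembly, the following Python sketch shows how it is commonly computed from a list of sequence lengths. The function name `n50` and the example lengths are illustrative and are not taken from the S. japonica data.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example: total is 32 bases, and the two longest
# sequences (8 + 8 = 16) already cover half of it.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```

Because N50 weights long sequences more heavily, it is usually larger than the mean length, as with the 1,452 bp N50 and 884 bp average reported here.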
Project description:Purpose: The goal of this study is to compare endothelial small RNA transcriptomes to identify the target of OASL under basal or stimulated conditions using miRNA-seq. Methods: Endothelial miRNA profiles of siCTL- or siOASL-transfected HUVECs were generated by Illumina sequencing, in duplicate. After sequencing, the raw sequence reads were filtered based on quality, and adapter sequences were trimmed off. rRNA-removed reads were sequentially aligned to the reference genome (GRCh38), and miRNA prediction was performed with miRDeep2. Results: We identified known miRNAs in species (miRDeep2) in HUVECs transfected with siCTL or siOASL. The expression profile of mature miRNAs was used to analyze differentially expressed miRNAs (DE miRNAs). Conclusions: Our study represents the first analysis of endothelial miRNA profiles affected by OASL knockdown with biological replicates.
Project description:A cDNA library was constructed by Novogene (CA, USA) using a Small RNA Sample Pre Kit, and Illumina sequencing was conducted according to the company workflow, using 20 million reads. Raw data were filtered for quality, retaining reads with a quality score > 5, fewer than 10% N bases, no 5' primer contaminants, and both a 3' primer and an insert tag. The 3' primer sequence was trimmed, and reads with poly A/T/G/C were removed.
Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Florencia Pauli, fpauli@hudsonalpha.org). If you have questions about the Genome Browser track associated with this data, contact ENCODE (genome@soe.ucsc.edu). This track is produced as part of the ENCODE Project. RNA-seq is a method for mapping and quantifying the transcriptome of any organism that has a genomic DNA sequence assembly (Mortazavi et al., 2008). Biological replicates of ENCODE cell lines were grown on separate culture plates, total RNA was purified and polyA selected two times. mRNA was then fragmented by magnesium-catalyzed hydrolysis, reverse transcribed to cDNA by random priming and amplified. The cDNA was sequenced on an Illumina Genome Analyzer (GAI or GAIIx). The DNA sequences were aligned to the NCBI Build37 (hg19) version of the human genome using the sequence alignment programs ELAND (Illumina) or Bowtie (Langmead et al., 2009). The first 10 residues of sequencing have a weak characteristic nucleotide bias of unknown origin. This RNA-seq protocol does not specify the coding strand. As a result, there will be ambiguity at loci where both strands are transcribed. This is the first NCBI Build37 (hg19) release of this track (Jan 2012). This release includes the 3 datasets (Jurkat, A549/DEX100nm, and A549/EtOH2pct) previously released on NCBI Build36 (hg18) and adds data for several more cell types and growth conditions in replicate. Four types of download files are available for each replicate including the Raw Data (fastq), Transcripts GencodeV7 (gtf), Raw Signal (bigwig), and Alignments (bam).
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Experimental Procedures: Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell) except for H1-hESC, for which frozen cell pellets were purchased from Cellular Dynamics. Cells were lysed in RLT buffer (Qiagen RNEasy kit) and processed on RNEasy midi columns according to the manufacturer's protocol, with the inclusion of the "on-column" DNase digestion step to remove residual genomic DNA. mRNA was isolated from at least 10 µg of total RNA with oligo(dT) two times (Dynabeads mRNA Purification Kit, Invitrogen). Alternatively, cells were lysed and mRNA was purified directly two times with oligo(dT) (Dynabeads mRNA DIRECT Kit, Invitrogen). 100 ng of mRNA was fragmented by magnesium-catalyzed hydrolysis and reverse transcribed to cDNA by random priming according to the protocol in Mortazavi et al. (2008). cDNA was prepared for sequencing on the Genome Analyzer flowcell according to the protocol for the ChIPSeq DNA genomic DNA kit (Illumina). The sequencing libraries were size-selected around 225 bp and amplified with 15 rounds of PCR. Libraries were sequenced with an Illumina Genome Analyzer I or an Illumina Genome Analyzer IIx according to the manufacturer's recommendations. Single-end reads of 36 nt in length were obtained. Data Processing and Analysis: Fastq files were made from qseq files generated by the Illumina pipeline (Casava 1.7). The Raw Signal files (bigWig) were generated from bedgraph files, and the score was calculated as the number of reads at that position divided by the total number of reads in millions. Casava export files were aligned to the NCBI Build37 (hg19) version of the human genome with ELAND (Illumina), generating SAM files.
Fastq files of experiments that were previously aligned to NCBI Build36 (hg18) were aligned to NCBI Build37 (hg19) using Bowtie (Langmead et al., 2009; parameters: -S -n 2 -k 11 -m 10 --best), also generating SAM files. SAM files were converted to BAM with SAMtools (Li et al., 2009). Gene expression within Gencode.v7 (Harrow et al., 2006) gene models was estimated using Cufflinks v0.9.3 (Roberts et al., 2011). Estimates of transcript abundance were reported in Fragments Per Kilobase of exon per Million fragments mapped (FPKM). FPKM is calculated by dividing the total number of fragments that align to the gene model by the size of the spliced transcript (exons) in kilobases. This number is then divided by the total number of reads in millions for the experiment. FPKM is reported in the last column of the gtf (TranscriptGencV7) files. Raw Data (fastq), Raw Signal (bigWig), Alignments (bam) and Transcript Gencode V7 (gtf) files are available from the Downloads (http://hgwdev.cse.ucsc.edu/cgi-bin/hgFileUi?g=wgEncodeHaibRnaSeq) page.
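
The two normalizations described in this record (the per-million Raw Signal score and FPKM) reduce to simple arithmetic, sketched below in Python. This shows only the arithmetic as stated in the text, not how Cufflinks estimates transcript abundance; the function names and example numbers are hypothetical.

```python
def signal_per_million(reads_at_position, total_reads):
    """Raw signal score: reads at a position per million total reads."""
    return reads_at_position / (total_reads / 1_000_000)


def fpkm(fragments, spliced_length_bp, total_fragments):
    """Fragments Per Kilobase of exon per Million fragments mapped.

    fragments:         fragments aligned to the gene model
    spliced_length_bp: length of the spliced transcript (exons) in bases
    total_fragments:   total fragments mapped in the experiment
    """
    kilobases = spliced_length_bp / 1_000
    millions = total_fragments / 1_000_000
    return fragments / kilobases / millions


# 500 fragments on a 2 kb spliced transcript in a 10-million-fragment run
print(fpkm(500, 2_000, 10_000_000))         # 25.0
# 30 reads covering a position in a 15-million-read run
print(signal_per_million(30, 15_000_000))   # 2.0
```

Dividing by transcript length makes long and short genes comparable, and dividing by library size makes experiments of different sequencing depth comparable.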
Project description:We performed two independent siRNA-mediated knockdowns of Srf (Srf si1 and Srf si2) and a control transfection with a nonspecific siRNA (siNon) in mouse HL-1 cardiomyocytes. Small RNAs were sequenced by Illumina/Solexa next-generation (single-end) sequencing technology. The sequence reads were mapped to the mouse reference genome (NCBI v37, mm9) using MicroRazerS. MicroRazerS searches for the longest possible prefix match of each read, i.e. the longest possible contiguous match starting at the first base. Hence, it is robust to possible adapter sequence at the 3' end of a read and requires no adapter trimming.
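
The prefix-matching idea can be illustrated with a naive Python sketch. It is not MicroRazerS itself, which uses an indexed search: the function `longest_prefix_match`, the `min_len` parameter, the restriction to exact matches on one strand, and the toy sequences are all assumptions made for illustration.

```python
def longest_prefix_match(read, reference, min_len=16):
    """Find the longest prefix of `read` that occurs exactly in `reference`.

    Returns (prefix_length, positions), or (0, []) if no prefix of at
    least min_len bases matches. Brute force, for illustration only.
    """
    for length in range(len(read), min_len - 1, -1):
        prefix = read[:length]
        positions = []
        start = reference.find(prefix)
        while start != -1:
            positions.append(start)
            start = reference.find(prefix, start + 1)
        if positions:
            return length, positions
    return 0, []


# Toy data: a 14-base small RNA followed by 6 bases of adapter.
reference = "TTTT" + "ACGTACGTAGCTAG" + "GGGGG"
read = "ACGTACGTAGCTAG" + "TCGTAT"
print(longest_prefix_match(read, reference, min_len=8))  # (14, [4])
```

The adapter bases at the 3' end simply fall outside the matched prefix, which is why no separate trimming step is needed.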
Project description:To elucidate the general rules for gene localization and regulation mediated by CpG islands, we reanalyzed published ChIP-seq data for the CXXC domain, H3K9me3, KDM2A, SUV39H1, ATF4, MYBL1, MYOD1, SPI1, and CTCF. Raw data were downloaded from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). FASTQ files were extracted with the SRA Toolkit version 2.5.5 and aligned using Bowtie 2.2.5 to the mouse and human genomes (mm9 and hg19, respectively). Factor binding sites were identified with the Model-based Analysis of ChIP-Seq peak caller (MACS 1.4.2) using a p-value cutoff of 1e-5.
Project description:Whole exome sequencing of 5 HCLc tumor-germline pairs. Genomic DNA from HCLc tumor cells and from T cells (germline) was used. Whole exome enrichment was performed with either Agilent SureSelect (50Mb, samples S3G/T, S5G/T, S9G/T) or Roche Nimblegen (44.1Mb, samples S4G/T and S6G/T). The resulting exome libraries were sequenced on the Illumina HiSeq platform with paired-end 100 bp reads to an average depth of 120-134x. BAM files were generated using NovoalignMPI (v3.0) to align the raw FASTQ files to the reference genome sequence (hg19) and Picard tools (v1.34) to flag duplicate reads (optical or PCR), unmapped reads, reads mapping to more than one location, and reads failing vendor QC.
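
Flags of this kind are stored as bits in the FLAG field of each BAM/SAM record. The Python sketch below decodes the relevant bits as defined in the SAM specification. The helper names `flagged_as` and `is_usable` are hypothetical, and treating the secondary-alignment bit as the marker for reads mapping to more than one location is an assumption, since the description does not say how such reads were marked.

```python
# Bit values from the SAM specification's FLAG field.
SAM_FLAGS = {
    0x4: "unmapped",
    0x100: "secondary alignment",   # assumed proxy for multi-mapping reads
    0x200: "failed vendor QC",
    0x400: "PCR or optical duplicate",
}


def flagged_as(flag):
    """Return the names of the filtering-related bits set in a SAM FLAG."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_usable(flag):
    """True if none of the filtering-related bits are set."""
    return not flagged_as(flag)


print(flagged_as(99))    # [] : a properly paired, mapped read
print(flagged_as(1123))  # ['PCR or optical duplicate']
print(is_usable(1123))   # False
```

Flagging rather than deleting reads keeps the BAM file complete, so downstream tools can decide which categories to exclude.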