Improved annotation of C. elegans microRNAs by deep sequencing reveals structures associated with processing by Drosha and Dicer
ABSTRACT: MicroRNAs (miRNAs) are small regulatory RNAs that are essential in all studied metazoans. Research has focused on the prediction and identification of novel miRNAs, while little has been done to validate, annotate, and characterize identified miRNAs. Using Illumina sequencing, ~20 million small RNA sequences were obtained from Caenorhabditis elegans. Of the 175 miRNAs listed in the miRBase database, 106 were validated as deriving from a stem–loop precursor with hallmark characteristics of miRNAs. This result suggests that not all sequences identified as miRNAs belong in this category of small RNAs. Our large data set of validated miRNAs facilitated the determination of general sequence and structural characteristics of miRNAs and miRNA precursors. In contrast to previous observations, we did not observe a preference for the 5′ nucleotide of the miRNA to be unpaired compared to the 5′ nucleotide of the miRNA*, nor a preference for the miRNA to be on either the 5′ or 3′ arm of the miRNA precursor stem–loop. We observed that steady-state pools of miRNAs have fairly homogeneous termini, especially at their 5′ end. Nearly all mature miRNA–miRNA* duplexes had two-nucleotide 3′ overhangs, and there was a preference for a uracil in the first and ninth positions of the mature miRNA. Finally, we observed that specific nucleotides and structural distortions were overrepresented at certain positions adjacent to Drosha and Dicer cleavage sites. Our study offers a comprehensive data set of C. elegans miRNAs and their precursors that significantly decreases the uncertainty associated with the identity of these molecules in existing databases. Deep sequencing of small RNAs in wild-type (N2) mixed-stage C. elegans yielded roughly 20 million sequencing reads.
ORGANISM(S): Caenorhabditis elegans
SUBMITTER: Michael Warf
PROVIDER: E-GEOD-24704 | biostudies-arrayexpress
REPOSITORIES: biostudies-arrayexpress