Project description: Although taro (Colocasia esculenta) is an important staple food for millions of people around the world, its genome and transcriptome sequences have not yet been investigated. The objective of this study was to generate transcriptome sequence information from the taro cultivars Niue, Palau 10, and Sam-07. Niue and Sam-07 are highly susceptible to taro leaf blight (TLB), a disease caused by Phytophthora colocasiae, whereas Palau 10 is resistant. Analysis of the taro transcriptome will facilitate gene discovery, including the identification of genes responsible for TLB resistance. Moreover, microsatellites (simple sequence repeats, SSRs) developed from these data will be useful for marker-assisted breeding of improved taro cultivars, QTL mapping, and characterization of genetic diversity in taro.
Project description: The surprising observation that virtually the entire human genome is transcribed has revealed many emerging classes of RNAs, yet beyond their astounding diversity we know very little about their function. Traditional RNA function prediction methods rely on sequence or alignment information, which limits their ability to distinguish between classes of non-coding RNAs (ncRNAs). To address this, we developed CoRAL, a machine learning-based approach for classifying RNA molecules. CoRAL uses biologically interpretable features, including fragment length, cleavage specificity, and antisense transcription, to distinguish between ncRNA classes. We evaluated CoRAL on genome-wide small RNA sequencing (smRNA-seq) datasets from two human tissue types (brain and skin [GSE31037]). CoRAL classified six types of RNA transcripts with 79-80% accuracy in cross-validation experiments, and with 71-73% accuracy when one tissue type was used for training and the other for validation. Analysis with CoRAL revealed that long intergenic ncRNAs, small cytoplasmic RNAs, and small nuclear RNAs show greater tissue specificity, whereas microRNAs, small nucleolar RNAs, and transposon-derived RNAs are highly discernible and consistent across the two tissue types. The ability to annotate loci consistently across tissue types demonstrates the potential of CoRAL to characterize ncRNAs from smRNA-seq data in less well-characterized organisms.
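To make the three named feature types more concrete, the sketch below computes illustrative per-locus values from aligned small RNA reads. This is not CoRAL's implementation. The read format (0-based, half-open `(start, end, strand)` tuples), the function names `locus_features` and `entropy`, and the use of Shannon entropy over read-end positions as a proxy for cleavage specificity are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import mean


def entropy(counts):
    """Shannon entropy (bits) of a Counter of positions.

    Low entropy means reads share the same end position, i.e. the
    (assumed) signature of precise, specific cleavage.
    """
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def locus_features(reads, locus_strand):
    """Hypothetical per-locus features from aligned smRNA-seq reads.

    reads: list of (start, end, strand) tuples, 0-based half-open (assumed).
    locus_strand: '+' or '-', the annotated strand of the locus.
    """
    # Fragment length: mean length of reads mapped to the locus.
    lengths = [end - start for start, end, _ in reads]

    # Antisense transcription: fraction of reads on the opposite strand.
    sense_count = sum(1 for _, _, strand in reads if strand == locus_strand)
    antisense_fraction = 1 - sense_count / len(reads)

    # Cleavage specificity: the 5' end is `start` on '+' and `end - 1` on '-';
    # the 3' end is the opposite coordinate.
    five_prime = Counter(s if st == "+" else e - 1 for s, e, st in reads)
    three_prime = Counter(e - 1 if st == "+" else s for s, e, st in reads)

    return {
        "mean_length": mean(lengths),
        "antisense_fraction": antisense_fraction,
        "five_prime_entropy": entropy(five_prime),
        "three_prime_entropy": entropy(three_prime),
    }


if __name__ == "__main__":
    reads = [(100, 122, "+"), (100, 121, "+"), (100, 122, "+"), (105, 127, "-")]
    print(locus_features(reads, "+"))
```

In this toy example three of the four reads share a 5' end, so the 5' entropy is low (about 0.81 bits). This is the kind of pattern one would expect from a precisely processed class such as microRNAs. A feature vector of this shape could then be passed to any standard classifier and evaluated by cross-validation, as the description outlines.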