Population-level annotation of lncRNAs in Arabidopsis thaliana reveals extensive expression and epigenetic variability associated with TE-like silencing [smallRNAseq]
ABSTRACT: Background: Long non-coding RNAs (lncRNAs) are under-studied and under-annotated in plants. In mammals, lncRNA expression has been shown to approach the extent of protein-coding gene expression and to be highly variable between individuals of the same species. Using A. thaliana as a model plant organism, we aimed to understand the true scope of lncRNA transcription across plants from different regions, characterize natural variability in lncRNA expression, and study the causes of this variability. Results: Using RNA-seq data spanning 499 natural lines and 4 different developmental stages to create a more comprehensive annotation of lncRNAs in A. thaliana, we found over 10,000 novel loci, three times as many as in the current public annotation. We showed that, while lncRNA loci are ubiquitous in the genome, most appear to be actively silenced, and their expression and repressive chromatin levels are extremely variable between natural lines. This was particularly prominent in intergenic lncRNAs (lincRNAs), where TE-like sequences, present in 50% of the loci, are associated with increased silencing and variation, and such lincRNAs tend to be targeted by the TE silencing machinery. Conclusion: lncRNAs are ubiquitous in the A. thaliana genome, but their expression is highly variable between different lines and tissues. This high expression variability is largely caused by high structural and epigenetic variability of non-coding loci, especially those containing pieces of transposable elements (TEs). We create the most comprehensive A. thaliana lncRNA annotation to date and improve our understanding of plant lncRNA biology.
ORGANISM(S): Arabidopsis thaliana
PROVIDER: GSE224571 | GEO | 2023/03/15
REPOSITORIES: GEO