A computational framework for identifying promoter sequences in non-model organisms using RNA-seq datasets
ABSTRACT: We developed a computational framework to discover short DNA sequences that confer strong expression in non-model organisms. The framework relies solely on whole-genome and RNA sequencing data, which are easily accessible to a variety of research groups. The framework proceeds in three main stages: 1) identification of a group of highly expressed loci that maintain high transcript counts across a broad range of experimental conditions, 2) extraction of the corresponding upstream candidate promoter regions of these loci, taking nearby annotations into account and excluding loci that may reside in operons, and 3) application of the motif-finding algorithm in BioProspector to these upstream regions to predict the location and sequence of the -35 and -10 hexamers that drive the strong expression of these loci. Ultimately, we report sequences of 27-30 bases in length as candidate -35, -10 signals for each of the top loci and create a consensus motif from these predictions. We apply our framework to 80 RNA-seq datasets collected for the methanotroph Methylotuvimicrobium buryatense 5GB1 and validate our predictions computationally and experimentally. The data deposited here comprise all of the RNA-seq data that had not been published prior to this study.
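Stages 1 and 2 of the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the rank-based selection rule, the thresholds (`top_fraction`, `min_condition_fraction`), and the 300-base upstream window are all assumptions made for illustration, since this record does not state the actual criteria. Operon filtering, annotation checks, and the BioProspector step are omitted.

```python
from statistics import mean

# Hypothetical helper: the selection rule and thresholds are assumptions,
# not the criteria used in the study.
def consistently_high_loci(expression, top_fraction=0.1, min_condition_fraction=0.9):
    """Return loci ranked in the top `top_fraction` of loci in at least
    `min_condition_fraction` of conditions, sorted by mean expression.

    `expression` maps locus ID -> list of expression values, one per condition.
    """
    loci = list(expression)
    n_conditions = len(next(iter(expression.values())))
    n_top = max(1, int(len(loci) * top_fraction))
    hits = {locus: 0 for locus in loci}
    for c in range(n_conditions):
        # Rank loci within this condition, highest expression first.
        ranked = sorted(loci, key=lambda l: expression[l][c], reverse=True)
        for locus in ranked[:n_top]:
            hits[locus] += 1
    needed = min_condition_fraction * n_conditions
    selected = [l for l in loci if hits[l] >= needed]
    return sorted(selected, key=lambda l: mean(expression[l]), reverse=True)


# Hypothetical helper: the window length is an assumed default.
def upstream_region(genome, start, end, strand, length=300):
    """Return the sequence immediately upstream of a locus.

    Coordinates are 0-based and half-open. For '-' strand loci the region
    lies after `end` on the forward strand and is reverse-complemented.
    """
    if strand == "+":
        return genome[max(0, start - length):start]
    region = genome[end:end + length]
    return region.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Toy data: four loci measured under three conditions.
toy_expression = {
    "a": [100, 120, 90],
    "b": [5, 200, 3],
    "c": [1, 2, 3],
    "d": [50, 60, 70],
}
top_loci = consistently_high_loci(toy_expression, top_fraction=0.5)
```

In this toy input only locus `a` ranks in the top half under every condition, so it is the only locus selected. Locus `b` is high in a single condition and is excluded, which mirrors the abstract's requirement of high transcript counts across a broad range of conditions.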
ORGANISM(S): Methylotuvimicrobium buryatense
PROVIDER: GSE162089 | GEO | 2021/05/24
REPOSITORIES: GEO