Dataset Information

Systematic clustering of transcription start site landscapes.

ABSTRACT: Genome-wide, high-throughput methods for transcription start site (TSS) detection have shown that most promoters have an array of neighboring TSSs, some of which are used more than others, forming a distribution of initiation propensities. TSS distributions (TSSDs) vary widely between promoters, and earlier studies have shown that TSSDs have biological implications for both regulation and function. However, no systematic study has explored how many types of TSSDs, and by extension core promoters, exist, or which biological features distinguish them. In this study, we developed a new non-parametric dissimilarity measure and clustering approach to explore the similarities and stabilities of clusters of TSSDs. Previous studies have used arbitrary thresholds to arrive at two general classes: broad and sharp. We demonstrate that, in addition to this broad/sharp dichotomy, a third category of promoters exists. Unlike typical TATA-driven sharp TSSDs, where the TSS position can vary by a few nucleotides, in this category virtually all TSSs originate from the same genomic position. These promoters lack the epigenetic signatures of typical mRNA promoters, and a substantial subset of them map upstream of ribosomal protein pseudogenes. We present evidence that these are likely mapping errors, caused by the high similarity of ribosomal gene promoters combined with the known G addition bias in CAGE libraries, and that they have confounded earlier analyses. Thus, previous two-class separations of promoters based on TSS distributions are justified, but ultra-sharp TSS distributions will confound downstream analyses if they are not removed.
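The abstract does not define the authors' dissimilarity measure, so the following is only a minimal sketch of the general idea, not the published method. It assumes each TSSD is a list of CAGE tag counts per nucleotide, aligns two TSSDs on their dominant TSS so that only shape is compared, and swaps in a Kolmogorov-Smirnov-style statistic (maximum gap between cumulative distributions) as a stand-in non-parametric dissimilarity. It also flags "ultra-sharp" TSSDs, where nearly all tags fall on one position. The function names `center_on_mode`, `tssd_dissimilarity` and `is_ultra_sharp`, and the `min_fraction=0.9` cutoff, are hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def center_on_mode(counts: Sequence[float], width: int) -> list[float]:
    """Shift a TSSD so its dominant TSS (first maximum) lands at index `width`
    of a window of 2*width+1 positions. Requires width >= len(counts)."""
    mode = max(range(len(counts)), key=lambda i: counts[i])
    window = [0.0] * (2 * width + 1)
    for i, c in enumerate(counts):
        window[i - mode + width] += c  # width >= len(counts) keeps the index in range
    return window


def tssd_dissimilarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical KS-style dissimilarity: the largest absolute difference
    between the cumulative distributions of two mode-aligned TSSDs (0 to 1)."""
    width = max(len(a), len(b))
    pa, pb = (center_on_mode(x, width) for x in (a, b))
    ta, tb = sum(pa), sum(pb)
    if ta <= 0 or tb <= 0:
        raise ValueError("each TSSD needs at least one tag")
    cdf_a = accumulate(x / ta for x in pa)
    cdf_b = accumulate(x / tb for x in pb)
    return max(abs(u - v) for u, v in zip(cdf_a, cdf_b))


def is_ultra_sharp(counts: Sequence[float], min_fraction: float = 0.9) -> bool:
    """Flag a TSSD whose single dominant position carries at least
    `min_fraction` of all tags (the cutoff is an assumed, illustrative value)."""
    total = sum(counts)
    return total > 0 and max(counts) / total >= min_fraction
```

For example, `tssd_dissimilarity([0, 0, 100, 0, 0], [0, 50, 0])` is 0 because both are single-position peaks once aligned, while a broad profile such as `[10, 20, 30, 20, 10]` scores about 1/3 against either. A pairwise matrix of such scores could then be fed to any standard clustering routine.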

SUBMITTER: Zhao X

PROVIDER: S-EPMC3160847 | biostudies-literature | 2011

REPOSITORIES: biostudies-literature

Publications

Systematic clustering of transcription start site landscapes.

Xiaobei Zhao, Eivind Valen, Brian J. Parker, Albin Sandelin

PLoS ONE, 24 August 2011, issue 8

PMID: 21887249

Similar Datasets

Transcription start site evolution in Drosophila.
| S-EPMC3708499 | biostudies-literature

Transcription start site associated RNAs in bacteria.
| S-EPMC3377991 | biostudies-literature

The transcription start site landscape of C. elegans.
| S-EPMC3730108 | biostudies-literature

Massively Systematic Transcript End Readout, "MASTER": Transcription Start Site Selection, Transcriptional Slippage, and Transcript Yields.
| S-EPMC4688149 | biostudies-literature

The mechanism of variability in transcription start site selection.
| S-EPMC5730371 | biostudies-literature

Specificity landscapes unmask submaximal binding site preferences of transcription factors.
| S-EPMC6233140 | biostudies-literature

Transcription start site profiling uncovers divergent transcription and enhancer-associated RNAs in Drosophila melanogaster.
| S-EPMC5822475 | biostudies-literature

Multiplexed protein-DNA cross-linking: Scrunching in transcription start site selection.
| S-EPMC4797950 | biostudies-literature

Ensemble approach combining multiple methods improves human transcription start site prediction.
| S-EPMC3053590 | biostudies-literature

DNA:RNA hybrid G-quadruplex formation upstream of transcription start site.
| S-EPMC7198591 | biostudies-literature