Dataset Information

Sequence biases in large scale gene expression profiling data.

ABSTRACT: We present the results of a simple, statistical assay that measures the G+C content sensitivity bias of gene expression experiments without the requirement of a duplicate experiment. We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (LongSAGE), LongSAGELite, 'Classic' Massively Parallel Signature Sequencing (MPSS) and 'Signature' MPSS. We demonstrate the methods have systematic and random errors leading to a different G+C content sensitivity. The relationship between this experimental error and the G+C content of the probe set or tag that identifies each gene influences whether the gene is detected and, if detected, the level of gene expression measured. LongSAGE has the least bias, while Signature MPSS shows a strong bias to G+C rich tags and Affymetrix data show different bias depending on the data processing method (MAS 5.0, RMA or GC-RMA). The bias in the Affymetrix data primarily impacts genes expressed at lower levels. Despite the larger sampling of the MPSS library, SAGE identifies significantly more genes (60% more RefSeq genes in a single comparison).

SUBMITTER: Siddiqui AS

PROVIDER: S-EPMC1524917 | biostudies-literature | 2006 Jul

REPOSITORIES: biostudies-literature

Publications

Sequence biases in large scale gene expression profiling data.

Siddiqui AS, Delaney AD, Schnerch A, Griffith OL, Jones SJ, Marra MA

Nucleic Acids Research, 13 July 2006, issue 12

PMID: 16840527

Similar Datasets

Large-scale gene expression profiling data of bone marrow stromal cells from osteoarthritic donors. | S-EPMC4961785 | biostudies-literature

paraGSEA: a scalable approach for large-scale gene expression profiling. | S-EPMC5737394 | biostudies-literature

Gene expression profiling of single cells on large-scale oligonucleotide arrays. | S-EPMC1635316 | biostudies-literature

SCANPY: large-scale single-cell gene expression data analysis. | S-EPMC5802054 | biostudies-other

Bagging statistical network inference from large-scale gene expression data. | S-EPMC3316596 | biostudies-literature

covRNA: discovering covariate associations in large-scale gene expression data. | S-EPMC7038619 | biostudies-literature

A highly sensitive and specific system for large-scale gene expression profiling. | S-EPMC2267712 | biostudies-literature

Latent network-based representations for large-scale gene expression data analysis. | S-EPMC7394327 | biostudies-literature

choros: correction of sequence-based biases for accurate quantification of ribosome profiling data. | S-EPMC9980091 | biostudies-literature

Deciphering gene expression patterns using large-scale transcriptomic data and its applications. | S-EPMC11562847 | biostudies-literature