Consensus Preprocessing of GEO and ArrayExpress Normal and Diseased Lung Tissue Microarray Data
ABSTRACT: Affymetrix oligonucleotide microarrays measure gene expression by quantifying the intensity of fluorescently labeled gene fragments that bind to sets of 25-mer oligonucleotide probes on the chip, whose sequences are designed to be complementary to the target genes. Each gene is associated with a "probe set" containing several pairs (usually 11) of "perfect match" probes (perfectly complementary to the target sequence) and "mismatch" probes (a different base at position 13 of 25). The raw measurements for each probe set are the intensities of its probes, which require in silico preprocessing in three steps: (1) correcting for background variability, (2) normalizing intensities across samples, and (3) summarizing intensities across the probe set into a single expression value. The output of the summarization step is the background-adjusted, normalized expression value for the mRNA of interest. We preprocess using GCRMA, which corrects for background variability by accounting for optical noise, probe affinity, and mismatch probe adjustment; normalizes intensities by quantile normalization; and summarizes intensities using a median polish method. To minimize preprocessing batch effects, it is desirable to preprocess all samples in the dataset together. However, preprocessing across multiple platforms requires consolidating probes with identical sequences, which precludes global preprocessing of multi-platform datasets with the standard preprocessing pipelines. To address this problem, we have developed and applied a custom preprocessing pipeline that combines the raw .CEL files from multiple platforms that share the same probe sets.
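As an illustration of steps (2) and (3) above, the following is a minimal NumPy sketch of quantile normalization and median polish summarization. It is not the GCRMA implementation or the custom pipeline used for this dataset: the function names `quantile_normalize` and `median_polish_summary` are hypothetical, the background-correction step is omitted, ties are not treated specially, and the input is assumed to be a probes-by-samples matrix of log2 perfect-match intensities.

```python
import numpy as np


def quantile_normalize(x):
    """Force every sample (column) of a probes x samples matrix to share
    the same empirical distribution (simplified: no special tie handling)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                 # rank order of probes per sample
    sorted_x = np.take_along_axis(x, order, axis=0)
    reference = sorted_x.mean(axis=1)             # mean of each quantile across samples
    out = np.empty_like(x)
    # Write the reference quantiles back at each sample's original positions.
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out


def median_polish_summary(log_pm, n_iter=10, tol=1e-6):
    """Tukey median polish for ONE probe set.

    log_pm: probes x samples matrix of log2 intensities (assumed input).
    Returns one summarized expression value per sample
    (overall effect + sample effect).
    """
    resid = np.array(log_pm, dtype=float)
    overall = 0.0
    row = np.zeros(resid.shape[0])                # probe effects
    col = np.zeros(resid.shape[1])                # sample effects
    for _ in range(n_iter):
        # Sweep out probe (row) medians.
        rmed = np.median(resid, axis=1)
        resid -= rmed[:, None]
        row += rmed
        delta = np.median(col)
        col -= delta
        overall += delta
        # Sweep out sample (column) medians.
        cmed = np.median(resid, axis=0)
        resid -= cmed[None, :]
        col += cmed
        delta = np.median(row)
        row -= delta
        overall += delta
        # Stop once a full sweep no longer changes the fit.
        if np.abs(rmed).max() < tol and np.abs(cmed).max() < tol:
            break
    return overall + col


# Toy data: 3 probes x 2 samples with purely additive probe and sample effects.
probe_effects = np.array([0.0, 1.0, -1.0])
sample_effects = np.array([5.0, 7.0])
toy = probe_effects[:, None] + sample_effects[None, :]

summary = median_polish_summary(toy)
normalized = quantile_normalize([[5.0, 4.0], [2.0, 1.0], [3.0, 6.0]])
```

In this sketch, normalization would be applied once to the full probe-level matrix, and the summarization function would then be called separately on the rows belonging to each probe set. The median is used rather than the mean so that a single outlying probe has little influence on the summarized value.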
ORGANISM(S): Homo sapiens
PROVIDER: GSE60486 | GEO | 2014/09/18
SECONDARY ACCESSION(S): PRJNA258392
REPOSITORIES: GEO