Dataset Information

Portraits of breast cancer progression.

ABSTRACT:

Background

Clustering analysis of microarray data is often criticized for giving ambiguous results, because the results are sensitive to data perturbations and to the choice of clustering technique. In this paper, we describe a new method based on principal component analysis and ensemble consensus clustering that avoids these problems.
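The abstract does not spell out how the ensemble is combined. A common generic approach, sketched here only as an assumption and not as the authors' exact procedure, is evidence accumulation. Several clustering runs each assign every sample a label. These runs could, for example, be k-means runs with different seeds or different numbers of principal components. A co-association matrix records the fraction of runs in which each pair of samples shares a cluster, and samples linked by a co-association above a threshold are grouped together. The function names `co_association` and `consensus_clusters` and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of clustering runs in which each pair of samples shares a cluster.

    `labelings` is a list of label vectors, one per clustering run (hypothetical input format).
    """
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    runs = len(labelings)
    return [[c / runs for c in row] for row in counts]


def consensus_clusters(labelings, threshold=0.5):
    """Group samples whose co-association exceeds `threshold` (connected components)."""
    m = co_association(labelings)
    n = len(m)
    parent = list(range(n))  # union-find forest over samples

    def find(x):
        # Follow parents to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if m[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    # Three toy runs over five samples; run 1 splits sample 4 off on its own.
    runs = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 1, 1, 1]]
    print(consensus_clusters(runs))  # prints [0, 0, 1, 1, 1]
```

In the toy example, samples 2 and 4 share a cluster in two of three runs (co-association 2/3). That exceeds the threshold, so the single disagreeing run does not split them. This illustrates how an ensemble can damp the sensitivity to individual runs that the paragraph above describes.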

Results

We illustrate the method on a public microarray dataset from 36 breast cancer patients, of whom 31 were diagnosed with at least two of three pathological stages of disease (atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC)). Our method identifies an optimum set of genes and divides the samples into stable clusters that correlate with the clinical classification into Luminal, Basal-like and Her2+ subtypes. Our analysis reveals a hierarchical portrait of breast cancer progression and identifies genes and pathways for each stage, grade and subtype. An intriguing observation is that the disease phenotype is distinguishable in ADH and progresses along distinct pathways for each subtype. The genetic signature of disease heterogeneity across subtypes is greater than the heterogeneity of progression from DCIS to IDC within a subtype, suggesting that the disease subtypes have distinct progression pathways. Our method identifies six disease-subtype clusters and one normal cluster. The first split separates the normal samples from the cancer samples. Next, the cancer cluster splits into low grade (pathological grades 1 and 2) and high grade (pathological grades 2 and 3), while the normal cluster is unchanged. Further, the low-grade cluster splits into two subclusters and the high-grade cluster into four. The final six disease clusters map to one Luminal A, three Luminal B, one Basal-like and one Her2+ cluster.

Conclusion

We confirm that the cancer phenotype can be identified at an early stage, because the genes altered at this stage are progressively altered further as the disease progresses through DCIS into IDC. We identify six disease subtypes that have distinct genetic signatures and remain separated in the clustering hierarchy. Our findings suggest that the heterogeneity of the disease across subtypes is higher than the heterogeneity of disease progression within a subtype, indicating that the subtypes are in fact distinct diseases.

SUBMITTER: Dalgin GS

PROVIDER: S-EPMC1978212 | biostudies-literature | 2007 Aug

REPOSITORIES: biostudies-literature

Publications

Portraits of breast cancer progression.

Dalgin GS, Alexe G, Scanfeld D, Tamayo P, Mesirov JP, Ganesan S, DeLisi C, Bhanot G

BMC Bioinformatics, 2007 Aug 6

PMID: 17683614

Similar Datasets

Project description:
Background: Proper cell models for primary breast tumors have long been a focal point of breast cancer research. Genomic comparison between cell lines and tumors can reveal their similarities and dissimilarities and help select the right cell model to mimic tumor tissue, so that drug response can be properly evaluated in vitro. In this paper, a comprehensive comparison of copy number variation (CNV), mutation, mRNA expression and protein expression between 68 breast cancer cell lines and 1375 primary breast tumors is presented.
Results: Using whole-genome expression arrays, strong correlations were observed between cells and tumors. PAM50 gene expression partially differentiated both cells and tumors into four major breast cancer subtypes: Luminal A, Luminal B, HER2amp and Basal-like. Genomic CNV patterns were generally observed across chromosomes in both tumors and cells. High C > T transition and C > G transversion rates were observed in both cells and tumors, while the cells had slightly higher somatic mutation rates than the tumors. Clustering analysis of protein expression data can reasonably recover the breast cancer subtypes in cell lines and tumors. Although the drug-target proteins ER/PR and an interesting mTOR/GSK3/TS2/PDK1/ER_P118 cluster showed consistent patterns between cells and tumors, protein-based correlations between cells and tumors were low. The consistency of mRNA versus protein expression between cell lines and tumors reaches 0.7076. The important breast cancer drug targets ESR1, PGR, HER2, EGFR and AR show high similarity in mRNA and protein variation in both tumors and cell lines. GATA3 and RP56KB1 are two promising drug targets for breast cancer. A total score derived from the correlations among the four molecular profiles suggests that the cell lines BT483, T47D and MDAMB453 have the highest similarity to tumors.
Conclusions: The integrated data from these multiple platforms demonstrate both similarities and dissimilarities in molecular features between breast cancer tumors and cell lines. The cell lines mirror some, but not all, of the molecular properties of primary tumors. The results add evidence to guide the selection of cell line models for breast cancer research.
