Project description: BACKGROUND: This paper introduces and applies a genome-wide predictive study to learn a model that predicts whether a new subject will develop breast cancer or not, based on her SNP profile. RESULTS: We first genotyped 696 female subjects (348 breast cancer cases and 348 apparently healthy controls), predominantly of Caucasian origin from Alberta, Canada, using Affymetrix Human SNP 6.0 arrays. Then, we applied the EIGENSTRAT population stratification correction method to remove 73 subjects not belonging to the Caucasian population. Then, we filtered out any SNP that had any missing calls, whose genotype frequency deviated from Hardy-Weinberg equilibrium, or whose minor allele frequency was less than 5%. Finally, we applied a combination of the MeanDiff feature selection method and the KNN learning method to this filtered dataset to produce a breast cancer prediction model. The LOOCV accuracy of this classifier is 59.55%. Random permutation tests show that this result is significantly better than the baseline accuracy of 51.52%. Sensitivity analysis shows that the classifier is fairly robust to the number of MeanDiff-selected SNPs. External validation on the CGEMS breast cancer dataset, the only other publicly available breast cancer dataset, shows that this combination of MeanDiff and KNN leads to a LOOCV accuracy of 60.25%, which is significantly better than its baseline of 50.06%. We then considered a dozen different combinations of feature selection and learning methods, but found that none of these combinations produces a better predictive model than our model.
We also considered various biological feature selection methods like selecting SNPs reported in recent genome wide association studies to be associated with breast cancer, selecting SNPs in genes associated with KEGG cancer pathways, or selecting SNPs associated with breast cancer in the F-SNP database to produce predictive models, but again found that none of these models achieved accuracy better than baseline. CONCLUSIONS: We anticipate producing more accurate breast cancer prediction models by recruiting more study subjects, providing more accurate labelling of phenotypes (to accommodate the heterogeneity of breast cancer), measuring other genomic alterations such as point mutations and copy number variations, and incorporating non-genetic information about subjects such as environmental and lifestyle factors.
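The pipeline described above (MeanDiff feature selection followed by KNN, scored with leave-one-out cross-validation) can be illustrated with a minimal sketch. Several details here are assumptions rather than the authors' stated method: MeanDiff is taken to rank SNPs by the absolute difference between the mean genotype value of cases and of controls, genotypes are coded 0/1/2, KNN uses Manhattan distance, and feature selection is repeated inside every fold so that the held-out subject never influences which SNPs are chosen. The function names and the toy data are hypothetical.

```python
from collections import Counter


def mean_diff_select(X, y, m):
    """Indices of the m features with the largest |mean(cases) - mean(controls)|.

    Assumed reading of "MeanDiff"; labels are 1 for cases and 0 for controls.
    """
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        mean_case = sum(row[j] for row in cases) / len(cases)
        mean_ctrl = sum(row[j] for row in controls) / len(controls)
        scores.append(abs(mean_case - mean_ctrl))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:m]


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training rows (Manhattan distance)."""
    dists = sorted(
        (sum(abs(a - b) for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, m, k):
    """Leave-one-out accuracy, re-selecting features within each fold."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        selected = mean_diff_select(X_train, y_train, m)
        X_train_sel = [[row[j] for j in selected] for row in X_train]
        x_sel = [X[i][j] for j in selected]
        if knn_predict(X_train_sel, y_train, x_sel, k) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy genotypes: SNP 0 separates the classes, SNPs 1 and 2 are noise.
X = [[2, 0, 1], [2, 1, 0], [2, 0, 0], [0, 1, 1], [0, 0, 1], [0, 1, 0]]
y = [1, 1, 1, 0, 0, 0]
accuracy = loocv_accuracy(X, y, m=1, k=1)
```

Selecting features on the full dataset before cross-validating would leak label information into the held-out fold and inflate the accuracy estimate, which is why the selection step sits inside the loop in this sketch.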
Project description: Background: Microarray experiments enable simultaneous measurement of the expression levels of virtually all transcripts present in cells, thereby providing a 'molecular picture' of the cell state. On the other hand, the genomic responses to a pharmacological or hormonal stimulus are dynamic molecular processes, where time influences gene activity and expression. The potential of statistical analysis of time-series microarray data has not been fully exploited so far, because only a few methods are available that take proper account of temporal relationships between samples. Results: We compared four different methods for analyzing data derived from a time course mRNA expression profiling experiment that studied the effects of estrogen on hormone-responsive human breast cancer cells. Gene expression was monitored with the Illumina BeadArray platform, which includes an average of 30-40 replicates for each probe sequence randomly distributed on the chip surface. We present and discuss the results obtained by applying different statistical methods for serial gene expression analysis to these datasets. The influence of the normalization algorithm applied to the data and of different parameter or threshold choices for the selection of differentially expressed transcripts has also been evaluated. In most cases, the selection was found to be fairly robust with respect to changes in parameters and type of normalization. We then identified which genes showed an expression profile significantly affected by the hormonal treatment over time. The final list of differentially expressed genes underwent functional cluster analysis to identify groups of genes with similar regulation dynamics. Conclusions: Several methods for processing time series gene expression data are presented, including an evaluation of the benefits and drawbacks of the different methods applied.
The resulting data analysis protocol was applied to characterize the gene expression changes induced by estrogen in human breast cancer ZR-75.1 cells over an entire cell cycle.
Project description: This is a Phase II trial of 4 courses of 5-fluorouracil, doxorubicin and cyclophosphamide followed by 4 additional courses of weekly docetaxel and capecitabine, administered as preoperative therapy for patients with locally advanced breast cancer (stages II and III), conducted by US Oncology (Protocol 02-103). We performed gene set analysis (GSA) using functionally annotated gene sets corresponding to almost all known biological processes in ER-positive/HER2-negative and ER-negative/HER2-negative breast cancer, respectively. Pre-treatment FNA samples from primary tumors were obtained, and RNA was extracted and hybridized to Affymetrix microarrays according to the manufacturer's protocol.
Project description: Tumor samples were obtained from patients with stage II-III breast cancer before starting neoadjuvant chemotherapy with four cycles of 5-fluorouracil/epirubicin/cyclophosphamide (FEC) followed by four cycles of docetaxel/capecitabine (TX) on US Oncology clinical trial 02-103. Most patients with HER-2-positive cancer also received trastuzumab (H). Pre-treatment FNA samples from primary tumors were obtained, and RNA was extracted and hybridized to Affymetrix microarrays according to the manufacturer's protocol.
Project description: Until recently, elevated disease risk was ascribed to genetic predisposition; however, progress over the past years has uncovered alternate elements of inheritance that involve epigenetic regulation. Epigenetic changes are heritably stable alterations that include DNA methylation, histone modifications and RNA-mediated silencing. Aberrant DNA methylation is a common molecular basis for a number of important human diseases, including breast cancer. Changes in DNA methylation profoundly affect global gene expression patterns. What is emerging is a more dynamic and complex association between DNA methylation and gene expression than previously believed. Although many tools have already been developed for analyzing genome-wide gene expression data, tools for analyzing genome-wide DNA methylation have not yet reached the same level of refinement. Here we provide an in-depth analysis of DNA methylation data characteristics in parallel with those of gene expression data and describe the particularities of low-level and high-level analyses of DNA methylation data. Low-level analysis refers to pre-processing of methylation data (i.e. normalization, transformation and filtering), whereas high-level analysis focuses on applying the widely used class comparison, class prediction and class discovery methods to DNA methylation data. Furthermore, we investigate the influence of DNA methylation on gene expression by measuring the correlation between the degree of CpG methylation and the level of expression, and we explore the pattern of methylation as a function of the promoter region.
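As a rough illustration of the correlation step mentioned above, the sketch below relates promoter CpG methylation to expression for a single gene across samples. The specifics are assumptions, not taken from the description: methylation is given as beta values in [0, 1], the transformation is the commonly used logit (M-value) transform, the correlation is Pearson's, and all numbers are hypothetical toy values.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Logit-transform a methylation beta value to an M-value.

    Clamping with eps (an assumed choice) avoids division by zero at 0 or 1.
    """
    beta = min(max(beta, eps), 1 - eps)
    return math.log2(beta / (1 - beta))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical values for one gene across five samples.
promoter_beta = [0.05, 0.20, 0.50, 0.80, 0.95]
expression = [9.1, 8.0, 6.2, 4.5, 3.0]  # e.g. log2 intensities

m_values = [beta_to_m(b) for b in promoter_beta]
r = pearson(m_values, expression)
```

In this toy case the coefficient is strongly negative, the pattern one would expect when promoter methylation accompanies reduced expression. Real data need not behave this way, and a rank-based coefficient may be preferable when the relationship is not linear.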
Project description: Breast cancer metastases are most commonly found in bone, an indication of poor prognosis. Pathway-based biomarker identification may help elucidate the cellular signature of breast cancer metastasis in bone, further characterizing the etiology and promoting new therapeutic approaches. We extracted gene expression profiles of mouse macrophages from the GEO dataset GSE152795 using the GEO2R web tool. The differentially expressed genes (DEGs) were filtered by log2 fold-change with a threshold of 1.5 (FDR < 0.05). The STRING database and Enrichr were used for GO-term analysis and for the analysis of miRNAs and TFs associated with the DEGs. AutoDock Vina was used to investigate the interactions of the anti-cancer drugs Actinomycin-D and Adriamycin. The sensitivity and specificity of the DEGs were assessed using receiver operating characteristic (ROC) analyses. A total of 61 DEGs, including 27 down-regulated and 34 up-regulated, were found to be significant in breast cancer bone metastasis. The major DEGs were associated with lipid metabolism and the immunological response of tumor tissue. The crucial DEGs Bcl3, ADGRG7, FABP4, VCAN, and IRF4 were regulated by the miRNAs miR-497, miR-574, and miR-138 and the TFs CCDN1, STAT6, and IRF8. Docking analysis showed that these genes showed strong binding with the drugs. ROC analysis demonstrated that Bcl3 is specific to metastasis. The DEGs Bcl3, ADGRG7, FABP4, and IRF4, together with their regulating miRNAs and TFs, have a strong impact on the proliferation and metastasis of breast cancer in bone tissues. In conclusion, the present study revealed that the DEGs are directly involved in breast tumor metastasis in bone tissues. The identified genes, miRNAs, and TFs are possible drug targets that may be used for therapeutics. However, further experimental validation is necessary.
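The DEG filtering step described above can be sketched as follows. This is only an illustration under stated assumptions: the threshold is read as |log2 fold-change| >= 1.5 with an adjusted p-value (FDR) below 0.05, the keys `logFC` and `adj.P.Val` are assumed to mirror the columns of an exported GEO2R table, and the gene names and values are hypothetical.

```python
def filter_degs(rows, lfc_threshold=1.5, fdr_cutoff=0.05):
    """Split rows into up- and down-regulated gene lists.

    Each row is a dict with 'gene', 'logFC' (log2 fold-change) and
    'adj.P.Val' (FDR-adjusted p-value); the key names are assumptions.
    """
    up, down = [], []
    for row in rows:
        if row["adj.P.Val"] >= fdr_cutoff:
            continue  # not significant after multiple-testing correction
        if row["logFC"] >= lfc_threshold:
            up.append(row["gene"])
        elif row["logFC"] <= -lfc_threshold:
            down.append(row["gene"])
    return up, down


# Hypothetical toy table, not taken from GSE152795.
table = [
    {"gene": "geneA", "logFC": 2.3, "adj.P.Val": 0.001},   # up
    {"gene": "geneB", "logFC": -1.8, "adj.P.Val": 0.020},  # down
    {"gene": "geneC", "logFC": 3.0, "adj.P.Val": 0.200},   # fails FDR
    {"gene": "geneD", "logFC": 0.9, "adj.P.Val": 0.001},   # fails fold-change
]
up_genes, down_genes = filter_degs(table)
```

If the 1.5 in the description refers to a linear fold-change rather than a log2 value, the equivalent log2 threshold would be about 0.585, so the cutoff is worth confirming against the original analysis before reuse.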