Gene signature for relapse prediction in Dukes' B colon cancer
ABSTRACT: Purpose: The benefit of postoperative adjuvant chemotherapy in patients with Dukes' B colorectal cancer is still uncertain, and its routine use is not recommended. The five-year survival rate is approximately 75%, and identifying the patients at high risk of recurrence would be an important strategy for guiding the use of adjuvant chemotherapy. In this study we identify new prognostic markers for tumor relapse in Dukes' B colon cancer patients. Patients and Methods: We retrospectively analyzed gene expression profiles in frozen tumor specimens from 16 patients with Dukes' B colorectal cancer using high-density oligonucleotide microarrays. Data were normalized and subsequently analyzed with two different statistical procedures. The intensity value associated with each spot is the result of subtracting a Gaussian function of the noise from the foreground values {Kooperberg, 2002}. After this background subtraction, base-2 logarithms of all data were calculated, and genes with more than two missing values were excluded from the analysis. The remaining missing values were replaced using the KNN imputation method {Troyanskaya, 2001}. Because of the large number of genes on the microarray, we assumed that the overall fluorescence intensity should be the same for all slides, as the vast majority of genes will not change their expression between the two conditions tested. We therefore used the quantile normalization method {Bolstad, 2003}, which equalizes not only the average intensity but also the range of intensity values between slides. We used two different statistical procedures to search for significant differences in gene expression between the relapsed and non-relapsed patients. Only the genes selected by both statistical procedures were considered differentially expressed between relapsed and non-relapsed patients.
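The preprocessing steps described above (log2 transformation, exclusion of genes with more than two missing values, KNN imputation, quantile normalization) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the default `k=10`, the Euclidean gene-to-gene distance, and the treatment of non-positive intensities as missing are all hypothetical choices, and the background subtraction step is not reproduced. The matrix is assumed to hold genes in rows and slides in columns.

```python
import numpy as np

def log2_and_filter(intensities, max_missing=2):
    """Log2-transform intensities and drop genes with too many missing values.

    Assumption: non-positive background-corrected values are treated as missing.
    """
    x = np.asarray(intensities, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        logged = np.where(x > 0, np.log2(x), np.nan)
    keep = np.isnan(logged).sum(axis=1) <= max_missing
    return logged[keep], keep

def knn_impute(data, k=10):
    """Fill missing values with the mean of the k most similar complete genes.

    Assumption: at least one gene has no missing values. Similarity is the
    Euclidean distance over the slides observed for the incomplete gene.
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    incomplete = np.isnan(data).any(axis=1)
    complete = data[~incomplete]
    for i in np.where(incomplete)[0]:
        observed = ~np.isnan(data[i])
        diff = complete[:, observed] - data[i, observed]
        dist = np.sqrt((diff ** 2).mean(axis=1))
        nearest = complete[np.argsort(dist)[:k]]
        filled[i, ~observed] = nearest[:, ~observed].mean(axis=0)
    return filled

def quantile_normalize(data):
    """Give every slide (column) the same empirical intensity distribution."""
    data = np.asarray(data, dtype=float)
    order = np.argsort(data, axis=0)
    # Reference distribution: mean of the sorted columns, rank by rank.
    reference = np.sort(data, axis=0).mean(axis=1)
    normalized = np.empty_like(data)
    for j in range(data.shape[1]):
        normalized[order[:, j], j] = reference
    return normalized
```

Applied in sequence (`log2_and_filter`, then `knn_impute`, then `quantile_normalize`), the result is a complete matrix in which every slide shares the same set of intensity values, which is the sense in which quantile normalization equalizes both the average and the range between slides. Ties are broken arbitrarily in this sketch.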
The first test was a permutation t-test for the comparison of two means {Dudoit, 2003}, and the second was a variation of the Fisher test based on the work of Iizuka et al. {Iizuka, 2003}, who searched for the optimal number of genes that could differentiate between two groups of samples. Briefly, we used the same algorithms to calculate the Fisher criterion, but instead of looking for the optimal subset of genes able to differentiate between groups, which requires a lot of computing capacity, we tried different numbers of candidate genes and selected a number that yielded very good classification results when evaluated by means of FastICA {Lee, 2003} and hierarchical clustering {Everitt, 1974}. The procedure consisted of selecting the 30 genes with the highest Fisher criterion value in each of 6 rounds of iteration, each round leaving one sample of the relapsed group and two of the non-relapsed group out of the calculations (a variant of the “leave-one-out” method). We selected the genes present in at least 3 of the iterations. The intersection of the gene sets selected by the two statistical procedures was taken as a prognostic signature for relapse in Dukes' stage B colon cancer. Results: Our results show a group of 48 genes differentially expressed between the relapsed and non-relapsed groups with an associated probability below 0.001 in the t-test. The second statistical procedure, based on the Fisher criterion, yielded 11 genes able to separate the two groups. To minimize false positives, we considered as a good gene signature only the 8 genes selected by both statistical procedures. These genes are ribosomal protein S5, chromodomain helicase DNA binding protein 2, lysosomal ATPase V0 subunit A isoform 1, zinc finger protein 148, brain protein I3 (BRI3), hypothetical protein MGC23401 and one unknown clone. To verify these results, the differential expression of the first two genes was confirmed by real-time PCR.
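The two gene-selection procedures and their intersection could be sketched along the following lines. This is an illustrative reading of the description above, not the study's implementation. Several details are assumptions: the per-gene Fisher criterion is taken as the squared difference of group means divided by the sum of group variances, the left-out samples are drawn at random in each round, the t statistic is Welch's, and the permutation p-values are unadjusted per-gene values. All function names and defaults other than the 30 genes, 6 rounds, and 3 hits stated in the abstract are hypothetical.

```python
import numpy as np

def welch_t(data, is_relapsed):
    """Per-gene Welch t statistic (genes in rows, samples in columns)."""
    a, b = data[:, is_relapsed], data[:, ~is_relapsed]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se

def permutation_pvalues(data, is_relapsed, n_perm=2000, seed=0):
    """Two-sided permutation p-value per gene, obtained by shuffling labels."""
    rng = np.random.default_rng(seed)
    observed = np.abs(welch_t(data, is_relapsed))
    exceed = np.zeros(data.shape[0])
    for _ in range(n_perm):
        shuffled = rng.permutation(is_relapsed)
        exceed += np.abs(welch_t(data, shuffled)) >= observed
    # The +1 terms keep p-values strictly positive.
    return (exceed + 1) / (n_perm + 1)

def fisher_criterion(data, is_relapsed):
    """Assumed per-gene form: (mean_a - mean_b)^2 / (var_a + var_b)."""
    a, b = data[:, is_relapsed], data[:, ~is_relapsed]
    return ((a.mean(axis=1) - b.mean(axis=1)) ** 2
            / (a.var(axis=1, ddof=1) + b.var(axis=1, ddof=1)))

def stable_fisher_genes(data, is_relapsed, top_n=30, rounds=6,
                        min_hits=3, seed=0):
    """Indices of genes ranked in the top_n in at least min_hits rounds.

    Each round leaves out one relapsed and two non-relapsed samples
    (chosen at random here, which is an assumption).
    """
    rng = np.random.default_rng(seed)
    relapsed = np.flatnonzero(is_relapsed)
    non_relapsed = np.flatnonzero(~is_relapsed)
    hits = np.zeros(data.shape[0], dtype=int)
    for _ in range(rounds):
        left_out = np.concatenate([
            rng.choice(relapsed, 1, replace=False),
            rng.choice(non_relapsed, 2, replace=False),
        ])
        kept = np.setdiff1d(np.arange(data.shape[1]), left_out)
        scores = fisher_criterion(data[:, kept], is_relapsed[kept])
        hits[np.argsort(scores)[-top_n:]] += 1
    return np.flatnonzero(hits >= min_hits)
```

With `data` as the normalized matrix and `is_relapsed` as a boolean NumPy array over samples, a candidate signature would then be `np.intersect1d(np.flatnonzero(permutation_pvalues(data, is_relapsed) < 0.001), stable_fisher_genes(data, is_relapsed))`. Note that the smallest attainable permutation p-value is `1 / (n_perm + 1)`, so `n_perm` has to exceed 1000 for a 0.001 threshold to be reachable.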
Keywords: repeat sample
ORGANISM(S): Homo sapiens
PROVIDER: GSE2630 | GEO | 2005/05/07
SECONDARY ACCESSION(S): PRJNA92267
REPOSITORIES: GEO