Project description:Complements GSE56044 with 36 additional samples of the large-cell carcinoma (LCC) and large-cell neuroendocrine carcinoma (LCNEC) histologies. LCC classification follows the WHO 2004 guidelines. Genomic DNA from 36 lung cancer samples was treated with bisulfite and hybridized to Illumina methylation 450K arrays using standard protocols. Signal intensities were obtained from GenomeStudio (Illumina), converted to beta-values, filtered, and normalized to remove biases between Infinium I and II probes. Raw intensity values, signal detection p-values, and the final normalized data are included for each sample.
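As an illustration of the intensity-to-beta conversion mentioned above, the following is a minimal sketch, not the submitters' actual pipeline. It assumes the commonly used Illumina definition, beta = M / (M + U + offset), with an offset of 100; the function name and the clamping of negative intensities to zero are assumptions made for this example.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Convert one probe's methylated (M) and unmethylated (U) signal
    intensities to a beta-value in [0, 1).

    Assumption: beta = M / (M + U + offset), with offset = 100 as the
    conventional stabilising constant for low-intensity probes.
    """
    # Negative intensities can appear after background subtraction;
    # clamping them to zero is an assumption of this sketch.
    m = max(float(methylated), 0.0)
    u = max(float(unmethylated), 0.0)
    return m / (m + u + offset)


if __name__ == "__main__":
    # Hypothetical intensities, for illustration only.
    print(beta_value(900, 0))
    print(beta_value(450, 450))
```

The offset keeps probes with very low total signal from producing unstable ratios; filtering on detection p-values and Infinium I/II normalization would be separate steps that this sketch does not cover.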
Project description:We propose a statistical algorithm, MethylPurify, that uses regions with bisulfite reads showing discordant methylation levels to infer tumor purity from tumor samples alone. With the purity estimate, MethylPurify can identify differentially methylated regions (DMRs) from individual tumor samples without genomic variation information or prior knowledge from other datasets. In simulations with mixed bisulfite reads from cancer and normal cell lines, MethylPurify correctly inferred tumor purity and identified over 96% of the DMRs. On real patient data, where tumor-to-normal comparisons were used as the gold standard, MethylPurify called DMRs from tumor samples alone at over 57% sensitivity and 91% specificity. Lung adenocarcinoma and normal tissues from 5 patients were captured with the Agilent SureSelect Methyl-Seq system, followed by bisulfite sequencing.
Project description:More than 10 million raw reads were acquired. In total, 238 unique mature sequences, comprising 142 conserved miRNAs and 96 novel miRNAs, were identified.
Project description:Purpose: The goal of the study is to compare NGS-derived transcriptomes in an isogenic breast cancer progression model cell line system. By comparing the protein-coding and noncoding gene expression of normal versus tumorigenic breast cancer cell lines, we will be able to identify genes that show aberrant expression during breast cancer progression. Methods: We isolated poly A+ RNA from four isogenic mammary epithelial cell lines representing various stages of breast cancer progression. The model system comprises 4 isogenic cell lines, categorized as M1-M4. M1 represents the normal, non-tumorigenic, immortalized MCF10A cells. Transfection of MCF10A with activated T24-HRAS and selection by xenografting generated the M2 (MCF10AT1k.cl2) cell line, which is highly proliferative and gives rise to premalignant lesions with the potential for neoplastic progression. M3 (MCF10Ca1h) and M4 (MCF10CA1a.cl1) were derived from occasional carcinomas arising from xenografts of M2 cells. M3 gives predominantly well-differentiated low-grade carcinomas on xenografting, while M4 gives rise to relatively undifferentiated carcinomas and colonizes the lung upon injection into the tail vein. We performed paired-end deep sequencing (190-260 million reads/sample) of poly A+ RNA isolated from these cells, which were cultured as 3D acini in biological duplicates. Reads were trimmed for adapters and low-quality bases using Trimmomatic before alignment to the reference genome (human, hg19) and the annotated transcripts using STAR. The average mapping rate across all samples is 96%. Unique alignment is above 87%. There are 3.74 to 4.07% unmapped reads. The mapping statistics were calculated using Picard. The samples have 0.59% ribosomal bases. Coding bases account for 67-72%, UTR bases for 23-26%, and mRNA bases for 94-96% in all samples.
Library complexity was measured as the number of unique fragments among the mapped reads using Picard's MarkDuplicates utility. The samples have 31-52% non-duplicate reads. In addition, gene expression was quantified for all samples using STAR/RSEM. Both the normalized counts and the raw counts are provided as part of the data delivery. Results: Using an optimised data analysis workflow, we mapped ~190-250 million reads/sample and identified expression of 17396 protein-coding genes and 11509 long noncoding RNA genes. We initially compared gene expression between M1 and M4 cells. 4668 genes (2815 protein-coding and 1853 lncRNAs) showed a ~2-fold change in expression between M1 and M4 cells in both biological repeats. 1159 of the 1853 deregulated lncRNAs showed 2-fold upregulation in M4 cells in both repeats, whereas 694 lncRNAs displayed reduced levels in M4 compared to M1 cells. Further, we noticed that natural antisense transcripts (NATs) comprised one of the largest classes of lncRNAs (504 out of 1853) that showed deregulation in M4 cells. Conclusion: Our study revealed differential expression of thousands of protein-coding and long noncoding RNAs during breast cancer progression using the isogenic cell line model system. This data set will serve as a rich resource for downstream mechanistic studies to determine the role of these differentially expressed genes in breast cancer progression.
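The "2-fold change in both biological repeats" criterion described above can be sketched as follows. This is a hedged illustration rather than the authors' workflow: the function name, the pseudocount, and the pairing of replicates by index are all assumptions introduced for the example.

```python
def changed_in_all_replicates(baseline, condition, min_fold=2.0, pseudocount=1.0):
    """Classify one gene as 'up', 'down', or None.

    baseline, condition: normalized expression values, one per biological
    replicate, paired by index (pairing is an assumption of this sketch).
    A gene is called only if every replicate pair passes the threshold.
    """
    if not baseline or len(baseline) != len(condition):
        raise ValueError("need the same non-zero number of replicates")
    # The pseudocount avoids division by zero for unexpressed genes.
    ratios = [(c + pseudocount) / (b + pseudocount)
              for b, c in zip(baseline, condition)]
    if all(r >= min_fold for r in ratios):
        return "up"
    if all(r <= 1.0 / min_fold for r in ratios):
        return "down"
    return None


if __name__ == "__main__":
    # Hypothetical expression values for an M1 vs M4 comparison.
    print(changed_in_all_replicates([9, 9], [39, 19]))
    print(changed_in_all_replicates([9, 9], [39, 9]))
```

Requiring the threshold in every replicate is stricter than thresholding the mean, so a gene that changes strongly in only one repeat is not called.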
Project description:We propose a statistical algorithm, MethylPurify, that uses regions with bisulfite reads showing discordant methylation levels to infer tumor purity from tumor samples alone. With the purity estimate, MethylPurify can identify differentially methylated regions (DMRs) from individual tumor samples without genomic variation information or prior knowledge from other datasets. In simulations with mixed bisulfite reads from cancer and normal cell lines, MethylPurify correctly inferred tumor purity and identified over 96% of the DMRs. On real patient data, where tumor-to-normal comparisons were used as the gold standard, MethylPurify called DMRs from tumor samples alone at over 57% sensitivity and 91% specificity.
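To make the sensitivity and specificity figures above concrete, here is a minimal sketch of how such metrics can be computed when DMR calls are scored against a gold-standard set. It is not MethylPurify code; the function name and the representation of regions as hashable identifiers drawn from a fixed universe of tested regions are assumptions for illustration.

```python
def sensitivity_specificity(called, truth, universe):
    """Return (sensitivity, specificity) for a set of calls.

    called:   region IDs called as DMRs from the tumor sample alone
    truth:    region IDs that are DMRs in the gold-standard comparison
    universe: all region IDs that were tested
    """
    universe = set(universe)
    called = set(called) & universe
    truth = set(truth) & universe
    tp = len(called & truth)             # true DMRs that were called
    fn = len(truth - called)             # true DMRs that were missed
    fp = len(called - truth)             # calls that are not true DMRs
    tn = len(universe - called - truth)  # correctly left uncalled
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true DMR and one non-DMR")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical region IDs 0..9, for illustration only.
    sens, spec = sensitivity_specificity({0, 1, 4}, {0, 1, 2, 3}, range(10))
    print(sens, spec)
```

Specificity depends on how the universe of tested regions is defined, so reported values are only comparable when that definition is the same.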
Project description:Lung cancer is a highly heterogeneous disease in terms of both underlying genetic lesions and response to therapeutic treatments. We performed deep whole-genome sequencing and transcriptome sequencing on 19 lung cancer cell lines and 3 lung tumor/normal pairs (provisional dbGaP accession number: phs000299.v2.p1). Overall, our data show that cell line models exhibit mutation spectra similar to those of human tumor samples. Taken together, these data present a comprehensive genomic landscape of a large number of lung cancer samples and further demonstrate that cancer-specific alternative splicing is a widespread phenomenon with potential utility as a source of therapeutic biomarkers.
Project description:The aim of the project was to explore the capacity of large extracellular vesicles to separate non-lung-cancer cases from lung cancer cases. This was a case-control study of patients suspected of having lung cancer (12 non-lung-cancer and 12 lung cancer cases). The raw files are labeled according to clinical status. Small and large extracellular vesicles were isolated by differential centrifugation. The proteomes of full BAL, vesicle-depleted BAL, and small and large extracellular vesicles were characterized by LC-MS. Small extracellular vesicles were further analyzed by nanoparticle tracking analysis, transmission electron microscopy, and western blots.