Project description:Cancer primarily develops because of somatic alterations in the genome. Advances in sequencing have enabled large-scale sequencing studies across many tumor types, emphasizing the discovery of alterations in protein-coding genes. However, the protein-coding exome comprises less than 2% of the human genome. Here we analyze the complete genome sequences of 863 human tumors from The Cancer Genome Atlas and other sources to systematically identify noncoding regions that are recurrently mutated in cancer. We use new frequency- and sequence-based approaches to comprehensively scan the genome for noncoding mutations with potential regulatory impact. These methods identify recurrent mutations in regulatory elements upstream of PLEKHS1, WDR74 and SDHD, as well as previously identified mutations in the TERT promoter. SDHD promoter mutations are frequent in melanoma and are associated with reduced gene expression and poor prognosis. The non-protein-coding cancer genome remains widely unexplored, and our findings represent a step toward targeting the entire genome for clinical purposes.
Project description:Background: Synonymous mutations can change the tAI (tRNA adaptation index) of a codon and consequently affect the local translation rate. Intuitively, one may hypothesize that synonymous mutations which increase tAI values are favored by natural selection. Results: We use the maize (Zea mays) genome to test this hypothesis. The first supporting evidence is that tAI-increasing synonymous mutations have higher fixed-to-polymorphic ratios than tAI-decreasing ones. Next, the DAF (derived allele frequency) or MAF (minor allele frequency) of the former is significantly higher than that of the latter. Moreover, similar results are obtained when we investigate CAI (codon adaptation index) instead of tAI. Conclusion: Synonymous mutations in the maize genome are not strictly neutral. The tAI-increasing mutations are positively selected, while the tAI-decreasing ones undergo purifying selection. This selective force might be weak but should not be automatically ignored.
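The fixed-to-polymorphic comparison described above can be illustrated with a 2x2 contingency table. The following is a minimal sketch only: the counts are hypothetical placeholders (not data from the study), the function names are invented for illustration, and the one-sided Fisher's exact test is an assumed choice, since the abstract does not state which test the authors used.

```python
from math import comb


def fixed_to_polymorphic_ratio(fixed, polymorphic):
    """Ratio of fixed to polymorphic mutation counts for one mutation class."""
    return fixed / polymorphic


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: tAI-increasing, tAI-decreasing. Columns: fixed, polymorphic.
    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing at least `a` fixed mutations in the first row by chance.
    """
    row1 = a + b          # total tAI-increasing mutations
    col1 = a + c          # total fixed mutations
    n = a + b + c + d     # all mutations
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical counts, for illustration only (NOT from the maize study).
inc_fixed, inc_poly = 30, 70   # tAI-increasing: fixed, polymorphic
dec_fixed, dec_poly = 20, 80   # tAI-decreasing: fixed, polymorphic

print("ratio (increasing):", fixed_to_polymorphic_ratio(inc_fixed, inc_poly))
print("ratio (decreasing):", fixed_to_polymorphic_ratio(dec_fixed, dec_poly))
print("one-sided p-value:", fisher_one_sided(inc_fixed, inc_poly, dec_fixed, dec_poly))
```

A higher ratio in the first class together with a small p-value would be read as an excess of fixed tAI-increasing mutations, which is the pattern the abstract reports.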
Project description:Background: Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, relationships among sequence, conservation, and function are still poorly understood. Results: We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA). Our analysis shows that TFos are under evolutionary constraint, but a substantial portion (25.1% of mouse and 25.85% of human on average) of the TFos does not have a homologous sequence in the other species; this portion varies among cell types and TFs. Furthermore, 47.67% and 57.01% of the homologous TFos sequence shows binding activity in the other species for human and mouse, respectively. However, 79.87% and 69.22% is repurposed such that it binds the same TF in different cells or different TFs in the same cells. Remarkably, within the set of repurposed TFos, the corresponding genome regions in the other species are preferred locations of novel TFos. These events suggest exaptation of some functional regulatory sequences into new functions. Despite TFos repurposing, we did not find substantial changes in their predicted target genes, suggesting that CRMs buffer evolutionary events, allowing little or no change in the TFos-target gene associations. Thus, the small portion of TFos with strictly conserved occupancy underestimates the degree of conservation of regulatory interactions. Conclusion: We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse using WGA. A comparative analysis of this correspondence unveiled the extent of the shared regulatory sequence across the TFs and cell types under study. Importantly, a large part of the shared regulatory sequence is repurposed in the other species. This sequence, fueled by turnover events, provides a strong case for exaptation in regulatory elements.
Project description:Accelerated evolution of regulatory sequence can alter the expression pattern of target genes and cause phenotypic changes. In this study, we used DNase I hypersensitive sites (DHSs) to annotate putative regulatory sequences in the human genome, and conducted a genome-wide analysis of the effects of accelerated evolution on regulatory sequences. Working under the assumption that local ancient repeat elements of DHSs are under neutral evolution, we discovered that ~0.44% of DHSs are under accelerated evolution (ace-DHSs). We found that ace-DHSs tend to be more active than background DHSs, and are strongly associated with epigenetic marks of active transcription. The target genes of ace-DHSs are significantly enriched in neuron-related functions, and their expression levels are positively selected in the human brain. Thus, these lines of evidence strongly suggest that accelerated evolution of regulatory sequences plays an important role in the evolution of human-specific phenotypes.
Project description:Analysis of NSCLC development at the microRNA expression level. Results show that hsa-miR-96 is significantly up-regulated in cancer tissue compared with adjacent normal tissue. Illumina Human v2 MicroRNA expression beadchip. Total microRNA was obtained from 6 paired samples of NSCLC cancer tissue and adjacent normal tissue.
Project description:Analysis of NSCLC development at the gene expression level. Results show obvious activation of the cell cycle pathway and significant repression of cell-cell communication pathways. Illumina HumanHT-12 V4.0 expression beadchip. Total RNA was obtained from 6 paired samples of NSCLC cancer tissue and adjacent normal tissue.
Project description:Mouth ulcers are the most common ulcerative condition and encompass several clinical diagnoses, including recurrent aphthous stomatitis (RAS). Despite previous evidence for heritability, it is not clear which specific genetic loci are implicated in RAS. In this genome-wide association study (n = 461,106), heritability is estimated at 8.2% (95% CI: 6.4%, 9.9%). This study finds 97 variants which alter the odds of developing non-specific mouth ulcers and replicates these in an independent cohort (n = 355,744) (lead variant after meta-analysis: rs76830965, near IL12A, OR 0.72 (95% CI: 0.71, 0.73); P = 4.4e-483). Additional effect estimates from three independent cohorts with more specific phenotyping and specific study characteristics support many of these findings. In silico functional analyses provide evidence for a role of T cell regulation in the aetiology of mouth ulcers. These results provide novel insight into the pathogenesis of a common, important condition.
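The reported odds ratio and confidence interval for the lead variant can be related to a test statistic with a short calculation. This is a rough sketch, assuming the interval is a symmetric Wald interval on the log-odds scale with a critical value of 1.96; the function name is hypothetical, and because the published OR and CI are rounded to two decimals, the result only approximates the statistic behind the reported P value.

```python
from math import log


def z_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover log-odds effect, standard error, and z-score from an OR and its CI.

    Assumes a symmetric Wald interval on the log scale:
        log(OR) +/- z_crit * SE
    """
    beta = log(odds_ratio)                          # effect on the log-odds scale
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)  # half-width divided by z_crit
    return beta, se, beta / se


# Values quoted in the abstract for rs76830965 (rounded as published).
beta, se, z = z_from_or_ci(0.72, 0.71, 0.73)
print(f"log(OR) = {beta:.4f}, SE = {se:.5f}, z = {z:.1f}")
```

A very narrow interval around an OR well below 1 yields a large negative z-score, which is why the associated P value is so extreme.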
Project description:Pharmacological and functional genomic screens play an essential role in the discovery and characterization of therapeutic targets and associated pharmacological inhibitors. Although these screens affect thousands of gene products, the typical readout is based on low complexity rather than genome-wide assays. To address this limitation, we introduce pooled library amplification for transcriptome expression (PLATE-Seq), a low-cost, genome-wide mRNA profiling methodology specifically designed to complement high-throughput screening assays. Introduction of sample-specific barcodes during reverse transcription supports pooled library construction and low-depth sequencing that is 10- to 20-fold less expensive than conventional RNA-Seq. The use of network-based algorithms to infer protein activity from PLATE-Seq data results in comparable reproducibility to 30 M read sequencing. Indeed, PLATE-Seq reproducibility compares favorably to other large-scale perturbational profiling studies such as the connectivity map and library of integrated network-based cellular signatures. Despite the importance of pharmacological and functional genomic screens, the readouts are of low complexity. Here the authors introduce PLATE-Seq, a low-cost genome-wide mRNA profiling method to complement high-throughput screening.