Project description: We explore different uses of machine learning classifiers, including neural networks, to combine the “signal” from ATAC-seq with its underlying genome sequence in order to classify ATAC-seq peaks by the presence or absence of transcription. We show that a hybrid signal/sequence representation, classified using recurrent neural networks (RNNs), yields the best performance across different cell types.
Project description: Using a chicken-hamster radiation hybrid panel (ChickRH6), we have mapped chicken chromosome(s) that contain possible factor(s) that permit avian polymerase activity by increasing polymerase activity. We identified four hybrid clones permissive for polymerase activity. Activity was lost following 12 passages in only one of the positive clones. Expression from the four positive radiation hybrid clones at passages 1 and 12 was measured and compared with expression in the hamster recipient line and in 17 clones negative for polymerase activity.
Project description: The assay for transposase-accessible chromatin followed by sequencing (ATAC-seq) is an inexpensive protocol for measuring open chromatin regions. ATAC-seq is also relatively simple and requires fewer cells than many other high-throughput sequencing protocols. It is therefore tractable in numerous settings where other high-throughput assays are challenging or impossible, which makes it important to understand the limits of what can be inferred from ATAC-seq data. In this work, we leverage ATAC-seq to predict the presence of nascent transcription. Nascent transcription assays are the current gold standard for identifying regions of active transcription, including markers for functional transcription factor (TF) binding. We combine mapped short reads from ATAC-seq with the underlying peak sequence to determine regions of active transcription genome-wide. We show that a hybrid signal/sequence representation classified using recurrent neural networks (RNNs) can identify these regions across different cell types.
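As a minimal sketch of what a hybrid signal/sequence representation could look like, the Python below encodes each base of a peak as a one-hot vector over A/C/G/T and appends the per-base read coverage as a fifth channel. The function name `encode_peak`, the five-channel layout, and the scaling of coverage by the peak maximum are all assumptions made for illustration; the description above does not specify the encoding the project actually used.

```python
# Hypothetical sketch: the names and the exact encoding are assumptions,
# not the representation used in the project described above.

BASES = "ACGT"


def encode_peak(sequence, coverage):
    """Return one 5-element vector per base: one-hot A/C/G/T plus scaled signal."""
    if len(sequence) != len(coverage):
        raise ValueError("sequence and coverage must have the same length")
    # Scale coverage by the peak maximum so the signal channel lies in [0, 1].
    peak_max = max(coverage, default=0)
    rows = []
    for base, depth in zip(sequence.upper(), coverage):
        # Bases outside A/C/G/T (for example N) become an all-zero one-hot vector.
        one_hot = [1.0 if base == b else 0.0 for b in BASES]
        signal = depth / peak_max if peak_max > 0 else 0.0
        rows.append(one_hot + [signal])
    return rows


# Toy peak of five bases with made-up coverage values.
example = encode_peak("ACGTN", [0, 2, 4, 2, 0])
for row in example:
    print(row)
```

A list of per-base vectors like this is the kind of ordered input a recurrent network consumes one position at a time, which is one plausible reason to pair such a representation with an RNN.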
Project description: To evaluate the performance characteristics of Signal-C, a plasma circulating free-DNA test, for detecting colorectal cancer and advanced precancerous lesions (APL) in an average-risk screening population aged 45 and over.
Project description: We report the application of single-molecule-based sequencing technology for high-throughput profiling of RNA polymerase II phosphorylated at serine 5 (PolII-S5p; the transcription initiation form) in female mouse cultured hybrid cells and female hybrid brain derived from mouse systems with skewed X inactivation based on crosses between C57BL/6J (BL6) and M. spretus. In these systems, alleles can be differentiated by frequent SNPs between mouse species, and the active X (Xa) can be compared to the haploid set of autosomes from the same species. To examine PolII-S5p occupancy in vivo, ChIP-seq was done in brain from an adult female F1 mouse in which the BL6 X is always active and the spretus X inactive. Uniquely mapped reads containing informative SNPs were assigned to each haploid chromosome set (BL6 or spretus) and were counted to establish allele-specific PolII-S5p occupancy profiles. We found that, with or without normalization by input genomic DNA sequencing data, expressed genes on the Xa (>1 RPKM) had 30% higher PolII-S5p peak levels at their promoters than autosomal genes from the same species (BL6). This result was confirmed by performing an independent allele-specific ChIP-seq analysis on fibroblasts derived from embryonic kidney (Patski cell line) that have the opposite X inactivation pattern from the brain sample, i.e. an Xa from M. spretus and an Xi from BL6. These findings suggest that transcription initiation of X-linked genes is enhanced to contribute to X upregulation in cell lines and in vivo. Examination of allele-specific PolII-S5p occupancy in mouse hybrid cells and brain.
Project description: How DNA sequence affects the dynamics and position of RNA Polymerase II during transcription remains poorly understood. Here we used naturally occurring genetic variation in F1 hybrid mice to explore how DNA sequence differences affect the genome-wide distribution of Pol II. We measured the position and orientation of Pol II in eight organs collected from heterozygous F1 hybrid mice using ChRO-seq. Our data revealed a strong genetic basis for the precise coordinates of transcription initiation and promoter-proximal pause, which was composed of both existing and novel DNA sequence motifs, allowing us to redefine molecular models of both core transcriptional processes. Our results implicate the strength of base pairing between A-T or G-C dinucleotides as a key determinant of the position of Pol II initiation and pause. We reveal substantial and heritable differences in the position of transcription termination, which frequently do not affect the composition of the mature mRNA. Finally, we identified frequent, organ-specific changes in transcription that affect mRNA and ncRNA expression across broad genomic domains. Collectively, we reveal how DNA sequences shape core transcriptional processes at single-nucleotide resolution in mammals.
Project description: F1 hybrids can outperform their parents in yield and vegetative biomass, features of hybrid vigor which form the basis of the hybrid seed industry. The yield advantage of the F1 is lost in the F2 and subsequent generations. In Arabidopsis, starting from F2 plants with an F1-like phenotype, we have used recurrent selection to produce pure breeding F5/F6 lines, “Hybrid Mimics”, in which the characteristics of the F1 Hybrid are stabilized. These Hybrid Mimic lines, like the F1 Hybrid, have larger leaves than the parent plant, the leaves having increased photosynthetic cell numbers, and in some lines increased size of cells, suggesting an increased supply of photosynthate. A comparison of the differentially expressed genes in the F1 Hybrid with those of eight Hybrid Mimic lines has identified metabolic pathways altered in both; these pathways include downregulation of defense response pathways and altered abiotic response pathways. F6 Hybrid Mimic lines are mostly homozygous at each locus in the genome yet retain the large F1-like phenotype. Many alleles in the F6 plants, when they are homozygous, have expression levels different to the level in the parent. We consider this altered expression to be a consequence of trans-regulation of genes from one parent by genes from the other parent. Trans-regulation could also arise from epigenetic modifications in the F1. The pure breeding Hybrid Mimics have been valuable in probing the mechanisms of hybrid vigor and may also prove to be useful hybrid vigor equivalents in agriculture.