Project description:Using a public reference data set of 82 unique entities, 382 nanopore-sequenced brain tumor samples were classified based on their methylation status through an ad hoc random forest algorithm. As a measure of confidence, score recalibration was performed and platform-specific thresholds were defined.
Project description:To establish a systematic approach for the determination of human biological and disease relevance through the generation of epigenome data in cell types of interest. Integration of cell type epigenome data with existing and newly generated reference data from human tissue and cell types to identify assay systems that will provide greater confidence in translating target biology and compound pharmacology to patients. To provide a framework for the identification of optimal cell types for target identification/validation studies and drug discovery programs across multiple therapeutic areas. Development of bioinformatics pipelines and CTTV components for analysis and provision of data.
2015-06-09 | E-ERAD-369 | biostudies-arrayexpress
Project description:High Confidence Fetal Sex Determination from Non-Invasive Prenatal Testing Low Coverage Semiconductor Sequencing Data
Project description:Recent advances in software-driven glycopeptide identification in LC-MS/MS-based N-glycoproteomics have facilitated biochemical studies reporting thousands of intact N-glycopeptides, i.e. N-glycan-conjugated peptides, but the automated identification process remains to be scrutinized. Herein, we explore the efficiency of site-specific glycoprofiling using the PTM-centric search-engine Byonic relative to manual expert annotation. To allow an appropriately deep comparison, the study utilised typical glycoproteomics acquisition and data analysis strategies, but of a single glycoprotein, the uncharacterised N-glycosylated (Asn160, Asn268 and Asn302) human basigin. Detailed site-specific reference glycoprofiles of purified basigin were manually established using ion trap CID-MS/MS and high-resolution Q-Exactive Orbitrap HCD-MS/MS acquisition of tryptic N-glycopeptides and released N-glycans. The basigin N-glycosylation sites, which showed extensive micro- and macro-heterogeneity, were then glycoprofiled using Byonic with or without a background of complex peptides using Q-Exactive Orbitrap HCD-MS/MS data. The glycoprofiling efficiencies were assessed against the site-specific reference glycoprofiles and target and decoy proteome databases. The search criteria and confidence thresholds (Byonic scores) recommended by the vendor provided very high glycoprofiling accuracy and coverage (both >80%) and low peptide FDRs (<1%). The data complexity, search parameters including search space (proteome/glycome size), mass tolerance and peptide modifications, and confidence thresholds affected the glycoprofiling efficiency and analysis time. 
Automated identification of peptide modifications (methionine oxidation/carbamidomethylation) that coincide with monosaccharide mass differences (Fuc/Hex/HexNAc) and accurate discrimination of isobaric (Hex1NeuAc1-R/Fuc1NeuGc1-R) or near-isobaric (NeuAc1-R/Fuc2-R) monosaccharide sub-compositions remain challenging, arguing for particular attention to such “difficult-to-identify” N-glycopeptides. The presented analysis provides valuable insights into automated glycopeptide identification, knowledge that facilitates further developments in FDR-based glycoproteomics.
Project description:A high-confidence map of the direct, functional targets of each transcription factor (TF) requires convergent evidence from independent sources. Two significant sources of evidence are TF binding locations and the transcriptional responses to direct TF perturbations. Systematic data sets of both types exist for yeast and human. Standard analysis of the genes whose regulatory DNA is bound by a TF, assayed by ChIP-chip/seq, and the genes that respond to a perturbation of that TF, shows that these two data sources rarely converge on a common set of direct, functional targets. Even taking the few genes that are both bound and responsive as direct functional targets is not safe -- when there are many non-functional binding sites and many indirect targets, non-functional sites are expected to occur in the cis-regulatory DNA of indirect targets by chance. To address this problem, we introduce Dual Threshold Optimization, a new method for setting significance thresholds on binding and response data, and show that it improves convergence. It also enables comparison of binding data to perturbation-response data that has been processed by network inference algorithms, which further improves convergence. Next, we analyze a comprehensive new data set measuring the transcriptional response shortly after inducing overexpression of a yeast TF. We also present a new yeast binding location data set obtained by transposon calling cards and compare it to recent ChIP-exo data. The combination of dual threshold optimization and network inference greatly expands the high-confidence TF network map in both yeast and human. In yeast, measuring the response shortly after inducing TF overexpression and measuring binding locations by using transposon calling cards or ChIP-exo improve the network synergistically.
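The idea of choosing the binding and response thresholds jointly can be illustrated with a minimal sketch. The abstract does not state the optimization criterion, so the code below assumes that the pair of thresholds is chosen to minimize a hypergeometric p-value for the overlap between the bound and the responsive gene sets. All function names, gene names, scores and candidate thresholds are hypothetical and serve only as an illustration.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked genes, n draws)."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def dual_threshold_search(binding, response, cand_b, cand_r):
    """Grid search over (binding, response) threshold pairs.

    binding / response: dict gene -> score, higher meaning stronger evidence.
    Returns (p_value, binding_threshold, response_threshold, overlap_genes)
    for the pair with the smallest overlap p-value (assumed criterion).
    """
    genes = set(binding) & set(response)
    N = len(genes)
    best = None
    for tb in cand_b:
        bound = {g for g in genes if binding[g] >= tb}
        for tr in cand_r:
            responsive = {g for g in genes if response[g] >= tr}
            if not bound or not responsive:
                continue  # an empty set carries no evidence
            overlap = bound & responsive
            p = hypergeom_sf(len(overlap), N, len(bound), len(responsive))
            if best is None or p < best[0]:
                best = (p, tb, tr, sorted(overlap))
    return best

# Toy data: g1-g3 are bound and responsive, g4 is bound only, g5 responds only.
binding = {"g1": 9, "g2": 8, "g3": 7, "g4": 2, "g5": 1,
           "g6": 1, "g7": 1, "g8": 1, "g9": 1, "g10": 1}
response = {"g1": 6, "g2": 5, "g3": 4, "g4": 0.5, "g5": 3,
            "g6": 0.2, "g7": 0.2, "g8": 0.2, "g9": 0.2, "g10": 0.2}

best = dual_threshold_search(binding, response, [1, 2, 7], [0.2, 3, 4])
print(best)
```

In this toy example the search settles on the stricter threshold for both data types, because that pair leaves g4 and g5 out of the compared sets. A real analysis would run the search per TF over a much finer grid and correct for the number of threshold pairs tested.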
Project description:ChIP-Seq is a technique used to analyse protein-DNA interactions. The protein-DNA complex is pulled down using a protein antibody, after which the bound DNA fragments are sequenced and analysed. A key bioinformatics analysis step is “peak” calling, i.e. identifying regions of enrichment. Benchmarking studies have consistently shown that no optimal peak caller exists. Peak callers have distinct selectivity and specificity characteristics, which are often not additive and seldom overlap completely. In the absence of a universal peak caller, we reasoned that one ought to utilize multiple peak callers to 1) gauge peak confidence as determined through detection by multiple algorithms, and 2) more thoroughly survey the protein-bound landscape by capturing peaks not detected by individual peak callers owing to algorithmic limitations and biases. We therefore developed an integrated ChIP-Seq Analysis Pipeline (ChIP-AP), which performs all analysis steps from raw fastq files to final result and utilizes four commonly used peak callers to analyse datasets more thoroughly and comprehensively. Results are integrated and presented in a single file, enabling users to apply selectivity and sensitivity thresholds to select the consensus peak set, the union peak set, or any subset in between to more confidently and comprehensively explore the protein-bound landscape (https://github.com/JSuryatenggara/ChIP-AP).
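The consensus/union selection described above can be sketched as follows. This is not the ChIP-AP implementation; it is a simplified illustration that assumes peaks are half-open (chrom, start, end) intervals, merges overlapping calls across callers, and records which callers support each merged region. The caller names and coordinates are placeholders.

```python
def merge_peak_calls(calls):
    """Merge overlapping peaks from several callers.

    calls: dict caller_name -> list of (chrom, start, end) half-open intervals.
    Returns a list of (chrom, start, end, frozenset_of_supporting_callers).
    """
    tagged = sorted(
        (chrom, start, end, caller)
        for caller, peaks in calls.items()
        for chrom, start, end in peaks
    )
    merged = []
    for chrom, start, end, caller in tagged:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start < last[2]:
            last[2] = max(last[2], end)   # extend the current region
            last[3].add(caller)           # record the supporting caller
        else:
            merged.append([chrom, start, end, {caller}])
    return [(c, s, e, frozenset(cs)) for c, s, e, cs in merged]

def select_peaks(merged, min_callers):
    """Keep regions supported by at least min_callers distinct callers."""
    return [(c, s, e) for c, s, e, cs in merged if len(cs) >= min_callers]

# Placeholder input: four hypothetical callers.
calls = {
    "caller_A": [("chr1", 100, 200), ("chr1", 500, 600)],
    "caller_B": [("chr1", 150, 250), ("chr2", 10, 50)],
    "caller_C": [("chr1", 180, 220)],
    "caller_D": [("chr1", 190, 260), ("chr1", 550, 650)],
}
merged = merge_peak_calls(calls)
union = select_peaks(merged, 1)                # detected by any caller
consensus = select_peaks(merged, len(calls))   # detected by every caller
print(union)
print(consensus)
```

Setting `min_callers` between 1 and the number of callers gives the intermediate subsets. Note that chained merging can join peaks that do not overlap each other directly, so a production pipeline may prefer a stricter overlap rule.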
Project description:The nematode Auanema rhodensis is trioecious (co-occurrence of males, females and self-fertile hermaphrodites). To better understand its sex determination system, we compared the transcriptomic profiles of early (L2) females, hermaphrodites and converted females (hermaphrodite-fated larvae induced to develop as females). Additionally, we sequenced the transcriptomes of adult males and of individuals from various stages and sexes (mixed-stage samples) to compare global gene expression profiles along the assembled draft chromosomes of A. rhodensis (BioProject PRJEB29492). The RNA-seq data were also used to predict genes in the assembled genome. We generated three biological replicates for each RNA-seq condition (L2 females, L2 converted females, L2 hermaphrodites, males and mixed stages). The expression profiles of the L2 conditions were compared to identify genes potentially involved in the sexual differentiation process, using a standard RNA-seq comparison approach. Briefly, the cleaned reads were aligned to the genome using STAR, gene abundance was quantified using featureCounts, and differentially expressed genes were identified using DESeq2 (using as thresholds an absolute log2(Fold Change) >= 2 and an FDR < 0.01).
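The stated thresholds can be applied to a table of differential expression results as in the minimal sketch below. It assumes each row carries the `log2FoldChange` and `padj` columns that DESeq2 reports, with `padj` taken as the FDR and missing values represented as `None`. The gene names and numbers are invented for illustration.

```python
def filter_de_genes(results, lfc_min=2.0, fdr_max=0.01):
    """Return genes with |log2FoldChange| >= lfc_min and padj < fdr_max.

    results: iterable of dicts with keys 'gene', 'log2FoldChange', 'padj'.
    Rows whose padj is None (not tested or filtered out) are skipped.
    """
    return [
        row["gene"]
        for row in results
        if row["padj"] is not None
        and abs(row["log2FoldChange"]) >= lfc_min
        and row["padj"] < fdr_max
    ]

# Invented example rows.
results = [
    {"gene": "gene_a", "log2FoldChange": 2.0, "padj": 0.001},   # kept
    {"gene": "gene_b", "log2FoldChange": -3.1, "padj": 0.009},  # kept (down)
    {"gene": "gene_c", "log2FoldChange": 4.0, "padj": 0.01},    # FDR not < 0.01
    {"gene": "gene_d", "log2FoldChange": 1.9, "padj": 1e-6},    # fold change too small
    {"gene": "gene_e", "log2FoldChange": 5.0, "padj": None},    # no adjusted p-value
]
print(filter_de_genes(results))
```

The fold-change bound is inclusive and the FDR bound is strict, matching the ">= 2" and "< 0.01" wording of the description.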
Project description:These are the tiling array data for the experiments describing LADs on murine chromosomes 5, 12, and 15 by Dam-ID, as determined by LMNB and EMD Dam tagging and detection experiments. The supplemental BED file contains GADA algorithm calls for genomic regions of chromosomes 5, 12 and 15 (-1 = no LAD, high confidence; 0 = indeterminate; 1 = LAD, high confidence).
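The three-state encoding of the calls can be decoded with a short sketch like the one below. The description does not give the column layout of the supplemental file, so the code assumes a whitespace-separated BED-like file with the call as an integer in the fourth column. The function name and the example lines are hypothetical.

```python
# Mapping taken from the description: -1, 0 and 1.
CALL_LABELS = {
    -1: "non-LAD (high confidence)",
    0: "indeterminate",
    1: "LAD (high confidence)",
}

def parse_lad_calls(lines):
    """Parse BED-like lines into (chrom, start, end, label) tuples.

    Assumes columns chrom, start, end, call; header lines are skipped.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, call = line.split()[:4]
        records.append((chrom, int(start), int(end), CALL_LABELS[int(call)]))
    return records

# Hypothetical example lines, not taken from the actual file.
example = [
    "track name=example",
    "chr5\t0\t1000\t1",
    "chr5\t1000\t2500\t0",
    "chr12\t300\t900\t-1",
]
for record in parse_lad_calls(example):
    print(record)
```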