Project description:Genome-wide nucleosome position data for wildtype and mutant strains in S. cerevisiae, C. albicans, and S. pombe. Illumina sequencing of mononucleosomal DNA isolated from mid-log cultures grown in rich medium (abbreviated CM, in-house recipe). S. pombe samples were grown in YES medium at permissive temperature (30 °C) and restrictive temperature (35 °C).
Project description:Short-read DNA sequencing technologies provide new tools to answer biological questions. However, high cost and low throughput limit their widespread use, particularly in organisms with smaller genomes such as S. cerevisiae. Although ChIP-Seq in mammalian cell lines is replacing array-based ChIP-chip as the standard for transcription factor binding studies, ChIP-Seq in yeast is still underutilized compared to ChIP-chip. We developed a multiplex barcoding system that allows simultaneous sequencing and analysis of multiple samples using Illumina’s platform. We applied this method to analyze the chromosomal distributions of three yeast DNA binding proteins (Ste12, Cse4 and RNA Pol II) and a reference sample (input DNA) in a single experiment and demonstrate its utility for rapid and accurate results at reduced costs. We developed a barcoding ChIP-Seq method for the concurrent analysis of transcription factor binding sites in yeast. Our multiplex strategy generated high-quality data that were indistinguishable from data obtained with non-barcoded libraries. None of the barcoded adapters induced differences relative to a non-barcoded adapter when applied to the same DNA sample. We used this method to map the binding sites for Cse4, Ste12 and Pol II throughout the yeast genome and found 148 binding targets for Cse4, 823 targets for Ste12 and 2508 targets for Pol II. Cse4 was strongly bound to all yeast centromeres, as expected, and the remaining non-centromeric targets corresponded to genes that are highly expressed in rich media, the latter constituting a novel finding. We designed a multiplex short-read DNA sequencing method to perform efficient ChIP-Seq in yeast and other small-genome model organisms. This method produces accurate results with higher throughput and reduced cost.
Given constant improvements in high-throughput sequencing technologies, higher levels of multiplexing will become possible, further decreasing costs per sample and accelerating the completion of large consortium projects such as modENCODE.
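As an illustration of the multiplexing idea described above, the following minimal Python sketch assigns reads to samples by a barcode at the start of each read. The barcode sequences, the sample-to-barcode assignment, and the function name `demultiplex` are hypothetical; the description does not give the actual barcode design, and real pipelines typically also tolerate sequencing errors within the barcode.

```python
from collections import defaultdict

# Hypothetical barcodes: the sequences used in the study are not given here.
BARCODES = {
    "ACGT": "Ste12",
    "TGCA": "Cse4",
    "GATC": "Pol II",
    "CTAG": "input",
}


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by exact-match barcode prefix and strip the barcode."""
    length = len(next(iter(barcodes)))  # assumes all barcodes share one length
    bins = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:length])
        if sample is None:
            # No barcode matched: keep the full read for later inspection.
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[length:])
    return dict(bins)


if __name__ == "__main__":
    example_reads = ["ACGTTTGACC", "TGCAGGGTTA", "NNNNACGTAC"]
    for sample, seqs in demultiplex(example_reads).items():
        print(sample, seqs)
```

Exact prefix matching keeps the sketch short; a production demultiplexer would usually allow a bounded number of mismatches.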
Project description:The Radiotherapy Optimisation Test Set (TROTS) is an extensive set of problems originating from radiotherapy (radiation therapy) treatment planning. The dataset was created for two purposes: (1) to supply a large-scale dense dataset for measuring the performance and quality of mathematical solvers, and (2) to supply a dataset for investigating the multi-criteria optimisation and decision-making nature of the radiotherapy problem. The dataset contains 120 problems (patients), divided over 6 different treatment protocols/tumour types. Each problem contains numerical data, a configuration for the optimisation problem, and the data required to visualise and interpret the results. The data are stored as HDF5-compatible MATLAB files, and the dataset includes scripts for working with them.
Project description:Designers of location algorithms share test data sets (benchmarks) so that the performance of newly developed algorithms can be compared. In previous decades, the availability of locational data was limited. Big data has revolutionised the amount and detail of information available about human activities and the environment. It is expected that integration of big data into location analysis will increase the resolution and precision of input data. Consequently, the size of the problems to be solved will increase significantly, raising the demand for algorithms able to solve them. The availability of realistic large-scale test data sets, with more than 100,000 demand points, is very limited. The presented data set covers the entire area of Slovakia and consists of the graph of the road network and almost 700,000 connected demand points. The population of 5.5 million inhabitants is allocated to the locations of the demand points using the residential population grid to estimate the size of the demand. The resolution of demand point locations is 100 m. With this article, the test data are made publicly available to enable other researchers to investigate their algorithms. A second use of the data is the design of methods to eliminate the aggregation errors that are usually present when considering location problems of such size. The data set is related to two research articles: "A Versatile Adaptive Aggregation Framework for Spatially Large Discrete Location-Allocation Problem" (Cebecauer and Buzna, 2017) [1] and "Effects of demand estimates on the evaluation and optimality of service centre locations" (Cebecauer et al., 2016) [2].
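To make the notion of aggregation error concrete, the sketch below compares a population-weighted distance objective computed on the original demand points with the same objective after the points are merged into one aggregated point. It is a toy illustration and not the method of the cited articles: it uses straight-line distances in place of road-network distances, and all coordinates, weights, and function names are invented for the example.

```python
import math


def weighted_cost(demand_points, facilities):
    """Sum of weight * distance to the nearest facility (p-median-style objective)."""
    total = 0.0
    for x, y, weight in demand_points:
        nearest = min(math.hypot(x - fx, y - fy) for fx, fy in facilities)
        total += weight * nearest
    return total


def aggregate(demand_points):
    """Merge all demand points into one point at their weighted centroid."""
    total_w = sum(w for _, _, w in demand_points)
    cx = sum(x * w for x, _, w in demand_points) / total_w
    cy = sum(y * w for _, y, w in demand_points) / total_w
    return [(cx, cy, total_w)]


if __name__ == "__main__":
    # Two hypothetical demand points (x, y, population) and one facility.
    points = [(0.0, 0.0, 10), (4.0, 0.0, 10)]
    facilities = [(2.0, 0.0)]
    exact = weighted_cost(points, facilities)
    approx = weighted_cost(aggregate(points), facilities)
    print("original objective:  ", exact)
    print("aggregated objective:", approx)
    print("aggregation error:   ", exact - approx)
```

In this toy case the aggregated point coincides with the facility, so the aggregated objective underestimates the true cost; this is the kind of distortion that aggregation-aware methods aim to reduce.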
Project description:aCGH data was used in Paradigm analysis for exploration of networks affected by copy number and gene expression changes based on mutation spectra of recurrently mutated genes in breast cancer.
Project description:This data article presents a reconstructed database derived from the PHM08 challenge data set. The original turbofan engine data came from the Prognostics Center of Excellence (PCoE) of NASA Ames Research Center (Saxena and Goebel, 2008), and were simulated with the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) (Saxena et al., 2008). The data set is divided into "training", "test" and "final test" subsets. Participants are expected to train their models on the "training" subset, evaluate Remaining Useful Life (RUL) prediction performance on the "test" subset and, finally, apply the models to the "final test" subset for the competition. However, the "final test" results can only be submitted once, by email to PCoE. To allow models to be pre-validated against true RUL values before the results are sent for performance evaluation, this data article introduces reconstructed secondary datasets derived from the noisy degradation patterns of the original trajectories. The reconstructed database consists of data collected from the training trajectories. Fundamentally, it is formed of individual partial trajectories for which the RUL is known as a ground truth. Its use provides a robust validation of a model developed for the PHM08 data challenge, which would otherwise be uncertain owing to the high risk of the one-time submission. These data and analyses support the research data article "A Neural Network Filtering Approach for Similarity-Based Remaining Useful Life Estimations" (Bektas et al., 2018).
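The idea of cutting run-to-failure training trajectories into partial trajectories with a known RUL can be sketched as follows. This is an illustration under stated assumptions only: a trajectory is represented as a plain list of per-cycle readings, cut points are drawn at random, and the function names `make_partial` and `reconstruct` are hypothetical. The description does not specify how the published reconstructed database selects its cut points.

```python
import random


def make_partial(trajectory, cut):
    """Truncate a run-to-failure trajectory after `cut` cycles.

    Returns the observed prefix and the true RUL, i.e. the number of
    cycles that remained until the end of the full trajectory.
    """
    if not 1 <= cut <= len(trajectory):
        raise ValueError("cut must be between 1 and the trajectory length")
    return trajectory[:cut], len(trajectory) - cut


def reconstruct(trajectories, seed=0):
    """Build one partial trajectory per training trajectory (random cut point)."""
    rng = random.Random(seed)
    return [make_partial(t, rng.randint(1, len(t))) for t in trajectories]


if __name__ == "__main__":
    # Two hypothetical single-sensor trajectories that run until failure.
    full = [[0.1, 0.2, 0.4, 0.7, 1.0], [0.0, 0.3, 0.9]]
    for prefix, rul in reconstruct(full):
        print("observed cycles:", len(prefix), "true RUL:", rul)
```

Because every partial trajectory comes from a complete training run, its RUL is known exactly, so a model's predictions can be scored locally without using the one-time submission.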
Project description:Sir2 and the homologous proteins Hst1, Hst2, Hst3, and Hst4 from Saccharomyces cerevisiae are NAD+-dependent histone deacetylases of the sirtuin protein family. Sir2 functions in transcriptional silencing at the silent mating-type loci, telomeres, and rDNA locus, but also promotes replicative lifespan. To gain a better understanding of the chromatin-regulatory roles carried out by Sir2 and the Hst proteins, we performed ChIP-sequencing analysis on all five sirtuins and Sum1, the DNA binding partner for Hst1. Sir2, Hst1, and Sum1 were abundantly and functionally co-enriched at several major targets, including the telomeric repeats, where they were required for maintaining proper telomere repeat length. At tRNA target genes, they were required for efficient cohesin and condensin deposition. Across the open reading frames of glycolytic and ribosomal protein genes, Sir2 and Hst1 functioned in NAD+-dependent transcriptional repression at the diauxic shift, directly linking Sir2 to glucose metabolism, which could have implications for longevity. Six factors and Input ChIP-seq samples were analyzed in Saccharomyces cerevisiae.
Project description:Gene expression test data set from liver samples of rats exposed to 150, 1500 or 2000 mg/kg of APAP for 3, 6 or 24 hours. The Supplementary file (appended below) contains the mapping for the decoding of blinded samples. Keywords: Dose response, Time course, Microarray, Gene expression