Project description:Data from the VLA lyssavirus genotyping microarray. The array platform for this data is GEO accession GPL8066, and consists of 624 oligos representing two viral families. The data set itself consists of 14 arrays: 7 hybridised with RNA from mouse brains infected with 7 genotypes of lyssaviruses, 1 hybridised with RNA from normal mouse brain, and 6 hybridised with RNA from coded samples consisting of infected mouse brains or control mouse brains. Statistical analysis of the data was done with DetectiV software (Watson et al., 2007). The median and array methods of normalization were used in the statistical analysis of the results. In the median method, DetectiV calculates the mean fluorescence for each set of probes and normalises it against the background fluorescence of all probes, assuming that most probes are not hybridized. The array method uses an entire control array, e.g. RNA from a known uninfected animal, as the negative control, and all probe values are divided by their respective elements from the control array. Keywords: Lyssavirus genotyping microarray
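The two normalization schemes described in this record can be illustrated with a minimal Python sketch. It is not DetectiV's own code. The function names, the dictionary layout of an array, the example intensities, and the choice of the median over all probes as the background estimate are assumptions made for illustration only.

```python
from statistics import mean, median


def median_normalise(array):
    """Normalise the mean intensity of each probe set against array background.

    `array` maps a probe-set name to a list of probe intensities
    (hypothetical layout, not a DetectiV data structure).
    """
    # Assumption: background is estimated as the median over all probes,
    # which is reasonable when most probes are not hybridised.
    all_probes = [value for probes in array.values() for value in probes]
    background = median(all_probes)
    return {name: mean(probes) / background for name, probes in array.items()}


def array_normalise(sample, control):
    """Divide every probe value by the matching probe on a control array."""
    return {
        name: [s / c for s, c in zip(sample[name], control[name])]
        for name in sample
    }


# Made-up intensities: only the probe set "virus_A" is strongly hybridised.
sample = {
    "virus_A": [900.0, 1100.0],
    "virus_B": [100.0, 120.0],
    "virus_C": [90.0, 110.0],
}
control = {
    "virus_A": [100.0, 110.0],
    "virus_B": [100.0, 120.0],
    "virus_C": [90.0, 110.0],
}

median_scores = median_normalise(sample)
array_scores = array_normalise(sample, control)
```

In both schemes a probe set that is not hybridised ends up close to 1, while a hybridised one stands well above it. The array scheme needs a separate control hybridisation, whereas the median scheme uses only the array itself.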
Project description:The yeast calibration curve dataset was acquired to compare the accuracy of DIA tools with decreasing contents of target peptides. Four samples (Y1, Y2, Y3 and Y4) with decreasing contents (200, 100, 50 and 25 ng, respectively) of analytes (yeast tryptic peptides) and a constant, high content of background peptides (800 ng human tryptic peptides) were analyzed in triplicate using LC-DIA-MS/MS. The DIA data were processed by different DIA tools based on the spectral library generated from the DDA data, and the accuracy of the tools was compared.
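One way to score accuracy on a dilution series like this is to compare the observed intensity ratios between samples with the ratios expected from the loaded amounts. The Python sketch below assumes that scoring approach; the record does not state how accuracy was actually measured. The function names and the peptide intensities are made up for illustration.

```python
from math import log2

# Loaded yeast peptide amounts in ng, taken from the description above.
LOADED_NG = {"Y1": 200, "Y2": 100, "Y3": 50, "Y4": 25}


def expected_log2_ratio(sample, reference="Y1"):
    """Log2 ratio expected from the loaded amounts alone."""
    return log2(LOADED_NG[sample] / LOADED_NG[reference])


def log2_ratio_errors(intensities, reference="Y1"):
    """Observed minus expected log2 ratio for each non-reference sample.

    `intensities` maps a sample name to the quantified intensity of one
    peptide (hypothetical values, not taken from the dataset).
    """
    errors = {}
    for sample, value in intensities.items():
        if sample == reference:
            continue
        observed = log2(value / intensities[reference])
        errors[sample] = observed - expected_log2_ratio(sample, reference)
    return errors


# Made-up intensities showing ratio compression at low loads.
observed = {"Y1": 8000.0, "Y2": 4100.0, "Y3": 2300.0, "Y4": 1400.0}
errors = log2_ratio_errors(observed)
```

An error of 0 means the tool reproduced the expected fold change exactly. Positive errors, as in this made-up example, mean the low-abundance samples were overestimated relative to Y1.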
Project description:Gene expression data were analyzed to map with urine proteomics data. Gene expression data from kidney biopsies of kidney transplant patients with and without acute rejection, chronic allograft nephropathy (CAN) and BK virus nephropathy were used to study gene expression changes during acute rejection, chronic allograft nephropathy and BK virus nephropathy. Samples labeled STA16, STA22, STA14, and STA18 were included in the CAN vs no-CAN analysis as no-CAN samples because they also qualified as no-CAN samples.
Project description:Metatranscriptomic and metaproteomic analysis of C. quadricolor symbiotic bacteria for the discovery of new potential biosynthetic clusters.
Project description:Primary human astrocytes were infected with either monkeypox virus (MPXV clade IIb lineage), vaccinia virus (VACV: Acambis 2000), or controls (MC = monkeypox control, AC = vaccinia control) at an MOI of 10 for 6 h. Samples (n=4) were analyzed by LC-MS/MS with label-free quantification, with data acquired by data-dependent acquisition (DDA).
Project description:Text mining methods have added considerably to our capacity to extract biological knowledge from the literature. Recently the field of systems biology has begun to model and simulate metabolic networks, requiring knowledge of the set of molecules involved. While genomics and proteomics technologies are able to supply the macromolecular parts list, the metabolites are less easily assembled. Most metabolites are known and reported through the scientific literature, rather than through large-scale experimental surveys. Thus it is important to recover them from the literature. Here we present a novel tool to automatically identify metabolite names in the literature, and associate structures where possible, to define the reported yeast metabolome. With ten-fold cross validation on a manually annotated corpus, our recognition tool generates an f-score of 78.49 (precision of 83.02) and demonstrates greater suitability in identifying metabolite names than other existing recognition tools for general chemical molecules. The metabolite recognition tool has been applied to the literature covering an important model organism, the yeast Saccharomyces cerevisiae, to define its reported metabolome. By coupling to ChemSpider, a major chemical database, we have identified structures for much of the reported metabolome and, where structure identification fails, been able to suggest extensions to ChemSpider. Our manually annotated gold-standard data on 296 abstracts are available as supplementary materials. Metabolite names and, where appropriate, structures are also available as supplementary materials. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11306-010-0251-6) contains supplementary material, which is available to authorized users.
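The abstract reports an f-score and a precision but no recall. Assuming the f-score is the balanced F1 measure (the harmonic mean of precision and recall), the implied recall can be derived as in the short Python sketch below. The function names are illustrative, and the derived recall is a calculation, not a figure reported by the authors.

```python
def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def recall_from_f(f, precision):
    """Solve F = 2PR / (P + R) for R."""
    return f * precision / (2 * precision - f)


# Values reported in the abstract, in percent.
reported_f = 78.49
reported_precision = 83.02

implied_recall = recall_from_f(reported_f, reported_precision)
```

Under the F1 assumption the implied recall is about 74.4, so the tool is somewhat more precise than it is complete.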
Project description:This pan-cancer cell line proteomic atlas comprises proteomic data acquired by data-independent acquisition (specifically, SWATH) mass spectrometry for 949 cancer cell lines. Cell lines were processed in technical triplicate, with duplicates acquired on different mass spectrometers, alongside HEK293T cell line control samples processed across the experimental period. For further details, refer to the publication that accompanies this data deposition.
Project description:This dataset is composed of symbolic and quantitative entities concerning food packaging composition and gas permeability. It was created from 50 scientific articles in English, saved in HTML format, from several international journals on the ScienceDirect website. The files were annotated independently by three experts on a WebAnno server. The aim of the annotation task was to recognize all entities related to packaging permeability measures and packaging composition. The annotation task is driven by an Ontological and Terminological Resource (OTR). An annotation guideline was designed in a collective and iterative approach involving the annotators. This dataset can be used to train or evaluate natural language processing (NLP) approaches in experimental fields, such as specialized entity recognition (e.g. terms and variations, units of measure, complex numerical values) or sentence-level binary relations (e.g. value to unit, term to acronym).
Project description:The present dataset ("dataset 3") is a subset of a large metastudy on AML classification. It contains normalized gene expression values of 1181 samples. In total, three datasets were generated, each containing data from a different platform: dataset 1 (Affymetrix HG-U133 A microarrays), dataset 2 (Affymetrix HG-U133 2.0 microarrays) and dataset 3 (RNA-seq). Dataset 3 was generated using the following strategy: All data sets published in the National Center for Biotechnology Information Gene Expression Omnibus (GEO) on 20 September 2017 were reviewed for inclusion in the present study. Basic criteria for inclusion were the cell type under study (human peripheral blood mononuclear cells (PBMCs) and/or bone marrow samples) as well as the species (Homo sapiens). Furthermore, GEO SuperSeries were excluded to avoid duplicated samples. We filtered the datasets for data generated with high-throughput RNA sequencing (RNA-seq) and excluded studies with very small sample sizes (< 10 samples). We then applied a disease-specific search, in which we filtered for acute myeloid leukemia, other leukemia and healthy or non-leukemia-related samples. The results of this search strategy were then internally reviewed and data were excluded based on the following criteria: (i) exclusion of duplicated samples, (ii) exclusion of studies that sorted single cell types (e.g. T cells or B cells) prior to gene expression profiling, (iii) exclusion of studies with inaccessible data. Other than that, no studies were excluded from our analysis. In total, the datasets contained samples from the following GSE Series: GSE63085, GSE32874, GSE58335, GSE86884, GSE63703, GSE63646, GSE63816, GSE72790, GSE81259, GSE85712, GSE45735, GSE64655, GSE87186, GSE49642, GSE52656, GSE62190, GSE66917, GSE67039, GSE61162, GSE67184, GSE49601, GSE78785, GSE79970. All raw data files were downloaded from GEO.
Transcript abundances were calculated using kallisto version 0.43.0, and all data were normalized with the R package DESeq2 (R version 3.2.4, DESeq2 version 1.12.4) with standard parameters. Genome build hg38 was used for read alignment. No filtering of low-expressed genes was performed.
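The basic inclusion criteria for dataset 3 can be expressed as a simple filter. The following Python sketch is an illustration under stated assumptions, not the authors' pipeline: the `Series` record, its field names, and the example accessions are hypothetical, and the later manual review steps (duplicates, sorted cell types, inaccessible data) are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Series:
    """Hypothetical summary of one GEO series; field names are assumptions."""
    accession: str
    species: str
    tissue: str        # e.g. "PBMC" or "bone marrow"
    technology: str    # e.g. "RNA-seq" or "microarray"
    n_samples: int
    is_superseries: bool


ALLOWED_TISSUES = {"PBMC", "bone marrow"}
MIN_SAMPLES = 10  # studies with fewer than 10 samples are excluded


def is_eligible(series):
    """Apply the basic inclusion criteria described in the record."""
    return (
        series.species == "Homo sapiens"
        and series.tissue in ALLOWED_TISSUES
        and not series.is_superseries      # avoids duplicated samples
        and series.technology == "RNA-seq"
        and series.n_samples >= MIN_SAMPLES
    )


# Placeholder accessions, not real GEO entries.
candidates = [
    Series("EXAMPLE_1", "Homo sapiens", "PBMC", "RNA-seq", 40, False),
    Series("EXAMPLE_2", "Homo sapiens", "bone marrow", "RNA-seq", 6, False),
    Series("EXAMPLE_3", "Homo sapiens", "PBMC", "RNA-seq", 120, True),
    Series("EXAMPLE_4", "Mus musculus", "bone marrow", "RNA-seq", 30, False),
    Series("EXAMPLE_5", "Homo sapiens", "bone marrow", "microarray", 50, False),
]

eligible = [s.accession for s in candidates if is_eligible(s)]
```

Only the first placeholder series passes. The others fail on sample size, SuperSeries status, species, and technology, respectively.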