Project description:Marker genes identified in single cell experiments are expected to be highly specific to a certain cell type and highly expressed in that cell type. Detecting a gene by differential expression analysis does not necessarily satisfy those two conditions and is typically computationally expensive for large cell numbers. Here we present genesorteR, an R package that ranks features in single cell data in a manner consistent with the expected definition of marker genes in experimental biology research. We benchmark genesorteR using various data sets and show that it is distinctly more accurate in large single cell data sets compared to other methods. genesorteR is orders of magnitude faster than current implementations of differential expression analysis methods, can operate on data containing millions of cells and is applicable to both single cell RNA-Seq and single cell ATAC-Seq data. genesorteR is available at https://github.com/mahmoudibrahim/genesorteR.
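The two conditions named in this description (a marker should be specific to one cell type and widely expressed within it) can be illustrated with a small scoring sketch. This is not genesorteR's code, which is written in R; it is a simplified illustration that assumes binarized detection calls (1 if the gene is detected in a cell, 0 otherwise) and scores each gene per cluster as P(detected | cluster) × P(cluster | detected). The function name, gene names, and toy data are hypothetical.

```python
from collections import defaultdict


def specificity_scores(detected, clusters):
    """Score every gene in every cluster.

    detected: dict mapping gene name -> list of 0/1 detection flags, one per cell
    clusters: list of cluster labels, one per cell (same order as the flags)
    Returns dict gene -> dict cluster -> score, where
    score = P(detected | cluster) * P(cluster | detected).
    """
    cluster_sizes = defaultdict(int)
    for label in clusters:
        cluster_sizes[label] += 1

    scores = {}
    for gene, flags in detected.items():
        # Count the cells in each cluster where the gene is detected.
        hits = defaultdict(int)
        for flag, label in zip(flags, clusters):
            if flag:
                hits[label] += 1
        total_hits = sum(hits.values())
        scores[gene] = {
            label: (hits[label] / size) * (hits[label] / total_hits)
            if total_hits
            else 0.0
            for label, size in cluster_sizes.items()
        }
    return scores


# Toy data (hypothetical): six cells in two clusters.
clusters = ["A", "A", "A", "B", "B", "B"]
detected = {
    "geneX": [1, 1, 1, 0, 0, 0],  # detected in every A cell and nowhere else
    "geneY": [1, 1, 1, 1, 1, 1],  # detected everywhere, so not specific
    "geneZ": [1, 0, 0, 0, 0, 0],  # specific to A but rarely detected
}

scores = specificity_scores(detected, clusters)
# Rank genes as candidate markers for cluster A, best first.
ranked_a = sorted(scores, key=lambda g: scores[g]["A"], reverse=True)
print(ranked_a)
```

In this toy case geneX ranks first for cluster A because it satisfies both conditions, while geneY is penalized for lack of specificity and geneZ for low detection within the cluster.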
Project description:A quantitative computational framework for allopolyploid single-cell data integration and core gene ranking in development [scRNA-seq]
Project description:Detection of single feature polymorphisms comparing five barley genotypes. Gene expression under unstressed and drought-stressed conditions. Tissue from five entire five-day-old seedlings from drought-stressed or unstressed growth conditions was used for RNA extraction. For Barke, Morex and Steptoe the two types of RNA were pooled. For Oregon Wolfe Barley Dominant and Recessive (OWBs), the two types of RNA were handled separately. Targets from three biological replicates of each genotype-treatment were generated and transcript levels were determined using Affymetrix Barley1 GeneChip arrays. Probe set comparisons, followed by single probe comparisons, between genotypes allow the identification of single feature polymorphisms. For the OWBs, comparisons between stressed and unstressed conditions define stress-regulated genes. Experiment Overall Design: Three genotypes, three replicates each. Two additional genotypes, two sets of three replicates each.
Project description:Detection of single feature polymorphisms comparing five barley genotypes. Gene expression under unstressed and drought-stressed conditions. Tissue from five entire five-day-old seedlings from drought-stressed or unstressed growth conditions was used for RNA extraction. For Barke, Morex and Steptoe the two types of RNA were pooled. For Oregon Wolfe Barley Dominant and Recessive (OWBs), the two types of RNA were handled separately. Targets from three biological replicates of each genotype-treatment were generated and transcript levels were determined using Affymetrix Barley1 GeneChip arrays. Probe set comparisons, followed by single probe comparisons, between genotypes allow the identification of single feature polymorphisms. For the OWBs, comparisons between stressed and unstressed conditions define stress-regulated genes. Keywords: repeat
Project description:Natural populations of the fruit fly, Drosophila melanogaster, segregate genetic variation that leads to cardiac disease phenotypes. Drosophila is well known as a model for studying the mechanisms by which human disease genes cause pathology, including heart disease, but it is less well appreciated that flies may also model the genetic architecture of disease, since they presumably also have diseases that have a genetic basis. It is reasoned that most of these aberrant inbred line effects would be due to capture of rare variants of large effect as homozygotes, allowing the variants to be mapped rapidly using contemporary genomic approaches. In order to map the genetic variants in flies, we used single feature polymorphism (SFP) analysis to contrast the genome-wide genotype frequencies between pools of flies with aberrant and normal heart phenotypes. SFP analysis is an indirect method for genome-wide genotyping that utilizes differential hybridization of genomic DNA to probes on a DNA chip that was initially designed for gene expression profiling, but can be used for species where genotyping chips are not available. DNA was prepared from three independent pools of 15 flies for each of the two types, as well as from the two parental lines. The samples were sheared and labeled with biotin, then hybridized to Affymetrix Drosophila expression microarray chips. Mismatch hybridization, namely a significant difference in the hybridization intensity between the parental lines, was detected from all perfect match (PM) probes, located in over 9,000 probes with an estimated False Discovery Rate of 11%.
Project description:Natural populations of the fruit fly, Drosophila melanogaster, segregate genetic variation that leads to cardiac disease phenotypes. Drosophila is well known as a model for studying the mechanisms by which human disease genes cause pathology, including heart disease, but it is less well appreciated that flies may also model the genetic architecture of disease, since they presumably also have diseases that have a genetic basis. It is reasoned that most of these aberrant inbred line effects would be due to capture of rare variants of large effect as homozygotes, allowing the variants to be mapped rapidly using contemporary genomic approaches. In order to map the genetic variants in flies, we used single feature polymorphism (SFP) analysis to contrast the genome-wide genotype frequencies between pools of flies with aberrant and normal heart phenotypes. SFP analysis is an indirect method for genome-wide genotyping that utilizes differential hybridization of genomic DNA to probes on a DNA chip that was initially designed for gene expression profiling, but can be used for species where genotyping chips are not available.
Project description:Background: Sepsis is a life-threatening clinical condition that happens when the patient's body has an excessive reaction to an infection, and should be treated within one hour. Due to the urgency of sepsis, doctors and physicians often do not have enough time to perform laboratory tests and analyses to help them forecast the consequences of the sepsis episode. In this context, machine learning can provide a fast computational prediction of sepsis severity, patient survival, and sequential organ failure by just analyzing the electronic health records of the patients. Also, machine learning can be employed to understand which features in the medical records are more predictive of sepsis severity, of patient survival, and of sequential organ failure in a fast and non-invasive way. Dataset and methods: In this study, we analyzed a dataset of electronic health records of 364 patients collected between 2014 and 2016. The medical record of each patient has 29 clinical features, and includes a binary value for survival, a binary value for septic shock, and a numerical value for the sequential organ failure assessment (SOFA) score. We used each of these three factors separately as an independent target, and employed several machine learning methods to predict it (binary classifiers for survival and septic shock, and regression analysis for the SOFA score). Afterwards, we used a data mining approach to identify the most important dataset features in relation to each of the three targets separately, and compared these results with the results achieved through a standard biostatistics approach. Results and conclusions: Our results showed that machine learning can be employed efficiently to predict septic shock, SOFA score, and survival of patients diagnosed with sepsis, from their electronic health records data. Regarding clinical feature ranking, our results showed that Random Forests feature selection identified several unexpected symptoms and clinical components as relevant for septic shock, SOFA score, and survival. These discoveries can help doctors and physicians in understanding and predicting septic shock. We made the analyzed dataset and our developed software code publicly available online.