Project description: Motivation: A common strategy to infer and quantify interactions between components of a biological system is to deduce them from the network's response to targeted perturbations. Such perturbation experiments are often challenging and costly. Therefore, optimizing the experimental design is essential to achieve a meaningful characterization of biological networks. However, it remains difficult to predict which combination of perturbations allows specific interaction strengths in a given network topology to be inferred. Yet, such a description of identifiability is necessary to select perturbations that maximize the number of inferable parameters. Results: We show analytically that the identifiability of network parameters can be determined by an intuitive maximum-flow problem. Furthermore, we used the theory of matroids to describe identifiability relationships between sets of parameters in order to build identifiable effective network models. Collectively, these results allowed us to devise strategies for an optimal design of the perturbation experiments. We benchmarked these strategies on a database of human pathways. Remarkably, full network identifiability was achieved, on average, with less than a third of the perturbations that are needed in a random experimental design. Moreover, we determined perturbation combinations that additionally decreased experimental effort compared to single-target perturbations. In summary, we provide a framework that allows a maximal number of interaction strengths to be inferred with a minimal number of perturbation experiments. Availability and implementation: IdentiFlow is available at github.com/GrossTor/IdentiFlow. Supplementary information: Supplementary data are available at Bioinformatics online.
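As a rough illustration of how an identifiability question can be cast as a maximum-flow problem, the sketch below builds a unit-capacity flow network with networkx. The construction (a super-source feeding the perturbed nodes, the regulators of a target node feeding a super-sink) and the function name `incoming_edges_identifiable` are simplifying assumptions for illustration, not IdentiFlow's exact algorithm.

```python
# Minimal sketch, assuming a toy max-flow criterion for identifiability.
import networkx as nx

def incoming_edges_identifiable(network, perturbed, target):
    """Heuristic check: can the incoming-edge parameters of `target` be distinguished?"""
    regulators = [u for u, v in network.edges if v == target]
    if not regulators:
        return True
    G = nx.DiGraph()
    G.add_edges_from(network.edges, capacity=1)
    for p in perturbed:                      # hypothetical super-source
        G.add_edge("SOURCE", p, capacity=1)
    for r in regulators:                     # hypothetical super-sink
        G.add_edge(r, "SINK", capacity=1)
    flow_value, _ = nx.maximum_flow(G, "SOURCE", "SINK")
    # In this toy criterion, the parameters count as identifiable if the flow
    # saturates every regulator, i.e. equals the number of incoming edges.
    return flow_value >= len(regulators)

# Toy three-node network A -> C <- B.
net = nx.DiGraph([("A", "C"), ("B", "C")])
print(incoming_edges_identifiable(net, perturbed=["A"], target="C"))        # False
print(incoming_edges_identifiable(net, perturbed=["A", "B"], target="C"))   # True
```

The point of the toy example is only that adding a second perturbation target raises the achievable flow and thereby the number of inferable parameters, which is the intuition behind the optimal-design strategies described above.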
Project description: Introduction: Exact tests on proportions exist for single-group and two-group designs, but no general test on proportions exists that is appropriate for any experimental design involving more than two groups, repeated measures, and/or factorial designs. Method: Herein, we extend the analysis of proportions using arcsine transform to any sort of design. The resulting framework, which we have called Analysis of Proportions Using Arcsine Transform (ANOPA), is completely analogous to the analysis of variance for means of continuous data, allowing the examination of interactions, main and simple effects, post-hoc tests, orthogonal contrasts, et cetera. Result: We illustrate the method with a few examples (single-factor design, two-factor design, within-subject design, and mixed design) and explore type I error rates with Monte Carlo simulations. We also examine power computation and confidence intervals for proportions. Discussion: ANOPA is a complete series of analyses for proportions, applicable to any design.
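The sketch below shows the variance-stabilizing (Anscombe) arcsine transform on which this kind of analysis builds, together with a simple k-group test statistic. The data, the cutoff, and the specific statistic are illustrative assumptions; the ANOPA corrections and test construction in the paper may differ.

```python
# Minimal sketch, assuming the Anscombe arcsine transform and a chi-square
# approximation for a single-factor comparison of k independent proportions.
import numpy as np
from scipy.stats import chi2

def anscombe_arcsine(successes, n):
    """Arcsine transform of a proportion; variance is approximately 1/(4(n + 1/2))."""
    return np.arcsin(np.sqrt((successes + 3/8) / (n + 3/4)))

# Hypothetical data: successes and trials in three independent groups.
successes = np.array([12, 19, 25])
n = np.array([40, 40, 40])

A = anscombe_arcsine(successes, n)
w = 4 * (n + 0.5)                      # inverse of the approximate variances
A_bar = np.sum(w * A) / np.sum(w)      # precision-weighted grand mean
stat = np.sum(w * (A - A_bar) ** 2)    # ~ chi-square with k-1 df under H0
p_value = chi2.sf(stat, df=len(n) - 1)
print(f"statistic = {stat:.2f}, p = {p_value:.4f}")
```

Because the transformed proportions have a known (approximately constant) variance, ANOVA-style machinery such as contrasts and interaction tests can be applied to them, which is the idea ANOPA generalizes to arbitrary designs.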
Project description: Background: Gene set analysis (GSA) aims to evaluate the association between the expression of biological pathways, or a priori defined gene sets, and a particular phenotype. Numerous GSA methods have been proposed to assess the enrichment of sets of genes. However, most methods are developed with respect to a specific alternative scenario, such as a differential mean pattern or a differential coexpression. Moreover, very few methods can handle binary, categorical, and continuous phenotypes alike. In this paper, we develop two novel GSA tests, called SDRs, based on the sufficient dimension reduction technique, which aims to capture sufficient information about the relationship between genes and the phenotype. The advantages of our proposed methods are that they allow for categorical and continuous phenotypes, and they are also able to identify a variety of enriched gene sets. Results: Through simulation studies, we compared the type I error and power of SDRs with existing GSA methods for binary, three-category, and continuous phenotypes. We found that SDR methods adequately control the type I error rate at the pre-specified nominal level, and they have satisfactory power to detect gene sets with differential coexpression and to test non-linear associations between gene sets and a continuous phenotype. In addition, the SDR methods were compared with seven widely used GSA methods using two real microarray datasets for illustration. Conclusions: We conclude that the SDR methods outperform the others because of their flexibility in handling different kinds of phenotypes and their power to detect a wide range of alternative scenarios. Our real data analysis highlights the differences between GSA methods for detecting enriched gene sets.
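For readers unfamiliar with sufficient dimension reduction, the sketch below implements a bare-bones sliced inverse regression (SIR), one classical SDR technique, on a hypothetical gene set. The SDRs tests in the paper are more elaborate; the function name, the simulated data, and the choice of SIR are illustrative assumptions only.

```python
# Minimal sketch, assuming sliced inverse regression as the SDR ingredient.
import numpy as np

def sir_directions(X, y, n_slices=5, n_dirs=2):
    """Estimate SDR directions for a gene-set matrix X (samples x genes) and phenotype y."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False) + 1e-8 * np.eye(p)
    L = np.linalg.cholesky(np.linalg.inv(cov))   # whitening transform
    Z = Xc @ L
    order = np.argsort(y)                        # slice samples by the phenotype
    slices = np.array_split(order, n_slices)
    M = np.zeros((p, p))                         # between-slice covariance of slice means
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += len(idx) / n * np.outer(m, m)
    eigvals, eigvecs = np.linalg.eigh(M)
    dirs = L @ eigvecs[:, ::-1][:, :n_dirs]      # back-transform leading directions
    return eigvals[::-1], dirs

# Hypothetical gene set: 10 genes, 60 samples, continuous phenotype with a
# monotone non-linear dependence on the first gene.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = np.exp(X[:, 0]) + 0.5 * rng.normal(size=60)
eigvals, dirs = sir_directions(X, y)
print("leading eigenvalues:", np.round(eigvals[:3], 3))
```

A large leading eigenvalue indicates that a low-dimensional projection of the gene set carries information about the phenotype, which is the kind of signal an SDR-based enrichment test can pick up even when the association is non-linear.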
Project description: For decades, the mechanism of skeletal patterning along a proximal-distal axis has been an area of intense inquiry. Here, we examine the development of the ribs, simple structures that in most terrestrial vertebrates consist of two skeletal elements: a proximal bone and a distal cartilage portion. While the ribs have been shown to arise from the somites, little is known about how the two segments are specified. During our examination of genetically modified mice, we discovered a series of progressively worsening phenotypes that could not be easily explained. Here, we combine genetic analysis of rib development with agent-based simulations to conclude that proximal-distal patterning and outgrowth could occur based on simple rules. In our model, specification occurs during somite stages due to varying Hedgehog protein levels, while later expansion refines the pattern. This framework is broadly applicable for understanding the mechanisms of skeletal patterning along a proximal-distal axis.
Project description: The success of model-based methods in phylogenetics has motivated much research aimed at generating new, biologically informative models. This new computer-intensive approach to phylogenetics demands validation studies and sound measures of performance. To date there has been little practical guidance available as to when and why the parameters in a particular model can be identified reliably. Here, we illustrate how Data Cloning (DC), a recently developed methodology to compute the maximum likelihood estimates along with their asymptotic variance, can be used to diagnose structural parameter nonidentifiability (NI) and distinguish it from other parameter estimability problems: cases where parameters are structurally identifiable but not estimable in a given data set (INE), and cases where parameters are identifiable and estimable, but only weakly so (WE). The application of the DC theorem uses well-known and widely used Bayesian computational techniques. With the DC approach, practitioners can use Bayesian phylogenetics software to diagnose nonidentifiability. Theoreticians and practitioners alike now have a powerful yet simple tool to detect nonidentifiability while investigating complex modeling scenarios, where obtaining closed-form expressions in a probabilistic study is complicated. Furthermore, we show how DC can be used as a tool to examine and eliminate the influence of the priors, in particular if the process of prior elicitation is not straightforward. Finally, when applied to phylogenetic inference, DC can be used to study at least two important statistical questions: assessing the identifiability of discrete parameters, like the tree topology, and developing efficient sampling methods for computationally expensive posterior densities.
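The sketch below illustrates the data-cloning diagnostic on a deliberately nonidentifiable toy model, y_i ~ N(a + b, 1) with standard-normal priors on a and b, where only the sum a + b is identifiable. The model, the conjugate shortcut, and the variable names are assumptions chosen so the example stays self-contained; in practice the diagnostic is run with MCMC on the phylogenetic model of interest.

```python
# Minimal sketch, assuming Gaussian conjugacy instead of MCMC.
# With K clones, posterior variances of identifiable quantities shrink like 1/K,
# while nonidentifiable ones plateau at a prior-determined value.
import numpy as np

def cloned_posterior_cov(n_obs, K):
    """Posterior covariance of (a, b) after cloning n_obs observations K times."""
    prior_precision = np.eye(2)                      # a, b ~ N(0, 1)
    ones = np.ones((2, 2))                           # likelihood depends on a + b only
    precision = prior_precision + K * n_obs * ones   # observation variance fixed at 1
    return np.linalg.inv(precision)

for K in (1, 10, 100, 1000):
    cov = cloned_posterior_cov(n_obs=20, K=K)
    var_sum = np.array([1, 1]) @ cov @ np.array([1, 1])   # var(a + b): identifiable
    var_a = cov[0, 0]                                     # var(a): nonidentifiable
    print(f"K={K:5d}  var(a+b)={var_sum:.5f}  var(a)={var_a:.4f}")
# var(a+b) keeps shrinking with K, whereas var(a) flattens out near 0.5,
# flagging a as structurally nonidentifiable in the sense diagnosed by DC.
```

Plotting K times the posterior variance against K is the usual way to read this diagnostic: the curve converges for identifiable parameters and diverges for nonidentifiable ones.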
Project description: BACKGROUND: Microarray comparative genomic hybridization (CGH) is currently one of the most powerful techniques to measure DNA copy number in large genomes. In humans, microarray CGH is widely used to assess copy number variants in healthy individuals and copy number aberrations associated with various diseases, syndromes and disease susceptibility. In model organisms such as Caenorhabditis elegans (C. elegans) the technique has been applied to detect mutations, primarily deletions, in strains of interest. Although various constraints on oligonucleotide properties have been suggested to minimize non-specific hybridization and improve the data quality, there have been few experimental validations for CGH experiments. For genomic regions where strict design filters would limit the coverage, it would also be useful to quantify the expected loss in data quality associated with relaxed design criteria. RESULTS: We have quantified the effects of filtering various oligonucleotide properties by measuring the resolving power for detecting deletions in the human and C. elegans genomes using NimbleGen microarrays. Approximately twice as many oligonucleotides are typically required to be affected by a deletion in human DNA samples in order to achieve the same statistical confidence as one would observe for a deletion in C. elegans. Surprisingly, the ability to detect deletions strongly depends on the oligonucleotide 15-mer count, which is defined as the sum of the genomic frequencies of all the constituent 15-mers within the oligonucleotide. A similarity level above 80% to non-target sequences over the length of the probe produces significant cross-hybridization. We recommend the use of a fairly large melting temperature window of up to 10 °C, the elimination of repeat sequences, the elimination of homopolymers longer than 5 nucleotides, and a threshold of -1 kcal/mol on the oligonucleotide self-folding energy. We observed very little difference in data quality when varying the oligonucleotide length between 50 and 70 nucleotides, and even when using an isothermal design strategy. CONCLUSIONS: We have determined experimentally the effects of varying several key oligonucleotide microarray design criteria for detection of deletions in C. elegans and humans with NimbleGen's CGH technology. Our oligonucleotide design recommendations should be applicable for CGH analysis in most species.
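The sketch below turns two of the recommended probe filters (15-mer count, homopolymer run length) into plain functions. The function names and the 15-mer-count cutoff are hypothetical; melting temperature and self-folding energy checks need dedicated thermodynamics tools and are omitted, and a real genome would require a disk-backed k-mer index rather than an in-memory Counter.

```python
# Minimal sketch, assuming an in-memory 15-mer table and an arbitrary count cutoff.
import re
from collections import Counter

def count_kmers(genome_seq, k=15):
    """Toy genome-wide k-mer frequency table."""
    return Counter(genome_seq[i:i + k] for i in range(len(genome_seq) - k + 1))

def fifteen_mer_count(probe, kmer_table, k=15):
    """Sum of genomic frequencies of all constituent 15-mers of the probe."""
    return sum(kmer_table[probe[i:i + k]] for i in range(len(probe) - k + 1))

def has_long_homopolymer(probe, max_run=5):
    """True if any single base is repeated more than max_run times in a row."""
    pattern = r"(A{%d,}|C{%d,}|G{%d,}|T{%d,})" % ((max_run + 1,) * 4)
    return re.search(pattern, probe) is not None

def passes_filters(probe, kmer_table, max_15mer_count=100):
    # max_15mer_count is a hypothetical cutoff; the study reports the 15-mer count
    # as the strongest predictor, but a usable threshold depends on the genome.
    return (not has_long_homopolymer(probe)
            and fifteen_mer_count(probe, kmer_table) <= max_15mer_count)

# Toy usage on a made-up, highly repetitive genome fragment.
genome = "ACGT" * 5000
table = count_kmers(genome)
probe = "ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTAC"  # 50-mer
print(passes_filters(probe, table))  # False: the repetitive genome inflates the 15-mer count
```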
Project description: BACKGROUND: RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression, with both high-throughput and high-resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available; these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios, including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. RESULTS: Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. CONCLUSIONS: This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced to as low as 15% without substantial impacts on false positive or true positive rates.
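The sketch below is a toy Monte Carlo power calculation in the spirit of this design question: counts are drawn from a negative binomial with the mean scaled by sequencing depth and a fixed biological dispersion, then tested with a plain t-test on log counts. The study itself uses DESeq, edgeR and NBPSeq, so the absolute power values here only illustrate the replicates-versus-depth trade-off; all parameter values are assumptions.

```python
# Minimal sketch, assuming NB counts (var = mu + disp * mu^2) and a log-scale t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def nb_counts(mean, dispersion, size):
    """Negative binomial draws parameterized by mean and dispersion."""
    r = 1.0 / dispersion
    p = r / (r + mean)
    return rng.negative_binomial(r, p, size=size)

def power(n_reps, depth, fold_change=2.0, base_mean=200, dispersion=0.1, n_sim=2000):
    hits = 0
    for _ in range(n_sim):
        a = nb_counts(base_mean * depth, dispersion, n_reps)
        b = nb_counts(base_mean * depth * fold_change, dispersion, n_reps)
        _, p_val = stats.ttest_ind(np.log2(a + 1.0), np.log2(b + 1.0))
        hits += p_val < 0.05
    return hits / n_sim

for n_reps, depth in [(3, 1.0), (3, 0.15), (6, 1.0), (6, 0.15)]:
    print(f"replicates={n_reps}, depth={depth:>4}: power ~ {power(n_reps, depth):.2f}")
# Adding biological replicates raises power far more than restoring full depth does,
# because the biological dispersion term does not shrink with sequencing depth.
```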
Project description: Fungi are well-known producers of chemically diverse and biologically active secondary metabolites. However, low production yields in fermentation may hamper structural analysis and downstream investigations of biological activity. Herein, a systematic experimental design that varies multiple cultivation parameters, followed by chemometric analysis of HPLC-UV-MS or UHPLC-HRMS/MS data, is presented to enhance the production yield of fungal natural products. The overall procedure typically requires 3-4 months of work when first developed, and up to 3 months as a routine procedure.
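The sketch below shows one simple way to enumerate a full-factorial cultivation design of the kind such a protocol varies. The factor names and levels are hypothetical placeholders; the actual protocol defines its own cultivation parameters and downstream chemometric workflow.

```python
# Minimal sketch, assuming four hypothetical cultivation factors.
from itertools import product
import csv

factors = {
    "medium": ["rice", "PDB", "malt extract"],
    "temperature_C": [22, 28],
    "agitation": ["static", "shaken"],
    "duration_days": [7, 14],
}

# Every combination of factor levels = one cultivation condition to extract and profile.
conditions = [dict(zip(factors, levels)) for levels in product(*factors.values())]
print(f"{len(conditions)} cultivation conditions")  # 3 * 2 * 2 * 2 = 24

with open("cultivation_design.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=factors)
    writer.writeheader()
    writer.writerows(conditions)
```

In practice the resulting condition table is joined to the peak intensities extracted from the HPLC-UV-MS or UHPLC-HRMS/MS data so that chemometric models can attribute yield changes to specific cultivation parameters.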
Project description:For scientific, ethical and economic reasons, experiments involving animals should be appropriately designed, correctly analysed and transparently reported. This increases the scientific validity of the results, and maximises the knowledge gained from each experiment. A minimum amount of relevant information must be included in scientific publications to ensure that the methods and results of a study can be reviewed, analysed and repeated. Omitting essential information can raise scientific and ethical concerns. We report the findings of a systematic survey of reporting, experimental design and statistical analysis in published biomedical research using laboratory animals. Medline and EMBASE were searched for studies reporting research on live rats, mice and non-human primates carried out in UK and US publicly funded research establishments. Detailed information was collected from 271 publications, about the objective or hypothesis of the study, the number, sex, age and/or weight of animals used, and experimental and statistical methods. Only 59% of the studies stated the hypothesis or objective of the study and the number and characteristics of the animals used. Appropriate and efficient experimental design is a critical component of high-quality science. Most of the papers surveyed did not use randomisation (87%) or blinding (86%), to reduce bias in animal selection and outcome assessment. Only 70% of the publications that used statistical methods described their methods and presented the results with a measure of error or variability. This survey has identified a number of issues that need to be addressed in order to improve experimental design and reporting in publications describing research using animals. Scientific publication is a powerful and important source of information; the authors of scientific publications therefore have a responsibility to describe their methods and results comprehensively, accurately and transparently, and peer reviewers and journal editors share the responsibility to ensure that published studies fulfil these criteria.
Project description: The immense global diversity of HIV-1 is a significant obstacle to developing a safe and effective vaccine. We recently showed that infections established with multiple founder variants are associated with the development of neutralization breadth years later. We propose a novel vaccine design strategy that integrates the variability observed in acute HIV-1 infections with multiple founder variants. We developed a probabilistic model to simulate this variability, yielding a set of sequences that present the minimal diversity seen in an infection with multiple founders. We applied this model to a subtype C consensus sequence for the Envelope (Env), used as input, and showed that the simulated Env sequences mimic the mutational landscape of an infection with multiple founder variants, including diversity at antibody epitopes. The derived set of multi-founder-variant-like, minimally distant antigens is designed to be used as a vaccine cocktail specific to an HIV-1 subtype or circulating recombinant form and is expected to promote the development of broadly neutralizing antibodies.
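The sketch below is a toy version of sampling minimally diverse variants from a consensus sequence. The probabilistic model described above is considerably richer (for example, site-specific variability learned from multi-founder acute infections); here each site simply mutates independently with a small, site-specific probability, and the sequence fragment and positions are hypothetical.

```python
# Minimal sketch, assuming independent per-site substitution probabilities.
import numpy as np

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
rng = np.random.default_rng(42)

def simulate_variants(consensus, site_mut_prob, n_variants=4):
    """Sample n_variants sequences, substituting each site independently."""
    variants = []
    for _ in range(n_variants):
        seq = list(consensus)
        for i, p in enumerate(site_mut_prob):
            if rng.random() < p:
                choices = [aa for aa in AMINO_ACIDS if aa != seq[i]]
                seq[i] = rng.choice(choices)
        variants.append("".join(seq))
    return variants

# Hypothetical 30-residue stretch standing in for a consensus Env fragment,
# with higher mutability at a few "epitope-like" positions.
consensus = "MRVKETQMNWPNLWKWGTLILGLVIICSAS"
site_mut_prob = np.full(len(consensus), 0.01)
site_mut_prob[[5, 6, 7, 20, 21]] = 0.2
for v in simulate_variants(consensus, site_mut_prob):
    diffs = sum(a != b for a, b in zip(consensus, v))
    print(f"{v}  ({diffs} substitutions from consensus)")
```

The resulting small set of closely related sequences is the kind of minimally distant antigen cocktail the strategy aims to produce, with most diversity concentrated at the more variable, epitope-associated positions.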