Project description: Loop-mediated isothermal amplification (LAMP) is increasingly used in molecular diagnostics as an alternative to PCR-based methods. Numerous techniques have been reported for detecting LAMP amplification, including turbidity, bioluminescence and intercalating fluorescent dyes. In this report we show that quenched fluorescent labels on various LAMP primers can be used to detect and quantify target DNA molecules down to single copy numbers. By selecting different fluorophores, this method can be readily multiplexed. Moreover, this highly specific LAMP detection technique can reduce the incidence of false positives originating from mispriming events. Attribution of these events to particular primers will help inform and improve LAMP primer design.
Project description: According to the current view, each microRNA regulates hundreds of genes. Computational tools aim to identify microRNA targets, usually by selecting evolutionarily conserved microRNA binding sites. While false positive rates have been evaluated for some prediction programs, that information is rarely put forward in studies making use of their predictions. Here, we provide evidence that such predictions are often biologically irrelevant. Focusing on miR-223-guided repression, we observed that the repression it confers is often smaller than the inter-individual variability in gene expression among wild-type mice, suggesting that most predicted targets are functionally insensitive to that microRNA. Furthermore, we found that human haplo-insufficient genes tend to bear the most highly conserved microRNA binding sites. It thus appears that the biological functionality of microRNA binding sites depends on the dose-sensitivity of their host gene and that, conversely, it is unlikely that every predicted microRNA target is dose-sensitive enough to be functionally regulated by microRNAs. We also observed that some mRNAs can efficiently titrate microRNAs, providing a reason for microRNA binding site conservation even for inefficiently repressed targets. Finally, many conserved microRNA binding sites are conserved in a microRNA-independent fashion: sequence elements may be conserved for other reasons while being fortuitously complementary to microRNAs. Collectively, our data suggest that the role of microRNAs in normal and pathological conditions has been overestimated because false positive rates are frequently overlooked.
Project description: Self-transcribing active regulatory region sequencing (STARR-seq) and its variants have been widely used to characterize enhancers. However, it has been reported that up to 87% of STARR-seq peaks are located in repressive chromatin and are not functional in the tested cells. While some STARR-seq peaks in repressive chromatin might be active in other cell or tissue types, others might be false positives. Meanwhile, many active enhancers may not be identified by current STARR-seq methods. Although methods have been proposed to mitigate systematic errors caused by the use of plasmid vectors, artifacts due to the intrinsic limitations of current STARR-seq methods remain prevalent and their underlying causes are not fully understood. Based on predicted cis-regulatory modules (CRMs) and non-CRMs in the human genome, as well as predicted active and non-active CRMs in a few human cell lines/tissues with STARR-seq data available, we reveal prevalent false positives and false negatives in STARR-seq peaks generated by major variants of STARR-seq methods, along with their possible underlying causes. Our results will help design strategies to improve STARR-seq methods and interpret their results.
Project description: Background: A large number of probabilistic models used in sequence analysis assign non-zero probability values to most input sequences. To decide whether a given probability is sufficient, the most common approach is Bayesian binary classification, in which the probability of the sequence under the model characterizing the family of interest is compared to its probability under an alternative model. A null model can serve as this alternative; this is the scoring technique used by sequence analysis tools such as HMMER, SAM and INFERNAL. The most prevalent null models are position-independent residue distributions, including the uniform distribution, the genomic distribution, the family-specific distribution and the target sequence distribution. This paper presents a study evaluating the impact of the choice of null model on the final classification. In particular, we are interested in minimizing the number of false predictions, a crucial issue for reducing the cost of biological validation. Results: For all tests on random sequences, the target null model produced the lowest number of false positives. The study was performed on DNA sequences using GC content as the measure of compositional bias, but the results should also hold for protein sequences. To broaden the applicability of the results, the study used randomly generated sequences; previous studies were performed on amino acid sequences, used only one probabilistic model (HMM) and a specific benchmark, and therefore lacked more general conclusions about the performance of null models. Finally, a benchmark test with P. falciparum confirmed these results. Conclusions: Of the evaluated models, the best suited for classification are the uniform model and the target model. However, the uniform model exhibits a GC bias that can produce more false positives for candidate sequences with extreme compositional bias, a characteristic not described in previous studies. In these cases the target model is more dependable for biological validation due to its higher specificity.
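As an illustration of the log-odds scoring compared in this study, the following minimal Python sketch scores one DNA sequence against a hypothetical position-independent family model using two of the null models above, the uniform distribution and the target sequence composition; the family model, the example sequence and the add-one pseudocount are assumptions made for the example, not values taken from the study.

    import math
    from collections import Counter

    def log_prob(seq, dist):
        """Log probability of a sequence under a position-independent residue distribution."""
        return sum(math.log(dist[base]) for base in seq)

    def log_odds(seq, family_dist, null_dist):
        """Log-odds score: family model versus null model (both position-independent here)."""
        return log_prob(seq, family_dist) - log_prob(seq, null_dist)

    seq = "ATGCGCGCGCATATGCGCGC"

    # Hypothetical GC-rich family model (illustrative assumption only).
    family = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}

    # Uniform null model: every residue equally likely.
    uniform_null = {b: 0.25 for b in "ACGT"}

    # Target null model: residue frequencies estimated from the scored sequence itself,
    # with an add-one pseudocount to avoid zero probabilities.
    counts = Counter(seq)
    target_null = {b: (counts[b] + 1) / (len(seq) + 4) for b in "ACGT"}

    print("score vs uniform null:", log_odds(seq, family, uniform_null))
    print("score vs target null: ", log_odds(seq, family, target_null))

For a GC-rich candidate sequence, the uniform null inflates the score, whereas the target null discounts the sequence's own compositional bias, which is the behaviour discussed above.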
Project description: A basic problem in microarray data analysis is to identify genes whose expression is affected by the distinction between malignancies with different properties. These genes are said to be differentially expressed. Differential expression can be detected by selecting the genes with P-values (derived using an appropriate hypothesis test) below a certain rejection level. This selection, however, is not possible without accepting some false positives and negatives, since the two sets of P-values, associated with the genes whose expression is and is not affected by the distinction between the different malignancies, overlap. We describe a procedure for the study of differential expression in microarray data based on receiver operating characteristic (ROC) curves. This approach can be used to select a rejection level that balances the number of false positives and negatives and to assess the degree of overlap between the two sets of P-values. Since this degree of overlap characterises the balance that can be reached between the number of false positives and negatives, it can be seen as a quality measure of microarray data with respect to the detection of differential expression. As an example, we apply our method to data sets studying acute leukaemia.
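As a rough illustration of the idea (not the authors' exact ROC procedure), the following Python sketch picks a rejection level at which the numbers of false positives and false negatives balance, using simulated P-values; the uniform and Beta distributions, the gene counts and the threshold grid are assumptions made for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated P-values (assumption for illustration): unaffected genes give
    # uniform P-values, affected genes give P-values concentrated near zero.
    p_null = rng.uniform(0.0, 1.0, size=5000)   # genes not differentially expressed
    p_alt = rng.beta(0.1, 1.0, size=500)        # differentially expressed genes

    thresholds = np.linspace(1e-4, 0.2, 400)
    false_pos = np.array([(p_null <= t).sum() for t in thresholds])
    false_neg = np.array([(p_alt > t).sum() for t in thresholds])

    # Rejection level where the numbers of false positives and negatives balance.
    i = np.argmin(np.abs(false_pos - false_neg))
    print(f"balanced rejection level = {thresholds[i]:.4f} "
          f"(FP={false_pos[i]}, FN={false_neg[i]})")

The degree of overlap between the two simulated P-value sets determines how large both error counts are at the balanced threshold, which is why the overlap can serve as a quality measure for the data.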
Project description: Cell-autonomous cancer dependencies are now routinely identified using CRISPR loss-of-function viability screens. However, a bias exists that makes it difficult to assess the true essentiality of genes located in amplicons, since the entire amplified region can exhibit lethal scores. These false-positive hits can either be discarded from further analysis, which in cancer models can mean losing a significant number of hits, or methods can be developed to rescue the true positives within amplified regions. We propose two methods to rescue true-positive hits in amplified regions by correcting for this copy-number artefact. The Local Drop Out (LDO) method uses the relative lethality scores within genomic regions to assess true essentiality and does not require additional orthogonal data (e.g. copy-number values). LDO is meant to be used in screens covering a dense region of the genome (e.g. a whole chromosome or the whole genome). The General Additive Model (GAM) method models the screening data as a function of the known copy-number values and removes the systematic effect from the measured lethality. GAM does not require the same density as LDO, but it does require prior knowledge of the copy-number values. Both methods have been developed with single-sample experiments in mind, so the correction can be applied even in smaller screens. Here we demonstrate the efficacy of both methods at removing the copy-number effect and rescuing hits in some of the amplified regions. We estimate a 70-80% decrease in false-positive hits with either method in regions of high copy number, compared to no correction.
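To illustrate the flavour of the GAM-style correction (this is not the published implementation), the following Python sketch removes a simulated copy-number-driven dropout trend by smoothing lethality scores against copy number, with a LOWESS fit standing in for the GAM smooth; the simulated screen, effect sizes and smoothing fraction are assumptions made for the example.

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(1)

    # Simulated single-sample screen (assumption for illustration): per-gene
    # lethality scores plus a systematic copy-number-driven dropout.
    copy_number = rng.integers(1, 9, size=2000).astype(float)
    true_effect = rng.normal(0.0, 0.5, size=2000)
    lethality = true_effect - 0.4 * (copy_number - 2.0)   # amplified genes look lethal

    # Model lethality as a function of copy number (LOWESS as a stand-in for the
    # GAM smooth) and subtract the fitted trend, i.e. the systematic artefact.
    trend = lowess(lethality, copy_number, frac=0.5, return_sorted=False)
    corrected = lethality - trend

    print("raw mean score at CN>=6:      ", lethality[copy_number >= 6].mean().round(2))
    print("corrected mean score at CN>=6:", corrected[copy_number >= 6].mean().round(2))

After subtraction, highly amplified genes no longer show a systematically lethal score, leaving genuine dropouts to stand out against their local background.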
Project description: Organic impurities in compound libraries are known to often cause false-positive signals in screening campaigns for new leads, but they do not fully account for all false-positive results. We discovered inorganic impurities in our screening library that can also cause positive signals for a variety of targets and/or readout systems, including biochemical and biosensor assays. We investigated the example of zinc in depth for a specific project and retrospectively across various HTS screens at Roche, and we propose a straightforward counter-screen using the chelator TPEN to rule out inhibition caused by zinc.
Project description: Multiplexing strategies for large-scale proteomic analyses have become increasingly prevalent, tandem mass tags (TMT) in particular. Here we used a large iPSC proteomic experiment with twenty-four 10-plex TMT batches to evaluate the effect of integrating multiple TMT batches within a single analysis. We identified a significant inflation of protein missing values as multiple batches are integrated and show that this pattern is aggravated at the peptide level. We also show that without normalization strategies to address the batch effects, the high precision of quantitation within a single multiplexed TMT batch is not reproduced when data from multiple TMT batches are integrated. Further, the incidence of false positives was studied by using Y chromosome peptides as an internal control. The iPSC lines quantified in this data set were derived from both male and female donors, so peptides mapped to the Y chromosome should be absent from the female lines. Nonetheless, these Y chromosome-specific peptides were consistently detected in the female channels of all TMT batches. We then used the same Y chromosome-specific peptides to quantify the level of ion co-isolation as well as the effect of primary and secondary reporter-ion interference. These results were used to propose solutions that mitigate the limitations of multi-batch TMT analyses. We confirm that including a common reference line in every batch increases precision by facilitating normalization across the batches, and we propose experimental designs that minimize the effect of cross-population reporter-ion interference.
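As a rough illustration of normalizing to a common reference line (not the exact pipeline used in this study), the following Python sketch converts reporter-ion intensities in two simulated TMT batches into log2 ratios over a shared reference channel, which makes values comparable across batches despite a global batch shift; the batch sizes, intensity distributions and channel names are assumptions made for the example.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(2)

    # Simulated reporter-ion intensities for two 10-plex TMT batches (assumption
    # for illustration); channel "ref" is the common reference line in each batch.
    channels = [f"s{i}" for i in range(1, 10)] + ["ref"]
    batches = {
        b: pd.DataFrame(
            rng.lognormal(mean=10 + shift, sigma=0.3, size=(100, 10)),
            columns=channels,
            index=[f"protein_{i}" for i in range(100)],
        )
        for b, shift in (("batch1", 0.0), ("batch2", 1.5))  # batch2 has a global shift
    }

    # Normalize each sample channel to the shared reference channel within its
    # batch, then combine; ratios to the common reference are comparable across batches.
    normalized = pd.concat(
        {
            b: np.log2(df[channels[:-1]].div(df["ref"], axis=0))
            for b, df in batches.items()
        },
        axis=1,
    )
    print(normalized.iloc[:3, :4].round(2))

Because each value is expressed relative to the same reference material, the global intensity shift between batches largely cancels out, which is the rationale for including a common reference line in every batch.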
Project description: High-throughput measurements of molecular phenotypes provide an unprecedented opportunity to model cellular processes and their impact on disease. These highly structured datasets are usually strongly confounded, creating false positives and reducing power. This has motivated many approaches based on principal components analysis (PCA) to estimate and correct for confounders, which have become indispensable elements of association tests between molecular phenotypes and both genetic and non-genetic factors. Here, we show that these correction approaches induce a bias, and that it persists for large sample sizes and replicates out-of-sample. We prove this theoretically for PCA by deriving an analytic, deterministic, and intuitive bias approximation. We assess other methods with realistic simulations, which show that perturbing any of several basic parameters can cause false positive rate (FPR) inflation. Our experiments show that the bias depends on covariate and confounder sparsity, effect sizes, and their correlation. Surprisingly, when the covariate and confounder have [Formula: see text], standard two-step methods all have [Formula: see text]-fold FPR inflation. Our analysis informs best practices for confounder correction in genomic studies, and suggests that many false discoveries have been made and replicated in some differential expression analyses.
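To make the two-step workflow concrete (a simplified sketch, not the analysis performed in this study), the following Python code simulates expression driven only by a latent confounder, estimates that confounder as the top principal component, tests a correlated covariate per gene while adjusting for the estimated PC, and reports the empirical false positive rate against the nominal level; all parameter values are assumptions, and how much the FPR departs from the nominal level depends on the confounder strength and the covariate-confounder correlation, in line with the study's point.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n, n_genes, alpha = 200, 2000, 0.05

    # Simulated data (assumption for illustration): a latent confounder z affects
    # many genes; the tested covariate x is correlated with z but has NO true
    # effect on expression, so every rejection below is a false positive.
    z = rng.normal(size=n)
    x = 0.5 * z + rng.normal(scale=np.sqrt(0.75), size=n)
    loadings = rng.normal(scale=0.3, size=n_genes)
    Y = np.outer(z, loadings) + rng.normal(size=(n, n_genes))

    # Step 1: estimate the confounder as the top principal component of Y.
    Yc = Y - Y.mean(axis=0)
    pc1 = np.linalg.svd(Yc, full_matrices=False)[0][:, 0]

    # Step 2: per-gene association test of x, adjusting for the estimated PC.
    X = np.column_stack([np.ones(n), x, pc1])
    xtx_inv_11 = np.linalg.inv(X.T @ X)[1, 1]
    false_pos = 0
    for g in range(n_genes):
        beta, *_ = np.linalg.lstsq(X, Y[:, g], rcond=None)
        resid = Y[:, g] - X @ beta
        se = np.sqrt(resid @ resid / (n - 3) * xtx_inv_11)
        p = 2 * stats.t.sf(abs(beta[1]) / se, df=n - 3)
        false_pos += p < alpha

    print(f"empirical FPR: {false_pos / n_genes:.3f} (nominal {alpha})")

Varying the loading scale, the covariate-confounder correlation, or the number of affected genes in this sketch is one way to probe the parameter regimes in which the two-step correction leaves the empirical FPR above the nominal level.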
Project description: BACKGROUND: Identification of historic pathogens is challenging, since false positives and negatives are a serious risk. Environmental non-pathogenic contaminants are ubiquitous. Furthermore, public genetic databases contain limited information regarding these species. High-throughput sequencing may help reliably detect and identify historic pathogens. RESULTS: We shotgun-sequenced 8 16th-century Mixtec individuals from the site of Teposcolula Yucundaa (Oaxaca, Mexico) who are reported to have died from the huey cocoliztli ('Great Pestilence' in Nahuatl), an unknown disease that decimated native Mexican populations during the Spanish colonial period, in order to identify the pathogen. Comparison of these sequences with those deriving from the surrounding soil and from 4 precontact individuals from the site revealed a wide variety of contaminant organisms that confounded analyses. Without the comparative sequence data from the precontact individuals and the soil, false positives for Yersinia pestis and rickettsiosis could have been reported. CONCLUSIONS: False positives and negatives remain problematic in ancient DNA analyses despite the application of high-throughput sequencing. Our results suggest that several studies claiming the discovery of ancient pathogens may need further verification. Additionally, true single-molecule sequencing's short read lengths, inability to sequence through DNA lesions, and limited ancient-DNA-specific technical development hinder its application to palaeopathology.