
A note on the false discovery rate of novel peptides in proteogenomics.

ABSTRACT: Proteogenomics has been well accepted as a tool to discover novel genes. In most conventional proteogenomic studies, a global false discovery rate is used to filter out false positives for identifying credible novel peptides. However, it has been found that the actual level of false positives in novel peptides is often out of control and behaves differently for different genomes. To quantitatively model this problem, we theoretically analyze the subgroup false discovery rates of annotated and novel peptides. Our analysis shows that the annotation completeness ratio of a genome is the dominant factor influencing the subgroup FDR of novel peptides. Experimental results on two real datasets of Escherichia coli and Mycobacterium tuberculosis support our conjecture. Contact: yfu@amss.ac.cn, xupingghy@gmail.com or smhe@ict.ac.cn. Supplementary data are available at Bioinformatics online.
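The gap between a global FDR and the subgroup FDR of novel peptides can be illustrated with a minimal sketch. It is not the theoretical model from the paper: it assumes the common decoy-to-target ratio as the FDR estimate, and the `PSM` class, the `is_novel` flag (including the idea that each decoy can be assigned to the annotated or novel subgroup) and the toy counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """Hypothetical peptide-spectrum match record (illustrative only)."""
    score: float
    is_decoy: bool
    is_novel: bool  # assumption: True if the hit falls outside annotated genes


def decoy_fdr(psms, threshold):
    """Estimate FDR as (#decoys / #targets) among PSMs scoring >= threshold."""
    accepted = [p for p in psms if p.score >= threshold]
    targets = sum(1 for p in accepted if not p.is_decoy)
    decoys = sum(1 for p in accepted if p.is_decoy)
    return decoys / targets if targets else 0.0


def subgroup_fdrs(psms, threshold):
    """Return the global estimate and one estimate per subgroup."""
    annotated = [p for p in psms if not p.is_novel]
    novel = [p for p in psms if p.is_novel]
    return {
        "global": decoy_fdr(psms, threshold),
        "annotated": decoy_fdr(annotated, threshold),
        "novel": decoy_fdr(novel, threshold),
    }


# Toy data: a well-annotated genome, so few accepted hits are novel.
psms = (
    [PSM(1.0, False, False)] * 95   # annotated targets
    + [PSM(1.0, True, False)] * 1   # annotated decoys
    + [PSM(1.0, False, True)] * 5   # novel targets
    + [PSM(1.0, True, True)] * 1    # novel decoys
    + [PSM(0.1, True, True)] * 3    # low-scoring decoys, filtered out
)

result = subgroup_fdrs(psms, threshold=0.5)
# global: 2 decoys / 100 targets = 0.02; novel: 1 decoy / 5 targets = 0.2
```

In this toy case a 2% global estimate coexists with a 20% estimate among novel peptides, which is the kind of discrepancy the abstract describes.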

SUBMITTER: Zhang K

PROVIDER: S-EPMC4595894 | biostudies-literature | 2015 Oct

REPOSITORIES: biostudies-literature


Publications

A note on the false discovery rate of novel peptides in proteogenomics.

Kun Zhang, Yan Fu, Wen-Feng Zeng, Kun He, Hao Chi, Chao Liu, Yan-Chang Li, Yuan Gao, Ping Xu, Si-Min He

Bioinformatics (Oxford, England), 2015-06-14, issue 20


PMID: 26076724

Similar Datasets

Can the false-discovery rate be misleading?

Project description: The decoy-database approach is currently the gold standard for assessing the confidence of identifications in shotgun proteomic experiments. Here, we demonstrate that what might appear to be a good result under the decoy-database approach for a given false-discovery rate could be, in fact, the product of overfitting. This problem has been overlooked until now and could lead to obtaining boosted identification numbers whose reliability does not correspond to the expected false-discovery rate. To overcome this, we are introducing a modified version of the method, termed a semi-labeled decoy approach, which enables the statistical determination of an overfitted result.

| S-EPMC3313620 | biostudies-literature

Testing jumps via false discovery rate control.

Project description: Many recently developed nonparametric jump tests can be viewed as multiple hypothesis testing problems. For such multiple hypothesis tests, it is well known that controlling type I error often makes a large proportion of erroneous rejections, and such situation becomes even worse when the jump occurrence is a rare event. To obtain more reliable results, we aim to control the false discovery rate (FDR), an efficient compound error measure for erroneous rejections in multiple testing problems. We perform the test via the Barndorff-Nielsen and Shephard (BNS) test statistic, and control the FDR with the Benjamini and Hochberg (BH) procedure. We provide asymptotic results for the FDR control. From simulations, we examine relevant theoretical results and demonstrate the advantages of controlling the FDR. The hybrid approach is then applied to empirical analysis on two benchmark stock indices with high frequency data.

| S-EPMC3616021 | biostudies-literature
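The Benjamini and Hochberg (BH) procedure named in the description above can be sketched as follows. This is a generic step-up implementation on plain p-values, not the authors' code; it does not compute the BNS statistic, and the function name and example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is <= k * alpha / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k = rank  # keep the largest rank that passes
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Illustrative p-values (made up for this sketch).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(p, alpha=0.05)
# Only the two smallest p-values pass their thresholds (0.005 and 0.01).
```

Note that several p-values below 0.05 are not rejected here, since BH compares each ordered p-value with a rank-dependent threshold rather than with alpha itself.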

A novel approach to minimize false discovery rate in genome-wide data analysis.

Project description: Background: High-throughput technologies, such as DNA microarray, have significantly advanced biological and biomedical research by enabling researchers to carry out genome-wide screens. One critical task in analyzing genome-wide datasets is to control the false discovery rate (FDR) so that the proportion of false positive features among those called significant is restrained. Recently a number of FDR control methods have been proposed and widely practiced, such as the Benjamini-Hochberg approach, the Storey approach and Significance Analysis of Microarrays (SAM). Methods: This paper presents a straightforward yet powerful FDR control method termed miFDR, which aims to minimize FDR when calling a fixed number of significant features. We theoretically proved that the strategy used by miFDR is able to find the optimal number of significant features when the desired FDR is fixed. Results: We compared miFDR with the BH approach, the Storey approach and SAM on both simulated datasets and public DNA microarray datasets. The results demonstrated that miFDR outperforms others by identifying more significant features under the same FDR cut-offs. Literature search showed that many genes called only by miFDR are indeed relevant to the underlying biology of interest. Conclusions: FDR has been widely applied to analyzing high-throughput datasets, allowing for rapid discoveries. Under the same FDR threshold, miFDR is capable of identifying more significant features than its competitors at a compatible level of complexity. Therefore, it can potentially generate great impacts on biological and biomedical research. Availability: If interested, please contact the authors for getting miFDR.

| S-EPMC3856609 | biostudies-literature

False discovery rate estimation and heterobifunctional cross-linkers.

Project description: False discovery rate (FDR) estimation is a cornerstone of proteomics that has recently been adapted to cross-linking/mass spectrometry. Here we demonstrate that heterobifunctional cross-linkers, while theoretically different from homobifunctional cross-linkers, need not be considered separately in practice. We develop and then evaluate the impact of applying a correct FDR formula for use of heterobifunctional cross-linkers and conclude that there are minimal practical advantages. Hence a single formula can be applied to data generated from the many different non-cleavable cross-linkers.

| S-EPMC5944926 | biostudies-literature

Optimal False Discovery Rate Control for Dependent Data.

Project description: This paper considers the problem of optimal false discovery rate control when the test statistics are dependent. An optimal joint oracle procedure, which minimizes the false non-discovery rate subject to a constraint on the false discovery rate is developed. A data-driven marginal plug-in procedure is then proposed to approximate the optimal joint procedure for multivariate normal data. It is shown that the marginal procedure is asymptotically optimal for multivariate normal data with a short-range dependent covariance structure. Numerical results show that the marginal procedure controls false discovery rate and leads to a smaller false non-discovery rate than several commonly used p-value based false discovery rate controlling methods. The procedure is illustrated by an application to a genome-wide association study of neuroblastoma and it identifies a few more genetic variants that are potentially associated with neuroblastoma than several p-value-based false discovery rate controlling procedures.

| S-EPMC3559028 | biostudies-literature

False discovery rate control in two-stage designs.

Project description: Background: For gene expression or gene association studies with a large number of hypotheses the number of measurements per marker in a conventional single-stage design is often low due to limited resources. Two-stage designs have been proposed where in a first stage promising hypotheses are identified and further investigated in the second stage with larger sample sizes. For two types of two-stage designs proposed in the literature we derive multiple testing procedures controlling the False Discovery Rate (FDR) demonstrating FDR control by simulations: designs where a fixed number of top-ranked hypotheses are selected and designs where the selection in the interim analysis is based on an FDR threshold. In contrast to earlier approaches which use only the second-stage data in the hypothesis tests (pilot approach), the proposed testing procedures are based on the pooled data from both stages (integrated approach). Results: For both selection rules the multiple testing procedures control the FDR in the considered simulation scenarios. This holds for the case of independent observations across hypotheses as well as for certain correlation structures. Additionally, we show that in scenarios with small effect sizes the testing procedures based on the pooled data from both stages can give a considerable improvement in power compared to tests based on the second-stage data only. Conclusion: The proposed hypothesis tests provide a tool for FDR control for the considered two-stage designs. Comparing the integrated approaches for both selection rules with the corresponding pilot approaches showed an advantage of the integrated approach in many simulation scenarios.

| S-EPMC3496575 | biostudies-literature

The functional false discovery rate with applications to genomics.

Project description: The false discovery rate (FDR) measures the proportion of false discoveries among a set of hypothesis tests called significant. This quantity is typically estimated based on p-values or test statistics. In some scenarios, there is additional information available that may be used to more accurately estimate the FDR. We develop a new framework for formulating and estimating FDRs and q-values when an additional piece of information, which we call an "informative variable", is available. For a given test, the informative variable provides information about the prior probability a null hypothesis is true or the power of that particular test. The FDR is then treated as a function of this informative variable. We consider two applications in genomics. Our first application is a genetics of gene expression (eQTL) experiment in yeast where every genetic marker and gene expression trait pair are tested for associations. The informative variable in this case is the distance between each genetic marker and gene. Our second application is to detect differentially expressed genes in an RNA-seq study carried out in mice. The informative variable in this study is the per-gene read depth. The framework we develop is quite general, and it should be useful in a broad range of scientific applications.

| S-EPMC7846131 | biostudies-literature

A Fuzzy Permutation Method for False Discovery Rate Control.

Project description: Biomedical researchers often encounter large-p-small-n situations, in which a great number of variables are measured/recorded for only a few subjects. The authors propose a fuzzy permutation method to address the multiple testing problem for small sample size studies. The method introduces fuzziness into standard permutation analysis to produce randomized p-values, which are then converted into q-values for false discovery rate controls. Simple algebra shows that the fuzzy permutation method is at least as powerful as the standard permutation method under any alternative. Monte-Carlo simulations show that the proposed method has desirable statistical properties whether the study variables are normally or non-normally distributed. A real dataset is analyzed to illustrate its use. The proposed fuzzy permutation method is recommended for use in the large-p-small-n settings.

| S-EPMC4916423 | biostudies-literature

Improved False Discovery Rate Estimation Procedure for Shotgun Proteomics.

Project description: Interpreting the potentially vast number of hypotheses generated by a shotgun proteomics experiment requires a valid and accurate procedure for assigning statistical confidence estimates to identified tandem mass spectra. Despite the crucial role such procedures play in most high-throughput proteomics experiments, the scientific literature has not reached a consensus about the best confidence estimation methodology. In this work, we evaluate, using theoretical and empirical analysis, four previously proposed protocols for estimating the false discovery rate (FDR) associated with a set of identified tandem mass spectra: two variants of the target-decoy competition protocol (TDC) of Elias and Gygi and two variants of the separate target-decoy search protocol of Käll et al. Our analysis reveals significant biases in the two separate target-decoy search protocols. Moreover, the one TDC protocol that provides an unbiased FDR estimate among the target PSMs does so at the cost of forfeiting a random subset of high-scoring spectrum identifications. We therefore propose the mix-max procedure to provide unbiased, accurate FDR estimates in the presence of well-calibrated scores. The method avoids biases associated with the two separate target-decoy search protocols and also avoids the propensity for target-decoy competition to discard a random subset of high-scoring target identifications.

| S-EPMC4533616 | biostudies-literature

Transformation Invariant Control of Voxel-Wise False Discovery Rate.

Project description: Multiple testing for statistical maps remains a critical and challenging problem in brain mapping. Since the false discovery rate (FDR) criterion was introduced to the neuroimaging community a decade ago, many variations have been proposed, mainly to enhance detection power. However, a fundamental geometrical property known as transformation invariance has not been adequately addressed, especially for the voxel-wise FDR. Correction of multiple testing applied after spatial transformation is not necessarily equivalent to transformation applied after correction in the original space. Without the invariance property, assigning different testing spaces will yield different results. We find that normalized residuals of linear models with Gaussian noises are uniformly distributed on a unit high-dimensional sphere, independent of t-statistics and F-statistics. By defining volumetric measure in the hyper-spherical space mapped by normalized residuals, instead of the image's Euclidean space, we can achieve invariant control of the FDR under diffeomorphic transformation. This hyper-spherical measure also reflects intrinsic "volume of randomness" in signals. Experiments with synthetic, semi-synthetic and real images demonstrate that our method significantly reduces FDR inconsistency introduced by the choice of testing spaces.

| S-EPMC5052119 | biostudies-literature
