Dataset Information

Comparison of false-discovery rates of various decoy databases.

ABSTRACT:

Background

The target-decoy strategy effectively estimates the false-discovery rate (FDR) by creating a decoy database with a size identical to that of the target database. Decoy databases are created by various methods, such as the reverse, pseudo-reverse, shuffle, pseudo-shuffle, and de Bruijn methods. The FDR is sometimes over- or underestimated depending on which decoy database is used, because the ratio of redundant peptides differs between target databases; that is, the numbers of unique (non-redundant) peptides in the target and decoy databases differ.
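To make the decoy construction methods named above concrete, the following is a minimal Python sketch of four of them (the de Bruijn method is omitted). It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, trypsin is approximated as cleaving after every K or R with no proline exception, and the "pseudo" variants are assumed to transform each tryptic peptide while keeping its C-terminal K or R in place.

```python
import random
import re


def tryptic_peptides(protein):
    """Split a protein after every K or R (simplified trypsin rule, an assumption)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+", protein)


def _keep_c_term(peptide, transform):
    """Apply transform to a peptide while leaving a C-terminal K/R in place."""
    if peptide[-1] in "KR":
        return transform(peptide[:-1]) + peptide[-1]
    return transform(peptide)


def _shuffled(sequence, rng):
    """Return a randomly permuted copy of sequence."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def reverse_decoy(protein):
    """Reverse the whole protein sequence."""
    return protein[::-1]


def pseudo_reverse_decoy(protein):
    """Reverse each tryptic peptide, keeping its cleavage residue fixed."""
    return "".join(
        _keep_c_term(p, lambda s: s[::-1]) for p in tryptic_peptides(protein)
    )


def shuffle_decoy(protein, rng):
    """Randomly permute the whole protein sequence."""
    return _shuffled(protein, rng)


def pseudo_shuffle_decoy(protein, rng):
    """Shuffle each tryptic peptide, keeping its cleavage residue fixed."""
    return "".join(
        _keep_c_term(p, lambda s: _shuffled(s, rng)) for p in tryptic_peptides(protein)
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the shuffles are repeatable
    protein = "MKTAYIAKQRQISFVK"  # made-up example sequence
    print(reverse_decoy(protein))
    print(pseudo_reverse_decoy(protein))
    print(shuffle_decoy(protein, rng))
    print(pseudo_shuffle_decoy(protein, rng))
```

Under these assumptions, the reverse-based methods are deterministic, so two identical target peptides always map to the same decoy peptide. A shuffle can map them to different decoy peptides, which is one way the unique-peptide counts of the target and decoy databases can diverge.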

Results

We used two protein databases (the UniProt Saccharomyces cerevisiae protein database and the UniProt human protein database) to compare the FDRs of various decoy databases. When the ratio of redundant peptides in the target database is low, the FDR is not overestimated by any decoy construction method. However, when the ratio of redundant peptides in the target database is high, the FDR is overestimated if a (pseudo-)shuffle decoy database is used. Additionally, the human and S. cerevisiae six-frame translation databases, which are large, showed outcomes similar to those from the UniProt human protein database.
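The notion of a "ratio of redundant peptides" can be illustrated with a short Python sketch. The definition used here (one minus the fraction of unique peptides) is an assumption made for illustration, since this record does not give the exact formula; the function names and the peptide lists are hypothetical.

```python
def redundancy_ratio(peptides):
    """Fraction of peptide entries that duplicate an earlier entry.

    Assumed definition: 1 - (unique peptides / all peptides).
    """
    peptides = list(peptides)
    if not peptides:
        return 0.0
    return 1.0 - len(set(peptides)) / len(peptides)


def unique_count(peptides):
    """Number of distinct peptide sequences in a database digest."""
    return len(set(peptides))


if __name__ == "__main__":
    # Made-up digests: the target contains one peptide twice.
    target = ["AAK", "AAK", "CDR", "EFK"]
    # A hypothetical shuffled decoy in which the two copies ended up different.
    decoy = ["AAK", "AKA", "DCR", "FEK"]
    print(redundancy_ratio(target), unique_count(target))
    print(redundancy_ratio(decoy), unique_count(decoy))
```

In this toy case the decoy holds more unique peptides than the target, which matches the situation the abstract associates with overestimated FDRs for shuffle-based decoys.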

Conclusion

The FDR must be estimated using the correction factor proposed by Elias and Gygi or that by Kim et al. when (pseudo) shuffle decoy databases are used.
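As a rough illustration of how a correction factor enters the estimate, the sketch below scales the basic decoy-to-target hit ratio by a factor. This is a generic form only: the exact correction factors of Elias and Gygi and of Kim et al. are not stated in this record, so the size-ratio factor shown here (unique target peptides divided by unique decoy peptides) is a placeholder assumption, and all names are hypothetical.

```python
def estimate_fdr(n_target_hits, n_decoy_hits, correction=1.0):
    """Scaled target-decoy FDR estimate: correction * decoy hits / target hits.

    The default correction of 1.0 gives the uncorrected ratio. The result is
    capped at 1.0 because an FDR cannot exceed 100%.
    """
    if n_target_hits <= 0:
        raise ValueError("n_target_hits must be positive")
    if n_decoy_hits < 0:
        raise ValueError("n_decoy_hits must be non-negative")
    return min(1.0, correction * n_decoy_hits / n_target_hits)


def size_ratio_correction(n_unique_target, n_unique_decoy):
    """Assumed placeholder factor: ratio of unique peptides, target over decoy."""
    if n_unique_decoy <= 0:
        raise ValueError("n_unique_decoy must be positive")
    return n_unique_target / n_unique_decoy


if __name__ == "__main__":
    # Made-up counts for illustration only.
    factor = size_ratio_correction(n_unique_target=900, n_unique_decoy=1000)
    print(estimate_fdr(1000, 10))          # uncorrected estimate
    print(estimate_fdr(1000, 10, factor))  # estimate scaled by the assumed factor
```

When the decoy database contains more unique peptides than the target, a factor below 1 of this kind lowers the estimate, which is the direction of adjustment needed if the uncorrected FDR is overestimated.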

SUBMITTER: Lee S

PROVIDER: S-EPMC8449453 | biostudies-literature | 2021 Sep

REPOSITORIES: biostudies-literature

Publications

Comparison of false-discovery rates of various decoy databases.

Sangjeong Lee, Heejin Park, Hyunwoo Kim

Proteome Science, 18 Sep 2021, issue 1

PMID: 34537052

Similar Datasets

Target-small decoy search strategy for false discovery rate estimation.
| S-EPMC6708216 | biostudies-literature

False discovery rates: a new deal.
| S-EPMC5379932 | biostudies-literature

Target-Decoy-Based False Discovery Rate Estimation for Large-Scale Metabolite Identification.
| S-EPMC6252074 | biostudies-literature

Target-decoy approach and false discovery rate: when things may go wrong.
| S-EPMC3220955 | biostudies-literature

A comparison of two classes of methods for estimating false discovery rates in microarray studies.
| S-EPMC3820438 | biostudies-literature

Evaluating statistical methods using plasmode data sets in the age of massive public databases: an illustration using false discovery rates.
| S-EPMC2409977 | biostudies-literature

Averaging Strategy To Reduce Variability in Target-Decoy Estimates of False Discovery Rate.
| S-EPMC6919216 | biostudies-literature

False discovery rates for rare variants from sequenced data.
| S-EPMC4711769 | biostudies-literature

New mixture models for decoy-free false discovery rate estimation in mass spectrometry proteomics.
| S-EPMC7773488 | biostudies-literature

Analysis of gene expression in pathophysiological states: balancing false discovery and false negative rates.
| S-EPMC1334678 | biostudies-literature