
Dataset Information

Sequence count data are poorly fit by the negative binomial distribution.


ABSTRACT: Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that nonparametric tests should be preferred over parametric methods.
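As a rough illustration of what checking an NB fit can involve, the sketch below fits an NB distribution to one feature's counts by the method of moments and compares the observed fraction of zeros with the fraction the fitted NB predicts. This is not the goodness-of-fit test proposed in the paper: it ignores covariates, uses a crude moment estimator, and yields no p-value. The function names and the parameterization (mean `mu`, dispersion `phi`, variance `mu + phi * mu**2`) are assumptions made for this example only.

```python
from statistics import mean, variance


def nb_moment_fit(counts):
    """Method-of-moments NB fit (illustrative, not the paper's estimator).

    Returns (mu, phi) under the assumed parameterization
    Var(Y) = mu + phi * mu**2.
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance, n - 1 denominator
    if mu <= 0:
        raise ValueError("mean must be positive")
    if var <= mu:
        # Without overdispersion the moment estimate of phi is not positive.
        raise ValueError("no overdispersion: NB moment estimate undefined")
    phi = (var - mu) / mu ** 2
    return mu, phi


def nb_zero_probability(mu, phi):
    """P(Y = 0) for an NB with mean mu and dispersion phi."""
    return (1.0 + phi * mu) ** (-1.0 / phi)


def zero_fraction_gap(counts):
    """Observed zero fraction minus the zero fraction of the fitted NB."""
    mu, phi = nb_moment_fit(counts)
    observed = sum(1 for c in counts if c == 0) / len(counts)
    return observed - nb_zero_probability(mu, phi)


if __name__ == "__main__":
    # Hypothetical counts for a single feature across ten samples.
    example = [0, 0, 0, 0, 10, 12, 0, 0, 15, 0]
    print(zero_fraction_gap(example))
```

A large positive gap means the feature has more zeros than the fitted NB expects. A formal assessment would need a proper test statistic and a reference distribution, which this sketch does not provide.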

SUBMITTER: Hawinkel S 

PROVIDER: S-EPMC7192467 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature

Publications

Sequence count data are poorly fit by the negative binomial distribution.

Hawinkel, Stijn; Rayner, J. C. W.; Bijnens, Luc; Thas, Olivier

PLoS ONE, 2020-04-30, issue 4


Similar Datasets

| S-EPMC9679925 | biostudies-literature
| S-EPMC7541535 | biostudies-literature
| S-EPMC9766880 | biostudies-literature
| S-EPMC4365073 | biostudies-literature
| S-EPMC6706006 | biostudies-literature
| S-EPMC4692373 | biostudies-literature
| S-EPMC5667504 | biostudies-literature
| S-EPMC5619260 | biostudies-literature
| S-EPMC7195715 | biostudies-literature
| S-EPMC6070621 | biostudies-literature