Dataset Information

FastPval: a fast and memory efficient program to calculate very low P-values from empirical distribution.

ABSTRACT:

Motivation

Resampling methods, such as permutation and bootstrap, have been widely used to generate an empirical distribution for assessing the statistical significance of a measurement. However, obtaining a very low P-value requires a very large number of resamples, at which point computing speed, memory and storage consumption become bottlenecks and the calculation sometimes becomes infeasible, even on a computer cluster.

Results

We have developed a multiple-stage P-value calculating program called FastPval that can efficiently calculate very low P-values (down to 10^-9) from a large number of resampled measurements. With only two input files and a few parameter settings from the user, the program can compute P-values from an empirical distribution very efficiently, even on a personal computer. When tested on the order of 10^9 resampled data points, our method uses only 52.94% of the time used by the conventional method, implemented with standard quicksort and binary search algorithms, and consumes only 0.11% of the memory and storage. Furthermore, our method can be applied to extra-large datasets that the conventional method fails to process. The accuracy of the method was tested on data generated from Normal, Poisson and Gumbel distributions and was found to be no different from the exact ranking approach.
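For orientation, the conventional baseline mentioned above (sort the resampled values, then look up each observation by binary search) can be sketched as follows. This is a minimal illustration, not FastPval's multiple-stage method, whose details are not given in this record. The function name `empirical_p_values`, the upper-tail convention (fraction of resampled values greater than or equal to the observation) and the use of Python's built-in sort in place of quicksort are all assumptions made for the example.

```python
import bisect


def empirical_p_values(null_values, observed):
    """Upper-tail empirical P-values via sorting plus binary search.

    Hypothetical helper for illustration only; it is not part of FastPval.
    Assumes the P-value is the fraction of resampled (null) values that are
    greater than or equal to the observed value.
    """
    # Sort once; Python's sorted() uses Timsort rather than quicksort.
    ranked = sorted(null_values)
    n = len(ranked)
    if n == 0:
        raise ValueError("null_values must not be empty")

    p_values = []
    for x in observed:
        # bisect_left returns the count of null values strictly below x,
        # so n minus that index is the count of null values >= x.
        count_ge = n - bisect.bisect_left(ranked, x)
        p_values.append(count_ge / n)
    return p_values
```

Under this convention the smallest non-zero P-value obtainable from n resampled values is 1/n, so resolving P-values near 10^-9 needs on the order of 10^9 values held in sorted form, which is where the time, memory and storage costs described in the abstract arise.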

Availability

The FastPval executable file, the Java GUI and source code, and the Java Web Start server with example data and introduction are available at http://wanglab.hku.hk/pvalue.

SUBMITTER: Li MJ

PROVIDER: S-EPMC2971576 | biostudies-literature | 2010 Nov

REPOSITORIES: biostudies-literature

Publications

FastPval: a fast and memory efficient program to calculate very low P-values from empirical distribution.

Li Mulin Jun, Sham Pak Chung, Wang Junwen

Bioinformatics (Oxford, England), 2010-09-21, issue 22

PMID: 20861029

Similar Datasets

SwiftOrtho: A fast, memory-efficient, multiple genome orthology classifier.
| S-EPMC6812468 | biostudies-literature

BLESS 2: accurate, memory-efficient and fast error correction method.
| S-EPMC6280799 | biostudies-other

OCMA: Fast, Memory-Efficient Factorization of Prohibitively Large Relationship Matrices.
| S-EPMC6325911 | biostudies-other

Lighter: fast and memory-efficient sequencing error correction without counting.
| S-EPMC4248469 | biostudies-literature

ecmtool: fast and memory-efficient enumeration of elementary conversion modes.
| S-EPMC9982354 | biostudies-literature

A fast and memory-efficient implementation of the transfer bootstrap.
| S-EPMC7141843 | biostudies-literature

FR-HIT, a very fast program to recruit metagenomic reads to homologous reference genomes.
| S-EPMC3106194 | biostudies-literature

HISAT: a fast spliced aligner with low memory requirements.
| S-EPMC4655817 | biostudies-literature