
Dataset Information


FastGroup: a program to dereplicate libraries of 16S rDNA sequences.


ABSTRACT: BACKGROUND: Ribosomal 16S DNA sequences are an essential tool for identifying and classifying microbes. High-throughput DNA sequencing now makes it economically possible to produce very large datasets of 16S rDNA sequences in short time periods, necessitating new computer tools for analyses. Here we describe FastGroup, a Java program designed to dereplicate libraries of 16S rDNA sequences. By dereplication we mean to: 1) compare all the sequences in a data set to each other, 2) group similar sequences together, and 3) output a representative sequence from each group. In this way, duplicate sequences are removed from a library. RESULTS: FastGroup was tested using a library of single-pass, bacterial 16S rDNA sequences cloned from coral-associated bacteria. We found that the optimal strategy for dereplicating these sequences was to: 1) trim ambiguous bases from the 5' end of the sequences and all sequence 3' of the conserved Bact517 site, 2) match the sequences from the 3' end, and 3) group sequences ≥97% identical to each other. CONCLUSIONS: The FastGroup program simplifies the dereplication of 16S rDNA sequence libraries and prepares the raw sequences for subsequent analyses.
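The dereplication strategy described in the abstract (trim, match from the 3' end, group at ≥97% identity) can be illustrated with a minimal Python sketch. This is not FastGroup's code, which is written in Java and is not reproduced in this record. The function names, the greedy comparison against one representative per group, the ungapped identity measure, and the `site` motif passed in by the caller are all assumptions made for illustration; the actual Bact517 sequence is not given here and is not filled in.

```python
from typing import Dict, List


def trim_5prime_ambiguous(seq: str) -> str:
    """Drop leading characters that are not A, C, G or T (a simplified trim)."""
    seq = seq.upper()
    start = 0
    while start < len(seq) and seq[start] not in "ACGT":
        start += 1
    return seq[start:]


def trim_after_site(seq: str, site: str) -> str:
    """Keep the sequence up to and including the first occurrence of `site`.

    `site` stands in for the conserved site; sequences lacking it are unchanged.
    """
    idx = seq.find(site)
    return seq if idx == -1 else seq[: idx + len(site)]


def identity_from_3prime(a: str, b: str) -> float:
    """Ungapped percent identity with the 3' ends aligned, over the shorter length."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    # zip stops at the shorter sequence, so exactly n positions are compared.
    matches = sum(1 for x, y in zip(reversed(a), reversed(b)) if x == y)
    return 100.0 * matches / n


def dereplicate(seqs: Dict[str, str], site: str, threshold: float = 97.0) -> Dict[str, List[str]]:
    """Greedy grouping: map each representative's name to the names in its group.

    Assumption: each sequence is compared only with the first member
    (the representative) of every existing group, in input order.
    """
    trimmed = {
        name: trim_after_site(trim_5prime_ambiguous(s), site)
        for name, s in seqs.items()
    }
    groups: Dict[str, List[str]] = {}
    for name, seq in trimmed.items():
        for rep in groups:
            if identity_from_3prime(seq, trimmed[rep]) >= threshold:
                groups[rep].append(name)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups[name] = [name]
    return groups
```

Calling `dereplicate(library, site)` on a dictionary of clone names and raw reads returns one entry per group, keyed by the representative. The greedy pass makes the result depend on input order; a full all-against-all comparison, as the abstract describes, would avoid that at higher cost.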

SUBMITTER: Seguritan V 

PROVIDER: S-EPMC59723 | biostudies-literature | 2001

REPOSITORIES: biostudies-literature


Publications

FastGroup: a program to dereplicate libraries of 16S rDNA sequences.

Seguritan V, Rohwer F

BMC Bioinformatics, 2001-10-16



Similar Datasets

| S-EPMC1386709 | biostudies-literature
| S-EPMC6967091 | biostudies-literature
| S-EPMC3768542 | biostudies-literature
| S-EPMC5639872 | biostudies-literature
| S-EPMC4623359 | biostudies-literature
| S-EPMC4329515 | biostudies-literature
| S-EPMC206781 | biostudies-other
| S-EPMC4682383 | biostudies-literature
| S-EPMC4019831 | biostudies-literature
| S-EPMC161812 | biostudies-literature