Dataset Information

A curated dataset of complete Enterobacteriaceae plasmids compiled from the NCBI nucleotide database.


ABSTRACT: Thousands of plasmid sequences are now publicly available in the NCBI nucleotide database, but they are not reliably annotated to distinguish complete plasmids from plasmid fragments, such as gene or contig sequences; therefore, retrieving complete plasmids for downstream analyses is challenging. Here we present a curated dataset of complete bacterial plasmids from the clinically relevant Enterobacteriaceae family. The dataset was compiled from the NCBI nucleotide database using curation steps designed to exclude incomplete plasmid sequences, and chromosomal sequences misannotated as plasmids. Over 2000 complete plasmid sequences are included in the curated plasmid dataset. Protein sequences produced from translating each complete plasmid nucleotide sequence in all 6 frames are also provided. Further analysis and discussion of the dataset is presented in an accompanying research article: "Ordering the mob: insights into replicon and MOB typing…" (Orlek et al., 2017) [1]. The curated plasmid sequences are publicly available in the Figshare repository.
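
The six-frame translation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the tool or genetic code the authors used, so everything here is an assumption: the standard codon table, the function names (`reverse_complement`, `translate`, `six_frame_translation`), and the choice to treat the sequence as linear (codons that would wrap around the circular plasmid origin are ignored) and to mark codons containing ambiguous bases as `X`.

```python
from itertools import product

# Standard genetic code in NCBI base order T, C, A, G (an assumption; the
# source does not state which translation table was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate a DNA string codon by codon, dropping trailing partial codons.

    Stop codons become '*'; codons with non-ACGT bases become 'X'.
    """
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translation(seq):
    """Return translations for frames +1, +2, +3 and -1, -2, -3.

    The sequence is treated as linear, so codons spanning the origin of a
    circular plasmid are not produced.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Each of the six strings is a raw frame translation that includes stop codons, not a set of predicted proteins; extracting open reading frames would be a separate step.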

SUBMITTER: Orlek A 

PROVIDER: S-EPMC5426034 | biostudies-literature | 2017 Jun

REPOSITORIES: biostudies-literature

Publications

A curated dataset of complete Enterobacteriaceae plasmids compiled from the NCBI nucleotide database.

Alex Orlek, Hang Phan, Anna E. Sheppard, Michel Doumith, Matthew Ellington, Tim Peto, Derrick Crook, A. Sarah Walker, Neil Woodford, Muna F. Anjum, Nicole Stoesser

Data in Brief, 2017-04-23


Similar Datasets

| S-EPMC3245000 | biostudies-literature
| S-EPMC5824777 | biostudies-literature
| S-EPMC5571335 | biostudies-literature
| S-EPMC3818805 | biostudies-literature
| S-EPMC1525205 | biostudies-literature
| S-EPMC5923110 | biostudies-literature
| S-EPMC6940296 | biostudies-literature
| S-EPMC4383940 | biostudies-literature
| S-EPMC10716757 | biostudies-literature
| S-EPMC107441 | biostudies-literature