Dataset Information

Misidentification of genome assemblies in public databases: The case of Naumovozyma dairenensis and proposal of a protocol to correct misidentifications.


ABSTRACT: Online sequence databases such as NCBI GenBank serve as a tremendously useful platform for researchers to share and reuse published data. However, submission systems lack controls for errors such as organism misidentification, which, once entered into the database, can propagate and mislead downstream analyses. Here we present an illustrative case in which Candida albicans from a clinical sample was misidentified as Naumovozyma dairenensis on the basis of whole-genome shotgun data. Analyses of phylogenetic markers, read mapping and single nucleotide polymorphisms served to correct the identification. We propose that the routine use of such analyses could help to detect misidentifications arising from unsupervised analyses and correct them before they enter the databases. Finally, we discuss the broader implications of such misidentifications and the difficulty of correcting them once they are in the records.

SUBMITTER: Stavrou AA 

PROVIDER: S-EPMC6001429 | biostudies-literature | 2018 Jun

REPOSITORIES: biostudies-literature

Publications

Misidentification of genome assemblies in public databases: The case of Naumovozyma dairenensis and proposal of a protocol to correct misidentifications.

Stavrou AA, Mixão V, Boekhout T, Gabaldón T

Yeast (Chichester, England), 2018-02-22, issue 6

