
Dataset Information


Human Contamination in Public Genome Assemblies.


ABSTRACT: Contamination in a genome assembly can lead to wrong or confusing results when that genome is used as a reference in sequence comparison. Although bacterial contamination is well known, the problem of human-originated contamination has received little attention. In this study we surveyed 45,735 available genome assemblies for evidence of human contamination. We used lineage specificity to distinguish between contamination and conservation. We found that 154 genome assemblies contain fragments that, with high confidence, originate from human DNA contamination. The majority of the contaminating human sequences have been present in the reference human genome assembly for over a decade. We recommend that existing contaminated genomes be revised to remove the contaminating sequence, and that new assemblies be thoroughly checked for the presence of human DNA before they are submitted to public databases.
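The idea of using lineage specificity to separate contamination from conservation can be illustrated with a small filter. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes each assembly fragment has already been aligned against the human genome and against genomes from the assembly's own lineage, and that only the best hit per fragment is kept. The `Hit` record, the function name, and all thresholds (`min_identity`, `min_length`, `max_relative_identity`) are hypothetical. The reasoning is that a conserved sequence should also match close relatives, whereas a contaminating human fragment matches human nearly perfectly but has no comparably close counterpart in the assembly's lineage.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical best-alignment record for one assembly fragment."""
    fragment_id: str
    identity: float      # percent identity, 0-100
    aligned_length: int  # aligned length in bp


def flag_human_contamination(human_hits, relative_hits,
                             min_identity=99.0, min_length=100,
                             max_relative_identity=90.0):
    """Return sorted IDs of fragments that look like human contamination.

    All thresholds are illustrative assumptions, not values from the study.
    A fragment is flagged when it matches human at >= min_identity over
    >= min_length bp, but its best match within its own lineage stays
    below max_relative_identity (i.e., it is human-specific, not conserved).
    """
    # Best identity of each fragment to any genome in its own lineage.
    best_relative = {}
    for hit in relative_hits:
        prev = best_relative.get(hit.fragment_id, 0.0)
        best_relative[hit.fragment_id] = max(prev, hit.identity)

    flagged = set()
    for hit in human_hits:
        near_identical_to_human = (hit.identity >= min_identity
                                   and hit.aligned_length >= min_length)
        # Fragments with no lineage hit default to 0.0 identity.
        lineage_specific = best_relative.get(hit.fragment_id, 0.0) < max_relative_identity
        if near_identical_to_human and lineage_specific:
            flagged.add(hit.fragment_id)
    return sorted(flagged)


if __name__ == "__main__":
    human = [Hit("contig_1", 99.8, 1500),   # human-like, absent from relatives
             Hit("contig_2", 99.5, 800),    # human-like, but also in relatives
             Hit("contig_3", 85.0, 2000)]   # too divergent from human
    relatives = [Hit("contig_2", 98.0, 800)]
    print(flag_human_contamination(human, relatives))  # ['contig_1']
```

In this toy example, `contig_2` is treated as conserved rather than contaminated because it also closely matches a genome from the assembly's own lineage; a real survey would need carefully chosen reference sets and thresholds.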

SUBMITTER: Kryukov K 

PROVIDER: S-EPMC5017631 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature


Publications

Human Contamination in Public Genome Assemblies.

Kirill Kryukov, Tadashi Imanishi

PLoS ONE, 9 September 2016 (issue 9)



Similar Datasets

| S-EPMC7003083 | biostudies-literature
| S-EPMC4824900 | biostudies-literature
| S-EPMC357027 | biostudies-literature
| S-EPMC3248429 | biostudies-literature
| S-EPMC10158259 | biostudies-literature
| S-EPMC3040168 | biostudies-literature
| PRJNA369439 | ENA
| S-EPMC10849470 | biostudies-literature
| S-EPMC5264480 | biostudies-literature
| S-EPMC6072799 | biostudies-literature