ABSTRACT:
SUBMITTER: Abram K
PROVIDER: S-EPMC7838162 | biostudies-literature | 2021 Jan
REPOSITORIES: biostudies-literature
AUTHORS: Kaleb Abram, Zulema Udaondo, Carissa Bleker, Visanu Wanchai, Trudy M. Wassenaar, Michael S. Robeson, David W. Ussery
Communications Biology, 2021-01-26, 1
In this study, more than one hundred thousand Escherichia coli and Shigella genomes were examined and classified. This is, to our knowledge, the largest E. coli genome dataset analyzed to date. A Mash-based analysis of a cleaned set of 10,667 E. coli genomes from GenBank revealed 14 distinct phylogroups. A representative genome, or medoid, identified for each phylogroup was used as a proxy to classify 95,525 unassembled genomes from the Sequence Read Archive (SRA). We find that most of the sequenc ...