Project description: Extrachromosomal mobile genetic elements (eMGEs), including phages and plasmids, can move between microbes and play important roles in genome evolution and in shaping the structure of microbial communities. However, we still know very little about eMGEs, especially their abundances, distributions and putative functions in microbiomes. A comprehensive description of eMGEs is therefore of great utility. Here we present mMGE, a comprehensive catalog of 517 251 non-redundant eMGEs, including 92 492 plasmids and 424 759 phages, derived from 66 425 human metagenomic samples from diverse body sites. About half of the eMGEs could be further grouped into 70 074 clusters using relaxed criteria (referred to as eMGE clusters below). We provide extensive annotations of the identified eMGEs, including sequence characteristics, taxonomic affiliation, gene content and prokaryotic hosts. We also calculate the prevalence of each eMGE and eMGE cluster, both within and across samples, enabling users to explore putative associations of eMGEs with human phenotypes or their distribution preferences. All eMGE records can be browsed or queried in multiple ways, such as by eMGE cluster, metagenomic sample or associated host. mMGE is equipped with a user-friendly interface and a BLAST server, facilitating easy access to and querying of all its contents. mMGE is freely available for academic use at: https://mgedb.comp-sysbio.org.
Project description: Background: Elucidating the ecological and biological identity of extrachromosomal mobile genetic elements (eMGEs), such as plasmids and bacteriophages, in the human gut remains challenging due to their high complexity and diversity. Results: Here, we show efficient identification of eMGEs as complete circular or linear contigs from PacBio long-read metagenomic data. De novo assembly of PacBio long reads from 12 faecal samples generated 82 eMGE contigs (2.5 to 666.7 kb), which were classified as 71 plasmids and 11 bacteriophages, including 58 novel plasmids and six bacteriophages, and complete genomes of five diverse crAssphages with terminal direct repeats. In a dataset of 413 gut metagenomes from five countries, many of the identified plasmids were highly abundant and prevalent. The proportion of gut plasmids in our plasmid data is more than twice that in the public database. Plasmids outnumbered bacterial chromosomes three to one on average in this metagenomic dataset. Host prediction suggested that Bacteroidetes-associated plasmids predominated, regardless of microbial abundance. The analysis found several plasmid-enriched functions, such as inorganic ion transport, while antibiotic resistance genes were harboured mostly in low-abundance Proteobacteria-associated plasmids. Conclusions: Overall, long-read metagenomics provided an efficient approach for unravelling the complete structure of human gut eMGEs, particularly plasmids.
Project description: Integrated analysis of whole-genome sequencing, long-range optical mapping, single-cell DNA sequencing, and fluorescence in situ hybridization identifies extrachromosomal DNA (ecDNA) as the primary source of MYC amplifications and driver fusions in SCLC. Through circularization, ecDNAs bring enhancer elements and oncogenes into proximity, creating transcription-amplifying units that drive heterogeneity of MYC gene dosage and of the expression of SCLC lineage-defining transcription factors.