
Dataset Information


Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses.


ABSTRACT: Large groups of species with well-defined phylogenies are excellent systems for testing evolutionary hypotheses. In this paper, we describe the creation of a comparative genomic resource consisting of 23 genomes from the species-rich Drosophila montium species group, 22 of which are presented here for the first time. The montium group is well-positioned for clade genomics. Within the montium clade, evolutionary distances are such that large numbers of sequences can be accurately aligned while also recovering strong signals of divergence; and the distance between the montium group and D. melanogaster is short enough so that orthologous sequence can be readily identified. All genomes were assembled from a single, small-insert library using MaSuRCA, before going through an extensive post-assembly pipeline. Estimated genome sizes within the montium group range from 155 Mb to 223 Mb (mean = 196 Mb). The absence of long-distance information during the assembly process resulted in fragmented assemblies, with the scaffold NG50s varying widely based on repeat content and sample heterozygosity (min = 18 kb, max = 390 kb, mean = 74 kb). The total scaffold length for most assemblies is also shorter than the estimated genome size, typically by 5-15%. However, subsequent analysis showed that our assemblies are highly complete. Despite large differences in contiguity, all assemblies contain at least 96% of known single-copy Dipteran genes (BUSCOs, n = 2,799). Similarly, by aligning our assemblies to the D. melanogaster genome and remapping coordinates for a large set of transcriptional enhancers (n = 3,457), we showed that each montium assembly contains orthologs for at least 91% of D. melanogaster enhancers. Importantly, the genic and enhancer contents of our assemblies are comparable to that of far more contiguous Drosophila assemblies. The alignment of our own D. serrata assembly to a previously published PacBio D. serrata assembly also showed that our longest scaffolds (up to 1 Mb) are free of large-scale misassemblies. Our genome assemblies are a valuable resource that can be used to further resolve the montium group phylogeny; study the evolution of protein-coding genes and cis-regulatory sequences; and determine the genetic basis of ecological and behavioral adaptations.
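
The abstract reports contiguity as scaffold NG50, which is measured against the estimated genome size rather than the total assembly length. The sketch below shows the usual definition of that statistic; it is an illustration only, not the authors' pipeline, and the function name `ng50` and the scaffold lengths are hypothetical values chosen for the example.

```python
def ng50(scaffold_lengths, genome_size):
    """Return the NG50 of an assembly.

    NG50 is the length L such that scaffolds of length >= L together
    cover at least half of the *estimated genome size*. Returns None
    if the assembly never reaches half of the genome size.
    """
    half = genome_size / 2
    covered = 0
    # Walk scaffolds from longest to shortest, accumulating length.
    for length in sorted(scaffold_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return None


# Hypothetical scaffold lengths in bp (not taken from the dataset).
lengths = [400_000, 300_000, 200_000, 100_000, 50_000]

# N50 is the same calculation using the assembly's own total length.
n50_value = ng50(lengths, sum(lengths))

# NG50 uses an (assumed) estimated genome size larger than the assembly.
ng50_value = ng50(lengths, 1_500_000)

print(n50_value, ng50_value)
```

Because the total scaffold length is shorter than the estimated genome size for most assemblies, NG50 will generally be equal to or smaller than N50 for the same set of scaffolds.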

SUBMITTER: Bronski MJ 

PROVIDER: S-EPMC7202002 | biostudies-literature | 2020 May

REPOSITORIES: biostudies-literature


Publications

Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses.

Bronski MJ, Martinez CC, Weld HA, Eisen MB

G3 (Bethesda, Md.), 2020-05-04, 5



Similar Datasets

S-EPMC6797199 | biostudies-literature
S-EPMC6722520 | biostudies-literature
S-EPMC6160839 | biostudies-other
S-EPMC4023246 | biostudies-literature
S-EPMC5360346 | biostudies-literature
S-EPMC7143264 | biostudies-literature
S-EPMC6129143 | biostudies-other
S-EPMC3036140 | biostudies-literature
S-EPMC2760465 | biostudies-literature
S-EPMC4962979 | biostudies-literature