Project description:Genome assemblies are becoming an increasingly important tool for understanding genetic diversity in threatened species. Unfortunately, due to the limited budgets typical of conservation biology, genome assemblies of threatened species, when available, tend to be highly fragmented, represented by tens of thousands of scaffolds not assigned to chromosomal locations. The recent advent of high-throughput chromosome conformation capture (Hi-C) enables more contiguous assemblies containing scaffolds spanning the length of entire chromosomes for little additional cost. These inexpensive contiguous assemblies can be generated by Hi-C scaffolding of existing short-read draft assemblies in which the N50 of the draft contigs is larger than 0.1% of the estimated genome size. They can greatly improve analyses and facilitate visualization of genome-wide features, including the distribution of genetic diversity in markers along chromosomes or chromosome-length scaffolds. We compared the distribution of genetic diversity along chromosomes of eight mammalian species, including six listed as threatened by IUCN, for which both draft genome assemblies and newer chromosome-level assemblies were available. The chromosome-level assemblies showed marked improvement in localization and visualization of genetic diversity, especially where the distribution of low heterozygosity across the genomes of threatened species was not uniform.
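The contiguity criterion in the abstract above (draft contig N50 larger than 0.1% of the estimated genome size) can be illustrated with a minimal sketch. It uses the standard definition of N50: the length L such that contigs of length L or longer contain at least half of the assembled bases. The function names `n50` and `meets_hic_threshold` and the toy contig lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # assembly defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


def meets_hic_threshold(contig_lengths, estimated_genome_size, min_fraction=0.001):
    """Check whether contig N50 exceeds min_fraction (0.1% by default) of genome size."""
    return n50(contig_lengths) > min_fraction * estimated_genome_size


# Toy example (hypothetical lengths in bp, not real data).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))                          # N50 of the toy assembly
print(meets_hic_threshold(toy_contigs, 50_000))  # compares N50 with 0.1% of 50,000 bp
```

In practice the lengths would be read from an assembly FASTA index or a similar summary, and the genome size estimate would come from k-mer analysis or flow cytometry.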
Project description:Mosses compose one of the three lineages of bryophytes. Today, about 13,000 species of mosses are recognized from across the globe, and at least one-third of this diversity composes the Hypnales, a lineage characterized by an early rapid radiation. We sequenced and de novo assembled the genomes of two hypnalean mosses, namely Entodon seductrix and Hypnum curvifolium, based on 10x Genomics and Hi-C data. The genome assemblies of E. seductrix and H. curvifolium comprise 348.4 and 262.0 Mb, respectively, estimated by k-mer analyses to represent 93.3% and 97.2% of their total genome size. Both genomes were assembled at the chromosome level, with scaffold N50 of 30.0 and 20.7 Mb, respectively. The annotated genome of E. seductrix comprises 25,801 protein-coding genes and that of H. curvifolium 29,077, estimated to represent 96.8% and 97.2%, respectively, of the total gene spaces based on BUSCO (Benchmarking Universal Single-Copy Ortholog) assessment. For both genomes, most contigs were anchored to the largest 11 pseudomolecules, corresponding to the 11 chromosomes of the two species, and each with a putative sex-related chromosome characterized by low gene density. The chromosomes of E. seductrix and H. curvifolium are highly syntenic, suggesting that limited architectural shifts occurred following the rapid radiation of the Hypnales. We compared their genomic features to the model moss Physcomitrium patens. The hypnalean moss genomes lack signatures of recent whole-genome duplication. The presented high-quality moss genomes provide new resources for comparative genomics to potentially unveil the genomic evolution of derived moss lineages.
Project description:Previous analyses suggested that the Nicotiana sylvestris CMSII mutant carried a large deletion in its mitochondrial genome. Here, we show by cosmid mapping that the deletion is 60 kb in length and contains several mitochondrial genes or ORFs, including the complex I nad7 gene. However, due to the presence of large duplications in the progenitor mitochondrial genome, the only unique gene that appears to be deleted is nad7. RNA gel blot data confirm the absence of nad7 expression, strongly suggesting that the molecular basis for the CMSII abnormal phenotype, poor growth and male sterility, is the altered complex I structure. The CMSII mitochondrial genome appears to consist essentially of one of two subgenomes resulting from recombination between direct short repeats. In the progenitor mitochondrial genome both recombination products are detected by PCR and, reciprocally, the parental fragments are detected at the substoichiometric level in the mutant. The CMSII mtDNA organization has been maintained through six sexual generations.
Project description:Domestic ducks are raised for meat, eggs and feather down, and almost all varieties are descended from the Mallard (Anas platyrhynchos). Here, we report chromosome-level high-quality genome assemblies for meat and laying duck breeds, and the Mallard. Our new genomic databases contain annotations for thousands of new protein-coding genes and recover a major percentage of the presumed "missing genes" in birds. We obtain the entire genomic sequences for the C-type lectin (CTL) family members that regulate eggshell biomineralization. Our population and comparative genomics analyses provide more than 36 million sequence variants between duck populations. Furthermore, a mutant cell line allows confirmation of the predicted anti-adipogenic function of NR2F2 in the duck, and we uncover mutations specific to Pekin duck that potentially affect adipose deposition. Our study provides insights into avian evolution and the genetics of oviparity, and will be a rich resource for the future genetic improvement of commercial traits in the duck.
Project description:Calamus simplicifolius and Daemonorops jenkinsiana are two representative rattans, the most significant material sources for the rattan industry. However, the lack of reference genome sequences is a major obstacle for basic and applied biology on rattan. We produced two chromosome-level genome assemblies of C. simplicifolius and D. jenkinsiana using Illumina, Pacific Biosciences, and Hi-C sequencing data. A total of ∼730 Gb and ∼682 Gb of raw data covered the predicted genome lengths (∼1.98 Gb of C. simplicifolius and ∼1.61 Gb of D. jenkinsiana) to ∼372 × and ∼426 × read depths, respectively. The two de novo genome assemblies, ∼1.94 Gb and ∼1.58 Gb, were generated with scaffold N50s of ∼160 Mb and ∼119 Mb in C. simplicifolius and D. jenkinsiana, respectively. The C. simplicifolius and D. jenkinsiana genomes were predicted to harbor 51,235 and 53,342 intact protein-coding gene models, respectively. Benchmarking Universal Single-Copy Orthologs evaluation demonstrated that genome completeness reached 96.4% and 91.3% in the C. simplicifolius and D. jenkinsiana genomes, respectively. Genome evolution showed that four Arecaceae plants clustered together, and the divergence time between the two rattans was ∼19.3 million years ago. Additionally, we identified 193 and 172 genes involved in the lignin biosynthesis pathway in the C. simplicifolius and D. jenkinsiana genomes, respectively. We present the first de novo assemblies of two rattan genomes (C. simplicifolius and D. jenkinsiana). These data will not only provide a fundamental resource for functional genomics, particularly in promoting germplasm utilization for breeding, but also serve as reference genomes for comparative studies between and among different species.
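The read-depth figures in the abstract above follow from simple arithmetic: sequencing depth is roughly the total number of sequenced bases divided by the genome length. The sketch below reruns that calculation with the rounded values quoted in the abstract. The function name `read_depth` is hypothetical, and because the inputs are rounded, the results only approximate the reported ∼372× and ∼426×.

```python
def read_depth(raw_data_gb, genome_size_gb):
    """Approximate sequencing depth as total sequenced bases over genome length."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return raw_data_gb / genome_size_gb


# Rounded values quoted in the abstract (Gb of raw data, predicted genome size in Gb).
samples = {
    "C. simplicifolius": (730, 1.98),
    "D. jenkinsiana": (682, 1.61),
}

for species, (raw_gb, genome_gb) in samples.items():
    print(f"{species}: ~{read_depth(raw_gb, genome_gb):.0f}x")
```

With these rounded inputs the computed depths fall within about 1% of the reported values.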
Project description:Slavum lentiscoides and Chaetogeoica ovagalla are two aphid species from the subtribe Fordina of Fordini within the subfamily Eriosomatinae, and they produce galls on their primary host plants Pistacia. We assembled chromosome-level genomes of these two species using Nanopore long-read sequencing and Hi-C technology. A 332 Mb genome assembly of S. lentiscoides with a scaffold N50 of 19.77 Mb, including 11,747 genes, and a 289 Mb genome assembly of C. ovagalla with a scaffold N50 of 11.85 Mb, containing 14,492 genes, were obtained. The Benchmarking Universal Single-Copy Orthologs (BUSCO) benchmark of the two genome assemblies reached 93.7% (91.9% single-copy) and 97.0% (95.3% single-copy), respectively. The high-quality genome assemblies in our study provide valuable resources for future genomic research of galling aphids.
Project description:Background: Anopheles coluzzii and Anopheles arabiensis belong to the Anopheles gambiae complex and are among the major malaria vectors in sub-Saharan Africa. However, chromosome-level reference genome assemblies are still lacking for these medically important mosquito species. Findings: In this study, we produced de novo chromosome-level genome assemblies for A. coluzzii and A. arabiensis using the long-read Oxford Nanopore sequencing technology and the Hi-C scaffolding approach. We obtained 273.4 and 256.8 Mb of the total assemblies for A. coluzzii and A. arabiensis, respectively. Each assembly consists of 3 chromosome-scale scaffolds (X, 2, 3), complete mitochondrion, and unordered contigs identified as autosomal pericentromeric DNA, X pericentromeric DNA, and Y sequences. Comparison of these assemblies with the existing assemblies for these species demonstrated that we obtained improved reference-quality genomes. The new assemblies allowed us to identify genomic coordinates for the breakpoint regions of fixed and polymorphic chromosomal inversions in A. coluzzii and A. arabiensis. Conclusion: The new chromosome-level assemblies will facilitate functional and population genomic studies in A. coluzzii and A. arabiensis. The presented assembly pipeline will accelerate progress toward creating high-quality genome references for other disease vectors.
Project description:This study reported the complete nucleotide sequence of the Nicotiana tabacum TN90 chloroplast (cp) genome. The cpDNA was 155 992 bp in length and contained 133 individual genes (79 protein-encoding genes, 30 tRNA genes and four rRNA genes). A maximum-likelihood (ML) phylogenetic analysis of 17 species, with Arabidopsis thaliana, Oryza sativa, and Anomochloa marantoidea as an outgroup, resulted in a single tree with -lnL = 542 222.71, in which the Nicotiana tabacum TN90 plastid clustered with three previously reported Nicotiana species: N. tomentosiformis, N. undulata and N. tabacum. The cp genome sequence of the TN90 tobacco variety reported in this study will accelerate tobacco improvement in the future.
Project description:Background: Nicotiana sylvestris and Nicotiana tomentosiformis are members of the Solanaceae family that includes tomato, potato, eggplant and pepper. These two Nicotiana species originate from South America and exhibit different alkaloid and diterpenoid production. N. sylvestris is cultivated largely as an ornamental plant and it has been used as a diploid model system for studies of terpenoid production, plastid engineering, and resistance to biotic and abiotic stress. N. sylvestris and N. tomentosiformis are considered to be modern descendants of the maternal and paternal donors that formed Nicotiana tabacum about 200,000 years ago through interspecific hybridization. Here we report the first genome-wide analysis of these two Nicotiana species. Results: Draft genomes of N. sylvestris and N. tomentosiformis were assembled to 82.9% and 71.6% of their expected size respectively, with N50 sizes of about 80 kb. The repeat content was 72-75%, with a higher proportion of retrotransposons and copia-like long terminal repeats in N. tomentosiformis. The transcriptome assemblies showed that 44,000-53,000 transcripts were expressed in the roots, leaves or flowers. The key genes involved in terpenoid metabolism, alkaloid metabolism and heavy metal transport showed differential expression in the leaves, roots and flowers of N. sylvestris and N. tomentosiformis. Conclusions: The reference genomes of N. sylvestris and N. tomentosiformis represent a significant contribution to the SOL100 initiative because, as members of the Nicotiana genus of Solanaceae, they strengthen the value of the already existing resources by providing additional comparative information, thereby helping to improve our understanding of plant metabolism and evolution.
Project description:Introduction: Plants are sessile organisms that maximize reproductive success by adapting to their environment. One of the key steps in the reproductive phase of angiosperms is flower development, requiring the perception of multiple endogenous and exogenous signals integrated via a complex regulatory network. Key floral regulators, including the main transcription factor of the photoperiodic pathway (CONSTANS, CO) and the central floral pathway integrator (FLOWERING LOCUS T, FT), are known in many species. Methods and results: We identified several CO-like (COL) proteins in tobacco (Nicotiana tabacum). The NtCOL2a/b proteins in the day-neutral plant N. tabacum were most closely related to Arabidopsis CO. We characterized the diurnal expression profiles of corresponding genes in leaves under short-day (SD) and long-day (LD) conditions and confirmed their expression in phloem companion cells. Furthermore, we analyzed the orthologs of NtCOL2a/b in the maternal LD ancestor (N. sylvestris) and paternal, facultative SD ancestor (N. tomentosiformis) of N. tabacum and found that they were expressed in the same diurnal manner. NtCOL2a/b overexpression or knock-out using the CRISPR/Cas9 system did not support a substantial role for the CO homologs in the control of floral transition in N. tabacum. However, NsCOL2 overexpression induced flowering in N. sylvestris under typically non-inductive SD conditions, correlating with the upregulation of the endogenous NsFTd gene. Discussion: Our results suggest that NsFTd is transcriptionally regulated by NsCOL2 and that this COL2-dependent photoperiodic floral induction seems to be lost in N. tabacum, providing insight into the diverse genetics of photoperiod-dependent flowering in different Nicotiana species.