Project description:Centromeres are functionally conserved chromosomal loci essential for proper chromosome segregation during cell division, yet they show high sequence diversity across species. Despite their variation, a near universal feature of centromeres is the presence of repetitive sequences, such as DNA satellites and transposable elements (TEs). Because of their rapidly evolving karyotypes, gibbons represent a compelling model to investigate divergence of functional centromere sequences across short evolutionary timescales. In this study, we use ChIP-seq, RNA-seq, and fluorescence in situ hybridization to comprehensively investigate the centromeric repeat content of the four extant gibbon genera (Hoolock, Hylobates, Nomascus, and Siamang). In all gibbon genera, we find that CENP-A nucleosomes and the DNA-proteins that interface with the inner kinetochore preferentially bind retroelements of broad classes rather than satellite DNA. A previously identified gibbon-specific composite retrotransposon, LAVA, known to be expanded within the centromere regions of one gibbon genus (Hoolock), displays centromere- and species-specific sequence differences, potentially as a result of its co-option to a centromeric function. When dissecting centromere satellite composition, we discovered the presence of the retroelement-derived macrosatellite SST1 in multiple centromeres of Hoolock, whereas alpha-satellites represent the predominant satellite in the other genera, further suggesting an independent evolutionary trajectory for Hoolock centromeres. Finally, using de novo assembly of centromere sequences, we determined that transcripts originating from gibbon centromeres recapitulate the species-specific TE composition. Combined, our data reveal dynamic shifts in the repeat content that define gibbon centromeres and coincide with the extensive karyotypic diversity within this lineage.
Project description:Centromeres are functionally conserved chromosomal loci essential for proper chromosome segregation during cell division, yet they show high sequence diversity across species. A near universal feature of centromeres is the presence of repetitive sequences, such as satellites and transposable elements (TEs). Because of their rapidly evolving karyotypes, gibbons represent a compelling model to investigate divergence of functional centromere sequences across short evolutionary timescales. Previously, we identified a novel composite retrotransposon, LAVA, that is exclusive to gibbons and expanded within the centromere regions of one gibbon genus, Hoolock. In this study, we use ChIP-seq, RNA-seq and fluorescence in situ hybridization to comprehensively investigate the repeat content of centromeres of the four extant gibbon genera (Hoolock, Hylobates, Nomascus and Siamang). We find that CENP-A nucleosomes and the DNA-protein interface with the inner kinetochore are enriched in retroelements in all gibbon genera, rather than satellite DNA. We find that LAVA in Hoolock is enriched in the centromeres of most chromosomes and shows centromere- and species-specific sequence and structural differences compared to other genera, potentially as a result of its co-option to a centromeric function. In contrast, we find that a centromeric retroelement-derived macrosatellite, SST1, corresponds with chromosome breakpoint reuse across gibbons and shows high sequence conservation across genera. Finally, using de novo assembly of centromere-specific sequences, we determine that transcripts originating from gibbon centromeres recapitulate species-specific TE diversity. Combined, our data reveal dynamic, species-specific shifts in repeat content that define gibbon centromeres and coincide with the extensive karyotypic diversity observed within this lineage.
Project description:Plants have evolved an elaborate innate immune system against invading pathogens. Within this system, intracellular nucleotide-binding leucine-rich repeat (NLR) immune receptors are known to play critical roles in effector-triggered immunity (ETI) plant defense. We performed genome-wide identification and classification of NLR-coding sequences from the genomes of pepper, tomato, and potato using fixed criteria. We then compared genomic duplication and evolution features. We identified 267, 443, and 755 intact NLR-encoding genes in tomato, potato, and pepper genomes, respectively. Phylogenetic analysis and classification of Solanaceae NLRs revealed that the majority of NLR superfamily members fell into 14 subgroups, including a TIR-NLR (TNL) subgroup and 13 non-TNL subgroups. Specific subgroups have expanded in each genome, with the expansion in pepper showing subgroup-specific physical clusters. Comparative analysis of duplications showed distinct duplication patterns within pepper and among Solanaceae plants, suggesting subgroup- or species-specific gene duplication events after speciation, resulting in divergent evolution. Taken together, genome-wide analysis of NLR family members provides insights into their evolutionary history in Solanaceae. These findings also provide important foundational knowledge for understanding NLR evolution and will empower broader characterization of disease resistance genes to be used for crop breeding.
Project description:BACKGROUND: Viral genomes often contain metabolic genes that were acquired from host genomes (auxiliary genes). It is assumed that these genes are fixed in viral genomes as a result of a selective force, favoring viruses that acquire specific metabolic functions. While many individual auxiliary genes were observed in viral genomes and metagenomes, there is great importance in investigating the abundance of auxiliary genes and metabolic functions in the marine environment towards a better understanding of their role in promoting viral reproduction. RESULTS: In this study, we searched for enriched viral auxiliary genes and mapped them to metabolic pathways. To initially identify enriched auxiliary genes, we analyzed metagenomic microbial reads from the Global Ocean Survey (GOS) dataset that were characterized as viral, as well as marine virome and microbiome datasets from the Line Islands. Viral-enriched genes were mapped to a "global metabolism network" that comprises all KEGG metabolic pathways. Our analysis of the viral-enriched pathways revealed that purine and pyrimidine metabolism pathways are among the most enriched pathways. Moreover, many other viral-enriched metabolic pathways were found to be closely associated with the purine and pyrimidine metabolism pathways. Furthermore, we observed that sequential reactions are promoted in pathways having a high proportion of enriched genes. In addition, these enriched genes were found to be of modular nature, participating in several pathways. CONCLUSIONS: Our naïve metagenomic analyses strongly support the well-established notion that viral auxiliary genes promote viral replication via both degradation of host DNA and RNA as well as a shift of the host metabolism towards nucleotide biosynthesis, clearly indicating that comparative metagenomics can be used to understand different environments and systems without prior knowledge of pathways involved.
Project description:The functionality of long noncoding RNAs (lncRNAs) is disputed. In general, lncRNAs are under weak selective pressures, suggesting that the majority of lncRNAs may be nonfunctional. However, although some surveys showed negligible phenotypic effects upon lncRNA perturbation, key biological roles were demonstrated for individual lncRNAs. Most lncRNAs with proven functions were implicated in gene expression regulation, in pathways related to cellular pluripotency, differentiation, and organ morphogenesis, suggesting that functional lncRNAs may be more abundant in embryonic development, rather than in adult organs. To test this hypothesis, we perform a multidimensional comparative transcriptomics analysis, across five developmental time points (two embryonic stages, newborn, adult, and aged individuals), four organs (brain, kidney, liver, and testes), and three species (mouse, rat, and chicken). We find that, overwhelmingly, lncRNAs are preferentially expressed in adult and aged testes, consistent with the presence of permissive transcription during spermatogenesis. LncRNAs are often differentially expressed among developmental stages and are less abundant in embryos and newborns compared with adult individuals, in agreement with a requirement for tighter expression control and less tolerance for noisy transcription early in development. For differentially expressed lncRNAs, we find that the patterns of expression variation among developmental stages are generally conserved between mouse and rat. Moreover, lncRNAs expressed above noise levels in somatic organs and during development show higher evolutionary conservation, in particular, at their promoter regions. Thus, we show that functionally constrained lncRNA loci are enriched in developing organs, and we suggest that many of these loci may function in an RNA-independent manner.
Project description:We sequenced two maize bacterial artificial chromosome (BAC) clones anchored by the centromere-specific satellite repeat CentC. The two BACs, consisting of approximately 200 kb of cytologically defined centromeric DNA, are composed exclusively of satellite sequences and retrotransposons that can be classified as centromere specific or noncentromere specific on the basis of their distribution in the maize genome. Sequence analysis suggests that the original maize sequences were composed of CentC arrays that were expanded by retrotransposon invasions. Seven centromere-specific retrotransposons of maize (CRM) were found in BAC 16H10. The CRM elements inserted randomly into either CentC monomers or other retrotransposons. Sequence comparisons of the long terminal repeats (LTRs) of individual CRM elements indicated that these elements transposed within the last 1.22 million years. We observed that all of the previously reported centromere-specific retrotransposons in rice and barley, which belong to the same family as the CRM elements, also recently transposed with the oldest element having transposed approximately 3.8 million years ago. Highly conserved sequence motifs were found in the LTRs of the centromere-specific retrotransposons in the grass species, suggesting that the LTRs may be important for the centromere specificity of this retrotransposon family.
Project description:BACKGROUND: The Solanaceae is a family of closely related species with diverse phenotypes that have been exploited for agronomic purposes. Previous studies involving a small number of genes suggested sequence conservation across the Solanaceae. The availability of large collections of Expressed Sequence Tags (ESTs) for the Solanaceae now provides the opportunity to assess sequence conservation and divergence on a genomic scale. RESULTS: All available ESTs and Expressed Transcripts (ETs), 449,224 sequences for six Solanaceae species (potato, tomato, pepper, petunia, tobacco and Nicotiana benthamiana), were clustered and assembled into gene indices. Examination of gene ontologies revealed that the transcripts within the gene indices encode a similar suite of biological processes. Although the ESTs and ETs were derived from a variety of tissues, 55-81% of the sequences had significant similarity at the nucleotide level with sequences among the six species. Putative orthologs could be identified for 28-58% of the sequences. This high degree of sequence conservation was supported by expression profiling using heterologous hybridizations to potato cDNA arrays that showed similar expression patterns in mature leaves for all six solanaceous species. Of the transcripts within the six Solanaceae gene indices, 16-19% did not have matches among Solanaceae, Arabidopsis, rice or 21 other plant gene indices. CONCLUSION: Results from this genome scale analysis confirmed a high level of sequence conservation at the nucleotide level of the coding sequence among Solanaceae. Additionally, the results indicated that part of the Solanaceae transcriptome is likely to be unique for each species.
Project description:An RNA probe complementary to the endoglucanase 3 gene (cel-3) of Fibrobacter succinogenes S85 hybridized to chromosomal DNAs from isolates representing the genetic diversity of the genus. The probe was subsequently used to identify putative cel-3-containing clones from genomic libraries of representative Fibrobacter isolates. Comparative sequence analyses of the cloned cel-3 genes confirmed that cel-3 is conserved among Fibrobacter isolates and that the ancestral cel-3 gene appears to have coevolved with the genus, since the same genealogy was inferred from sequence comparisons of 16S rRNAs and cel-3 genes. Hybridization comparisons using a xylanase gene probe suggested similar conservation of this gene. Together the data indicate that the cellulolytic apparatus is conserved among Fibrobacter isolates and that comparative analyses of homologous elements of the apparatus from different members, in relationship to the now established phylogeny of the genus, could serve to better define the enzymatic basis of fiber digestion in this genus.