Project description: The horse, like a majority of animal species, has a limited amount of species-specific expressed sequence data available in public databases. As a result, structural models for a majority of genes defined in the equine genome are predictions based on ab initio sequence analysis or the projection of gene structures from other mammalian species. The current study used Illumina-based sequencing of messenger RNA (RNA-seq) to help refine structural annotation of equine protein-coding genes and for a preliminary assessment of gene expression patterns. Sequencing of mRNA from eight equine tissues generated 293,758,105 35-base sequence tags, equaling 10.28 gigabase pairs of total sequence data. The tag alignments represent approximately 208X coverage of the equine mRNA transcriptome and confirmed transcriptional activity for roughly 90% of the protein-coding gene structures predicted by Ensembl and NCBI. Tag coverage was sufficient to define structural annotation for 11,356 genes, while also identifying an additional 456 transcripts with exon/intron features that are not listed by either Ensembl or NCBI. Genomic locus data and intervals for the protein-coding genes predicted by the Ensembl and NCBI annotation pipelines were combined with 75,116 RNA-seq-derived transcriptional units to generate a consensus equine protein-coding gene set of 20,302 defined loci. Gene ontology annotation was used to compare the functional and structural categories of genes expressed in either a tissue-restricted pattern or broadly across all tissue samples. Examination of 8 equine RNA samples representing 6 distinct tissues.
Project description: The skin commensal yeast Malassezia is associated with several skin disorders. To establish a reference resource, we sought to determine the complete genome sequence of Malassezia sympodialis and identify its protein-coding genes. A novel genome annotation workflow combining RNA sequencing, proteomics, and manual curation was developed to determine gene structures with high accuracy.
Project description: Macaque species share over 93% genome homology with humans and develop many disease phenotypes similar to those of humans, making them valuable animal models for the study of human diseases (e.g., HIV and neurodegenerative diseases). However, the quality of genome assembly and annotation for several macaque species lags behind the human genome effort. To close this gap and enhance functional genomics approaches, we employed a combination of de novo linked-read assembly and scaffolding using a proximity ligation assay (Hi-C) to assemble the pig-tailed macaque (Macaca nemestrina) genome. This combinatorial method yielded large, chromosome-level scaffolds with a scaffold N50 of 127.5 Mb; the 23 largest scaffolds covered 90% of the entire genome. This assembly revealed large-scale rearrangements between pig-tailed macaque chromosomes 7, 12, and 13 and human chromosomes 2, 14, and 15. We subsequently annotated the genome using transcriptome and proteomics data from personalized induced pluripotent stem cells (iPSCs) derived from the same animal. Reconstruction of the evolutionary tree using whole-genome annotation and orthologous comparisons among three macaque species, human, and mouse genomes revealed extensive homology between humans and pig-tailed macaques with regard to both pluripotent stem cell genes and innate immune gene pathways. Our results confirm that rhesus and cynomolgus macaques exhibit a closer evolutionary distance to each other than either species exhibits to humans or pig-tailed macaques. These findings demonstrate that pig-tailed macaques can serve as an excellent animal model for the study of many human diseases, particularly with regard to pluripotency and innate immune pathways.
Project description: Structural variation has played an important role in the evolutionary restructuring of human and great ape genomes. We generated approximately 10-fold genomic sequence coverage from a western lowland gorilla and integrated these data into a physical and cytogenetic framework to develop a comprehensive view of structural variation. We discovered and validated over 7,665 structural changes within the gorilla lineage, including sequence resolution of inversions, deletions, duplications, and retrotranspositions. A comparison with human and other ape genomes shows that the gorilla genome has been subjected to the highest rate of segmental duplication. We show that both the gorilla and chimpanzee genomes have experienced independent yet parallel patterns of structural mutation that have not occurred in humans, including the formation of subtelomeric heterochromatic caps, the hyperexpansion of segmental duplications, and bursts of retroviral integrations. Our analysis suggests that the chimpanzee and gorilla genomes are structurally more derived than either orangutan or human. All combinations of human, chimpanzee, and gorilla are used in 2 different arrayCGH designs. First, a standard 2.1 was used to detect CNVs; second, we used a custom-designed arrayCGH to validate gorilla-specific duplications and deletions.
Project description: Genetic variation amongst individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single-nucleotide changes. In this manuscript we explore variation on an intermediate scale, particularly insertions, deletions, and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number among individuals. Sequencing of a subset of structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation, an important standard for genotyping platforms and a prelude to future individual genome sequencing projects. Keywords: comparative genomic hybridization, copy number variation, structural variation, fosmid end sequencing