Project description: Positive-strand RNA viruses of the order Nidovirales have the largest known RNA genomes of vertebrate and invertebrate viruses, at 36.7 and 41.1 kb, respectively. The acquisition of a proofreading exoribonuclease (ExoN) locus by an ancestral nidovirus enabled crossing of the 20 kb barrier. Other factors constraining genome expansion in nidoviruses remain poorly defined. Here, we assemble 76 genome sequences of invertebrate nidoviruses from >500,000 published transcriptome experiments and triple the number of known nidoviruses with >36 kb genomes, including the largest known RNA genome, at 64 kb. Many of the novel viral lineages acquired putative enzymatic domains that were inserted in open reading frame (ORF) 1a and ORF1b or equivalent regions and may constitute cofactors of the viral replicase or otherwise modulate infection. We classify multi-cistronic ExoN-encoding nidoviruses into seven groups and four subgroups, according to canonical and non-canonical modes of viral polymerase expression by ribosomes and genomic organization (reModes). The largest group, employing the canonical reMode, comprises invertebrate and vertebrate nidoviruses, including coronaviruses, with genomes ranging from 20 to 36 kb. Six groups with non-canonical reModes include giant invertebrate nidoviruses with 31 to 64 kb genomes. Among them are viruses with segmented genomes and viruses utilizing dual ribosomal frameshifting, which we validate experimentally. Moreover, polyprotein length and genome size in nidoviruses show reMode- and host phylum-dependent relationships. We demonstrate that the largest polyproteins in nidoviruses may be close to an upper limit that we hypothesize to be determined by the host-inherent translation fidelity, further constraining nidovirus genome size. Thus, expansion of giant RNA virus genomes, the vertebrate/invertebrate host division, the control of viral replicase expression, and translation fidelity are interconnected.
Project description: The skin commensal yeast Malassezia is associated with several skin disorders. To establish a reference resource, we sought to determine the complete genome sequence of Malassezia sympodialis and identify its protein-coding genes. A novel genome annotation workflow combining RNA sequencing, proteomics, and manual curation was developed to determine gene structures with high accuracy.
Project description: The association of genetic variation with disease and drug response, together with improvements in nucleic acid technologies, has given rise to great optimism about the impact of 'genomic medicine'. However, the formidable size of the diploid human genome has prevented the routine application of sequencing methods to deciphering complete individual human genomes, and has so far limited the realization of the full potential of genomics for science and human health. Working towards the goal of harnessing the power of genomics, we sequenced the diploid genome of a single individual, Dr. James D. Watson, using a massively parallel method of sequencing in picoliter-size reaction vessels. Here we report the results of genotyping the subject's DNA using an Affymetrix 500k GeneChip as well as copy number variations as reported by Agilent 244k comparative genomic hybridization arrays. Keywords: Genotyping, copy number variation (CNV), aCGH