Project description: The great tit is a passerine bird widely studied in ecology that, over the past decades, has provided important insights into speciation, phenology, behavior and microevolution. Following completion of the great tit genome sequence, a customized high-density 650k SNP array was developed, enabling more detailed genomic studies in this species.
Project description: The post-genomic era has provided researchers with a deluge of protein sequences. However, a significant fraction of the proteins encoded by sequenced genomes still lacks an identified function. Here, we aim to determine how many enzymes of uncertain or unknown function remain in the Saccharomyces cerevisiae and human proteomes. Using information available in the Swiss-Prot, BRENDA and KEGG databases in combination with a Hidden Markov Model-based method, we estimate that >600 yeast and 2000 human proteins (>30% of their proteins of unknown function) are enzymes whose precise function(s) remain(s) to be determined. This illustrates the impressive scale of the 'unknown enzyme problem'. We extensively review classical biochemical approaches, as well as more recent systematic experimental and computational ones, that can support enzyme function discovery. Finally, we discuss the possible roles of these elusive catalysts in light of recent developments in enzymology and metabolism, and the significance of the unknown enzyme problem for metabolic modeling, metabolic engineering and rare disease research.
Project description: We present an analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced, at high coverage, ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes. We summarize the number of genetic variants emerging from a study of this size and provide a proof of concept for identifying rare, highly penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome appears to stabilize at about 144,000 once the first 15 individuals have been sequenced. Finally, we find that each genome carries, on average, 165 homozygous protein-truncating or stop-loss variants in genes representing a diverse set of pathways.
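A per-genome count of novel SNVs can be tracked as a discovery curve, comparing each newly sequenced genome against everything catalogued so far. The Python sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: variants are keyed by a hypothetical (chromosome, position, ref, alt) tuple, "novel" is taken to mean absent from both an optional reference catalogue (the `known` set, standing in for something like dbSNP) and the genomes processed earlier, and the function name `novel_per_genome` is invented for illustration.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def novel_per_genome(
    genomes: Iterable[Set[Variant]], known: Optional[Set[Variant]] = None
) -> List[int]:
    """Count, for each genome in order, the variants not seen before.

    'Seen' means present in `known` (an assumed reference catalogue)
    or in any genome processed earlier in the sequence.
    """
    seen: Set[Variant] = set(known) if known else set()
    counts: List[int] = []
    for variants in genomes:
        new = variants - seen      # variants this genome adds to the catalogue
        counts.append(len(new))
        seen |= variants           # later genomes are compared against these too
    return counts


# Toy example with three genomes; variant `a` is already catalogued.
a = ("chr1", 100, "A", "G")
b = ("chr1", 200, "C", "T")
c = ("chr2", 50, "G", "A")
d = ("chrX", 75, "T", "C")
genomes = [{a, b, c}, {b, c, d}, {a, d}]
print(novel_per_genome(genomes, known={a}))  # [2, 1, 0]
```

With real data, plotting these counts against the number of genomes sequenced shows whether, and where, the per-genome yield of novel variants levels off.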
Project description: Background: Synaptotagmins form a large gene family in mammals. There is much interest in the function of certain family members that play crucial roles in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. The functions of other family members are less well understood, and the presence of Synaptotagmin genes in plants indicates that the family as a whole has roles beyond neurotransmission. Identifying the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences, makes such studies worthwhile. Results: I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases, together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions: I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function.
Phylogenetic analysis shows that Synaptotagmins originated in multicellular eukaryotes and diversified greatly in animals. Each land plant and animal species carries between 4 and 16 Synaptotagmin genes. The detailed delineation of the Synaptotagmin genes presented here will make future identification of Synaptotagmins easier. Since the functional roles of many of these genes are unknown, this gene collection provides a useful resource for future studies.
Project description: Escherichia coli is an important component of the biosphere and an ideal model for studying the processes involved in bacterial genome evolution. Sixty-one publicly available sequenced genomes of E. coli and Shigella spp. are compared using basic methods to produce phylogenetic and proteomic trees and to identify the pan- and core genomes of this set of strains. A hierarchical clustering of variable genes allowed clear separation of the strains into clusters, including known pathotypes; clinically relevant serotypes can also be resolved in this way. In contrast, in silico MLST left many of the strains jumbled and less well resolved. The predicted pan-genome comprises 15,741 gene families, of which only 993 (6%) are represented in every genome and thus make up the core genome. The variable or 'accessory' genes therefore account for more than 90% of the pan-genome and about 80% of a typical genome; some of these variable genes tend to be co-localized on genomic islands. The diversity within the species E. coli, and the overlap in gene content between this and related species, suggest a continuum rather than sharp species borders in this group of Enterobacteriaceae.
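The pan- and core-genome definitions used in this comparison reduce to set union and intersection over per-genome gene-family assignments. The Python sketch below assumes gene families have already been clustered and assigned to genomes (the clustering step is not shown), and the strain and family identifiers are made up. Its last lines apply the same arithmetic to the reported counts: 993 of 15,741 families is about 6.3%, leaving roughly 93.7% accessory, consistent with "more than 90%".

```python
from typing import Dict, Set, Tuple


def pan_and_core(genome_families: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (pan-genome, core genome) from a genome -> gene-family mapping."""
    if not genome_families:
        raise ValueError("need at least one genome")
    family_sets = list(genome_families.values())
    pan = set().union(*family_sets)        # families seen in any genome
    core = set.intersection(*family_sets)  # families present in every genome
    return pan, core


# Toy input: hypothetical gene-family IDs assigned to three strains.
families = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam5"},
}
pan, core = pan_and_core(families)
accessory = pan - core                     # variable ('accessory') families
print(sorted(core), len(pan), len(accessory))  # ['fam1'] 5 4

# The same proportions computed from the counts reported in the abstract.
core_share = 993 / 15741
print(f"core {core_share:.1%}, accessory {1 - core_share:.1%}")  # core 6.3%, accessory 93.7%
```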
Project description: Background: Plant mitochondrial genomes are known for their complexity, and abundant evidence shows that this organelle is important for plant sexual reproduction. Cytoplasmic male sterility (CMS), a phenomenon caused by incompatibility between the nucleus and mitochondria, has been discovered in various plant species. Because the exact sequence of steps leading to CMS has not yet been revealed, efforts should be made to elucidate the factors underlying the mechanism of this trait, which is important for crop breeding. Results: Two CMS mitochondrial genomes, LD-CMS, derived from Oryza sativa L. ssp. indica (434,735 bp), and CW-CMS, derived from Oryza rufipogon Griff. (559,045 bp), were newly sequenced in this study. Compared with the previously sequenced Nipponbare (Oryza sativa L. ssp. japonica) mitochondrial genome, 54 of 56 protein-coding genes (including pseudogenes), 22 tRNA genes (including pseudo-tRNAs) and three rRNA genes were conserved. The other two genes were absent from the CW-CMS mitochondrial genome, although one of them was present as part of the newly identified chimeric ORF, CW-orf307. At least 12 genomic recombination events were predicted between the LD-CMS mitochondrial genome and Nipponbare, and 15 between the CW-CMS genome and Nipponbare; these rearrangements formed novel genetic structures in the two CMS lines. Each CMS line carried at least one genomic rearrangement that was completely unique to it and absent from 69 rice cultivars and 9 accessions of O. rufipogon. Conclusion: Our results demonstrate novel mitochondrial genomic rearrangements that are unique to CMS cytoplasm, and CW-orf307, one of the genes unique to the CW mitochondrial genome, appears to be the candidate most likely responsible for CW-CMS.
Compared with rice cultivars, the CMS lines showed dynamic genomic rearrangements, suggesting that 'death' and possible 'birth' processes of CMS genes occurred during the breeding history of rice.
Project description: Bacillus Calmette-Guérin (BCG) vaccines comprise a family of related strains. Whole-genome sequencing has allowed better characterisation of the differences between many of the BCG vaccines. As sequencing technologies improve, updating publicly available sequence data is becoming common practice. We hereby announce the draft genomes of four BCG vaccines commonly used in Brazil, Argentina and Venezuela.
Project description: Understanding the relationship between prokaryotic traits and phylogeny is important for predicting and modeling ecological processes. Microbial extracellular enzymes play a pivotal role in nutrient cycling and the decomposition of organic matter, yet little is known about the phylogenetic distribution of the genes encoding them. In this study, we analyzed 3058 annotated prokaryotic genomes to determine which taxa have the genetic potential to produce alkaline phosphatase, chitinase and β-N-acetyl-glucosaminidase. We then evaluated the relationship between the genetic potential for enzyme production and 16S rRNA phylogeny using the consenTRAIT algorithm, which calculates the phylogenetic depth, and the corresponding 16S rRNA sequence identity, of clades of potential enzyme producers. Nearly half (49.2%) of the genomes analyzed were found to be capable of extracellular enzyme production, and these were non-randomly distributed across most prokaryotic phyla. On average, clades of potential enzyme-producing organisms had a maximum phylogenetic depth of 0.008004-0.009780, though individual clades varied broadly in both size and depth. These values correspond to a minimum 16S rRNA sequence identity of 98.04-98.40%. This distribution pattern indicates microdiversity, the occurrence of ecologically or physiologically distinct populations within phylogenetically related groups. Additionally, we found positive correlations among the genes encoding different extracellular enzymes. Our results suggest that the capacity to produce extracellular enzymes varies at relatively fine-scale phylogenetic resolution. This variation is consistent with other traits that require a small number of genes and provides insight into the relationship between taxonomy and traits that may be useful for predicting ecological function.
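The idea behind a consenTRAIT-style depth calculation can be sketched as a walk down the tree: at the first (largest) clade in which at least a threshold fraction of tips carry the trait, record the clade's depth as the mean branch-length distance from the clade's root to its tips, and do not descend further. This Python sketch is a simplified reading, not the published implementation. The 0.9 default threshold is an assumption, single-tip clades are simply ignored here (the original method may treat them differently), and the `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical minimal tree node; `length` is the branch length to the parent."""
    name: str = ""
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def tip_distances(node: Node, offset: float = 0.0) -> Dict[str, float]:
    """Map every tip below `node` to its path length from `node`."""
    if not node.children:
        return {node.name: offset}
    out: Dict[str, float] = {}
    for child in node.children:
        out.update(tip_distances(child, offset + child.length))
    return out


def consensus_clade_depths(
    node: Node, has_trait: Dict[str, bool], threshold: float = 0.9
) -> List[float]:
    """Depths of the largest clades in which >= `threshold` of tips share the trait."""
    tips = tip_distances(node)
    if len(tips) > 1:
        share = sum(has_trait.get(t, False) for t in tips) / len(tips)
        if share >= threshold:
            # Clade depth: mean distance from the clade root to its tips.
            return [sum(tips.values()) / len(tips)]
    depths: List[float] = []
    for child in node.children:  # a lone tip has no children, so it contributes nothing
        depths.extend(consensus_clade_depths(child, has_trait, threshold))
    return depths


# Toy tree ((A:0.01,B:0.01):0.02,(C:0.005,D:0.005):0.02); A, B and C carry the trait.
tree = Node(children=[
    Node(length=0.02, children=[Node("A", 0.01), Node("B", 0.01)]),
    Node(length=0.02, children=[Node("C", 0.005), Node("D", 0.005)]),
])
traits = {"A": True, "B": True, "C": True, "D": False}
print(consensus_clade_depths(tree, traits))  # [0.01]
```

Averaging the returned depths gives one summary value per trait, which is the kind of quantity the clade depths above describe; converting depth to 16S rRNA identity is not shown.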
Project description: Summary: Here we present Sequence Variant Analyzer (SVA), a software tool that assigns a predicted biological function to variants identified in next-generation sequencing studies and provides a browser to visualize the variants in their genomic contexts. SVA also interacts flexibly with software implementing variant association tests, allowing users to consider both the bioinformatic annotation of identified variants and the strength of their associations with studied traits. We illustrate the annotation features of SVA using two simple examples of sequenced genomes that harbor Mendelian mutations. Availability and implementation: Freely available on the web at http://www.svaproject.org.
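To make "assigning a predicted function to a variant" concrete, the sketch below classifies a variant by its position relative to gene models. It is a hypothetical Python illustration, not SVA's code or annotation scheme. The `Gene` class, the three region labels (exonic, intronic, intergenic), the 1-based inclusive coordinates and the gene name `GENE1` are all assumptions, and real annotators also consider strand, codon changes and splice sites.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical gene model with 1-based, inclusive coordinates."""
    name: str
    chrom: str
    start: int
    end: int
    exons: List[Tuple[int, int]]


def annotate(chrom: str, pos: int, genes: List[Gene]) -> List[Tuple[str, str]]:
    """Return (gene, region) pairs for every gene overlapping the variant."""
    hits: List[Tuple[str, str]] = []
    for gene in genes:
        if gene.chrom != chrom or not gene.start <= pos <= gene.end:
            continue  # variant lies outside this gene
        in_exon = any(s <= pos <= e for s, e in gene.exons)
        hits.append((gene.name, "exonic" if in_exon else "intronic"))
    return hits or [("", "intergenic")]


genes = [Gene("GENE1", "chr1", 1000, 5000, [(1000, 1200), (3000, 3300)])]
print(annotate("chr1", 1100, genes))  # [('GENE1', 'exonic')]
print(annotate("chr1", 2000, genes))  # [('GENE1', 'intronic')]
print(annotate("chr2", 1100, genes))  # [('', 'intergenic')]
```

Returning a list rather than a single label keeps overlapping genes visible, which matters when a variant falls in more than one annotated transcript.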