Dataset Information

Panning for genes--A visual strategy for identifying novel gene orthologs and paralogs.

ABSTRACT: We have developed a rapid visual method for identifying novel members of gene families. Starting with an evolutionary tree, 20-50 protein query sequences for a gene family are selected from different branches of the tree. These query sequences are used to search the GenBank and expressed sequence tag (EST) DNA databases and their nightly updates using the tfastx3 or tfasty3 programs. The results of all 20-50 searches are collated and re-sorted to highlight EST or genomic sequences that share significant similarity with the query sequences. The statistical significance of each DNA/protein alignment is plotted, highlighting the portion of the query sequence that is present in the database sequence and the percent identity in the aligned region. The collated results for database sequences are linked using the WWW to the underlying scores and alignments; these links can also be used to perform additional searches to characterize the novel sequence further. With traditional "deep" scoring matrices (BLOSUM50) one can search for previously unrecognized families of large protein superfamilies. Alternatively, by using query sequences and EST libraries from the same species (e.g., human or mouse) together with "shallow" scoring matrices and filters that remove high-identity sequences, one can highlight new paralogs of previously described subfamilies. Using query sequences from the glutathione transferase superfamily, we identified two novel mammalian glutathione transferase families that were recognized previously only in plants. Using query sequences from known mammalian glutathione transferase subfamilies, we identified new candidate paralogs from the mouse class-mu, class-pi, and class-theta families.
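
The collate-and-re-sort step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Hit` record, the `collate_hits` function, the threshold values, and the sequence labels are all hypothetical, and the sketch assumes the tfastx3/tfasty3 output has already been parsed into per-alignment records. It groups hits by database sequence, optionally drops high-identity hits (as in the paralog search), and ranks database sequences by their most significant alignment.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One parsed query-vs-database alignment (hypothetical record layout)."""
    query: str       # protein query label
    subject: str     # EST or genomic database sequence label
    evalue: float    # statistical significance of the alignment
    identity: float  # percent identity in the aligned region
    q_start: int     # first aligned residue of the query
    q_end: int       # last aligned residue of the query


def collate_hits(hits, max_evalue=1e-3, max_identity=100.0):
    """Group hits by database sequence and rank by best E-value.

    max_evalue and max_identity are illustrative thresholds, not values
    taken from the paper.
    """
    by_subject = defaultdict(list)
    for hit in hits:
        if hit.evalue <= max_evalue and hit.identity <= max_identity:
            by_subject[hit.subject].append(hit)
    # Within each database sequence, put the most significant query first.
    for subject_hits in by_subject.values():
        subject_hits.sort(key=lambda h: h.evalue)
    # Across database sequences, rank by the best hit each one received.
    return sorted(by_subject.items(), key=lambda item: item[1][0].evalue)


# Made-up example records, for illustration only.
hits = [
    Hit("query_mu", "est_A", 1e-20, 62.0, 10, 120),
    Hit("query_pi", "est_A", 1e-8, 35.0, 15, 110),
    Hit("query_mu", "est_B", 1e-40, 99.0, 1, 200),
    Hit("query_theta", "est_C", 0.5, 28.0, 30, 80),
]

# Dropping hits above 95% identity removes est_B, which is likely a known
# family member; est_C fails the significance threshold.
ranked = collate_hits(hits, max_evalue=1e-3, max_identity=95.0)
for subject, subject_hits in ranked:
    best = subject_hits[0]
    print(subject, best.query, best.evalue, best.identity)
```

Keeping the full list of hits per database sequence, and not only the best one, preserves the aligned query range and percent identity that the abstract says are plotted for each alignment.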

SUBMITTER: Retief JD

PROVIDER: S-EPMC310732 | biostudies-literature | 1999 Apr

REPOSITORIES: biostudies-literature


Publications

Panning for genes--A visual strategy for identifying novel gene orthologs and paralogs.

Retief JD, Lynch KR, Pearson WR

Genome Research, 1999-04-01, issue 4


PMID: 10207159

Similar Datasets

Project description: Despite over a billion years of evolutionary divergence, several thousand human genes possess clearly identifiable orthologs in yeast, and many have undergone lineage-specific duplications in one or both lineages. These duplicated genes may have been free to diverge in function since their expansion, and it is unclear how or at what rate ancestral functions are retained or partitioned among co-orthologs between species and within gene families. Thus, in order to investigate how ancestral functions are retained or lost post-duplication, we systematically replaced hundreds of essential yeast genes with their human orthologs from gene families that have undergone lineage-specific duplications, including those with single duplications (1 yeast gene to 2 human genes, 1:2) or higher-order expansions (1:>2) in the human lineage. We observe a variable pattern of replaceability across different ortholog classes, with an obvious trend toward differential replaceability inside gene families, and rarely observe replaceability by all members of a family. We quantify the ability of various properties of the orthologs to predict replaceability, showing that in the case of 1:2 orthologs, replaceability is predicted largely by the divergence and tissue-specific expression of the human co-orthologs, i.e., the human proteins that are less diverged from their yeast counterpart and more ubiquitously expressed across human tissues more often replace their single yeast ortholog. These trends were consistent with in silico simulations demonstrating that when only one ortholog can replace its corresponding yeast equivalent, it tends to be the least diverged of the pair. Replaceability of yeast genes having more than 2 human co-orthologs was marked by retention of orthologous interactions in functional or protein networks as well as by more ancestral subcellular localization.
Overall, we performed >400 human gene replaceability assays, revealing 50 new human-yeast complementation pairs, thus opening up avenues to further functionally characterize these human genes in a simplified organismal context.

Project description: The monofunctional penicillin-binding DD-peptidases and penicillin-hydrolyzing serine beta-lactamases diverged from a common ancestor by the acquisition of structural changes in the polypeptide chain while retaining the same folding, three-motif amino acid sequence signature, serine-assisted catalytic mechanism, and active-site topology. Fusion events gave rise to multimodular penicillin-binding proteins (PBPs). The acyl serine transferase penicillin-binding (PB) module possesses the three active-site defining motifs of the superfamily; it is linked to the carboxy end of a non-penicillin-binding (n-PB) module through a conserved fusion site; the two modules form a single polypeptide chain which folds on the exterior of the plasma membrane and is anchored by a transmembrane spanner; and the full-size PBPs cluster into two classes, A and B. In the class A PBPs, the n-PB modules are a continuum of diverging sequences; they possess a five-motif amino acid sequence signature, and conserved dicarboxylic amino acid residues are probably elements of the glycosyl transferase catalytic center. The PB modules fall into five subclasses: A1 and A2 in gram-negative bacteria and A3, A4, and A5 in gram-positive bacteria. The full-size class A PBPs combine the required enzymatic activities for peptidoglycan assembly from lipid-transported disaccharide-peptide units and almost certainly prescribe different, PB-module specific traits in peptidoglycan cross-linking. In the class B PBPs, the PB and n-PB modules cluster in a concerted manner. A PB module of subclass B2 or B3 is linked to an n-PB module of subclass B2 or B3 in gram-negative bacteria, and a PB module of subclass B1, B4, or B5 is linked to an n-PB module of subclass B1, B4, or B5 in gram-positive bacteria. Class B PBPs are involved in cell morphogenesis.
The three motifs borne by the n-PB modules are probably sites for module-module interaction and the polypeptide stretches which extend between motifs 1 and 2 are sites for protein-protein interaction. The full-size class B PBPs are an assortment of orthologs and paralogs, which prescribe traits as complex as wall expansion and septum formation. PBPs of subclass B1 are unique to gram-positive bacteria. They are not essential, but they represent an important mechanism of resistance to penicillin among the enterococci and staphylococci. Natural evolution and PBP- and beta-lactamase-mediated resistance show that the ability of the catalytic centers to adapt their properties to new situations is limitless. Studies of the reaction pathways by using the methods of quantum chemistry suggest that resistance to penicillin is a road of no return.

Project description: A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the "functional similarity" between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the "ortholog conjecture" (or, more properly, the "ortholog functional conservation hypothesis"). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches.
Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: 1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an "open world assumption" (absence of an annotation does not imply absence of a function), and 2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the conclusions have a justifiable biological basis.
