Dataset Information

PairsDB atlas of protein sequence space.

ABSTRACT: Sequence similarity searching of databases is a cornerstone of molecular biology. PairsDB is a database intended to make exploring protein sequences and their similarity relationships quick and easy. Behind PairsDB is a comprehensive collection of protein sequences and the BLAST and PSI-BLAST alignments between them. Instead of running BLAST or PSI-BLAST individually on each request, results are retrieved instantaneously from a database of pre-computed alignments. Filtering options allow you to find a set of sequences satisfying a set of criteria, for example, all human proteins with a solved structure and without transmembrane segments. PairsDB is continually updated and covers all sequences in UniProt. The data are stored in a MySQL relational database. Data files will be made available for download at ftp://nic.funet.fi/pub/sci/molbio. PairsDB can also be accessed interactively at http://pairsdb.csc.fi. PairsDB data are a valuable platform on which to build downstream automated analysis pipelines. For example, the graph of all-against-all similarity relationships is the starting point for clustering protein families, delineating domains, improving alignment accuracy by consistency measures, and defining orthologous genes. Moreover, query-anchored stacked sequence alignments, profiles and consensus sequences are useful in studies of sequence conservation patterns for clues about possible functional sites.
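
As an illustration of how an all-against-all similarity graph can serve as the starting point for clustering protein families, the following is a minimal sketch of single-linkage clustering (connected components) over pairwise hits. It is not the PairsDB method or schema: the `(query, hit, evalue)` tuple layout, the function name `cluster_by_similarity`, the E-value cutoff and the sequence identifiers are all assumptions made for this example.

```python
from collections import defaultdict


def cluster_by_similarity(pairs, max_evalue=1e-5):
    """Group sequences into connected components of the similarity graph.

    `pairs` is an iterable of (query, hit, evalue) tuples. This layout is
    hypothetical and is not taken from the PairsDB schema.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, hit, evalue in pairs:
        # Look up both ends first so that unlinked sequences are still
        # registered and later reported as their own cluster.
        root_q, root_h = find(query), find(hit)
        if evalue <= max_evalue and root_q != root_h:
            parent[root_q] = root_h

    clusters = defaultdict(set)
    for seq in list(parent):
        clusters[find(seq)].add(seq)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Made-up pairwise hits: the C-D hit is too weak to link the two groups.
pairs = [
    ("A", "B", 1e-30),
    ("B", "C", 1e-12),
    ("C", "D", 0.5),
    ("D", "E", 1e-8),
]
families = cluster_by_similarity(pairs)
print(families)
```

Single linkage is the simplest choice and tends to merge families through multi-domain proteins, which is one reason domain delineation is usually treated as a separate step.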

SUBMITTER: Heger A

PROVIDER: S-EPMC2238971 | biostudies-literature | 2008 Jan

REPOSITORIES: biostudies-literature


Publications

PairsDB atlas of protein sequence space.

Andreas Heger, Eija Korpelainen, Taavi Hupponen, Kimmo Mattila, Vesa Ollikainen, Liisa Holm

Nucleic Acids Research, 2007-11-05, Database issue


PMID: 17986464

Similar Datasets

Project description: The specificity of the kinase-regulator interaction is driven by a limited set of interfacial residues in each protein that strongly coevolve. To identify combinations of these interface residues that are functional and potentially insulated from existing two-component signaling pathways in E. coli, we constructed a dual library of mutants in which the key, coevolving interface residues of a canonical two-component system, PhoQ and PhoP, were randomized. We used NNS codons to randomize six residues in PhoQ and five residues in PhoP, all of which lie at the interface formed by the two proteins in complex and are critical to determining partner specificity in all two-component signaling pathways. To identify functional combinations of residues, we first grew the library of PhoQ-PhoP variants overnight in medium with low Mg2+, which activates PhoQ. Because cells must phosphorylate PhoP to grow when extracellular Mg2+ is limiting, this step enriches for functional PhoQ-PhoP variants. Variants that survived selection in limiting Mg2+ were then subjected to Sort-Seq, which uses fluorescence-activated cell sorting (FACS) and deep sequencing to quantify the signal responsiveness of variants in the library. To gauge the phosphorylation of PhoP in vivo, we used a fluorescent transcriptional reporter, PmgrB-yfp. In the presence of low extracellular Mg2+, functional PhoQ promotes the phosphorylation of PhoP and the production of YFP, whereas in the presence of high concentrations of Mg2+, PhoQ drives the dephosphorylation of PhoP, limiting the accumulation of YFP. The library was grown in each condition for 6 hours before sorting and sequencing. To identify variants that are signal responsive and drive YFP production specifically in low Mg2+, we sorted cells from each condition into 8 separate bins and deep-sequenced the randomized regions of the variants collected in each bin.
We then calculated the frequency of each variant in each bin to yield the distributions of individual variants in low and high Mg2+, which were fit to Gaussians. From these fits, we assessed the mean level of YFP in each condition and the fold-induction, or signal responsiveness, of each variant detected in the library. The 11 codons / amino acids listed in this dataset refer to codons 12, 14, 15, 18, and 19 in PhoP and codons 284, 288, 289, 292, 302, and 303 in PhoQ, in that order.
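
The per-variant calculation described above can be sketched roughly as follows. This is a simplified illustration, not the submitters' pipeline: in place of a fitted Gaussian it uses the frequency-weighted mean and standard deviation across bins (a moment estimate), and the bin centers (in log10 YFP units), the read counts and the function name `gaussian_moments` are hypothetical values chosen for the example.

```python
import math


def gaussian_moments(variant_counts, bin_totals, bin_log_centers):
    """Estimate the mean and standard deviation of one variant's distribution.

    The frequency in a bin is the variant's read count divided by the total
    reads sequenced from that bin. The frequencies are normalized to sum to 1
    and used as weights on the (assumed) log10 fluorescence center of each bin.
    """
    freqs = [c / t for c, t in zip(variant_counts, bin_totals)]
    total = sum(freqs)
    weights = [f / total for f in freqs]
    mean = sum(w * x for w, x in zip(weights, bin_log_centers))
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, bin_log_centers))
    return mean, math.sqrt(var)


# Eight bins, as in the description; centers and counts are made up.
bin_log_centers = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
bin_totals = [1000] * 8

low_mg_counts = [0, 0, 0, 0, 0, 10, 80, 10]   # variant is bright in low Mg2+
high_mg_counts = [10, 80, 10, 0, 0, 0, 0, 0]  # variant is dim in high Mg2+

mu_low, sd_low = gaussian_moments(low_mg_counts, bin_totals, bin_log_centers)
mu_high, sd_high = gaussian_moments(high_mg_counts, bin_totals, bin_log_centers)

# The means are in log10 units, so the fold-induction is a power of ten.
fold_induction = 10 ** (mu_low - mu_high)
print(mu_low, mu_high, fold_induction)
```

A least-squares Gaussian fit, as the description mentions, would be less sensitive to reads in the outermost bins than this moment estimate.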
