Dataset Information

Clustering of cognate proteins among distinct proteomes derived from multiple links to a single seed sequence.

ABSTRACT:

Background

Modern proteomes evolved by modification of pre-existing ones. It is extremely important to comparative biology that related proteins be identified as members of the same cognate group, since a characterized putative homolog could be used to find clues about the function of uncharacterized proteins from the same group. Typically, databases of related proteins focus on those from completely sequenced genomes. Unfortunately, relatively few organisms have had their genomes fully sequenced; accordingly, many proteins are ignored by the currently available databases of cognate proteins, despite the large number of important genes that are functionally described only for these incomplete proteomes.

Results

We have developed a method to cluster cognate proteins from multiple organisms beginning with only one sequence, through connectivity saturation with that Seed sequence. We show that the generated clusters are in agreement with some other approaches based on full genome comparison.
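As a rough illustration of growing a cluster from a single seed, the sketch below follows similarity links outward from the seed until no new protein can be added. It is not the authors' implementation. The `links` dictionary, the protein identifiers, and the function name `grow_cluster` are hypothetical, and reading "connectivity saturation" as "iterate until the cluster stops growing" is an assumption made for this example.

```python
from typing import Dict, Set


def grow_cluster(seed: str, links: Dict[str, Set[str]]) -> Set[str]:
    """Expand a cluster from `seed` until no linked protein remains outside it.

    `links` maps each protein ID to the set of proteins it is linked to
    (hypothetical input, e.g. pairwise similarity hits above some cutoff).
    """
    cluster = {seed}
    frontier = {seed}
    while frontier:
        # Collect every protein linked to the newest members of the cluster.
        reached = set()
        for member in frontier:
            reached |= links.get(member, set())
        # Only proteins not yet in the cluster extend it in the next round.
        frontier = reached - cluster
        cluster |= frontier
    return cluster


# Hypothetical, symmetric similarity links between protein identifiers.
links = {
    "seed": {"a", "b"},
    "a": {"seed", "b", "c"},
    "b": {"seed", "a"},
    "c": {"a", "x"},
    "x": {"c"},
    "y": {"z"},
    "z": {"y"},
}

cluster = grow_cluster("seed", links)
print(sorted(cluster))
```

Proteins `y` and `z` have no path of links back to the seed, so they stay outside the cluster. A real pipeline would also need a rule for deciding which links count, which this sketch leaves to whatever produced `links`.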

Conclusion

The method produced results that are as reliable as those produced by conventional clustering approaches. Generating clusters based only on individual proteins of interest is less time-consuming than generating clusters for whole proteomes.

SUBMITTER: Barbosa-Silva A

PROVIDER: S-EPMC2277401 | biostudies-literature | 2008 Mar

REPOSITORIES: biostudies-literature


Publications

Clustering of cognate proteins among distinct proteomes derived from multiple links to a single seed sequence.

Barbosa-Silva A, Satagopam VP, Schneider R, Ortega JM

BMC Bioinformatics, 2008 Mar 5


PMID: 18321373

Similar Datasets

Project description:

Background: The structural and functional features associated with Simple Sequence Proteins (SSPs) are non-globularity, disease states, signaling and post-translational modification. SSPs are also an important source of genetic and possibly phenotypic variation. Analysis of 249 prokaryotic proteomes offers a new opportunity to examine the genomic properties of SSPs.

Results: SSPs are a minority but they grow with proteome size. This relationship is exhibited across species varying in genomic GC, mutational bias, life style, and pathogenicity. Their proportion in each proteome is strongly influenced by genomic base compositional bias. In most species simple duplications are favoured, but in a few cases such as Mycobacteria, large families of duplications occur. Amino acid preference in SSPs exhibits a trend towards low cost of biosynthesis. In SSPs and in non-SSPs, Alanine, Glycine, Leucine, and Valine are abundant in species widely varying in genomic GC whereas Isoleucine and Lysine are rich only in organisms with low genomic GC. Arginine is abundant in SSPs of two species and in the non-SSPs of Xanthomonas oryzae. Asparagine is abundant only in SSPs of low GC species. Aspartic acid is abundant only in the non-SSPs of Halobacterium sp NRC1. The abundance of Serine in SSPs of 62 species extends over a broader range compared to that of non-SSPs. Threonine (T) is abundant only in SSPs of a couple of species. SSPs exhibit preferential association with Cell surface, Cell membrane and Transport functions and a negative association with Metabolism. Mesophiles and Thermophiles display similar ranges in the content of SSPs.

Conclusion: Although SSPs are a minority, the genomic forces of base compositional bias and duplications influence their growth and pattern in each species. The preferences and abundance of amino acids are governed by low biosynthetic cost, evolutionary age and base composition of codons. Abundance of charged amino acids Arginine and Aspartic acid is severely restricted. SSPs preferentially associate with cell surface and interface functions as opposed to metabolism, wherein proteins of high sequence complexity with globular structures are preferred. Mesophiles and Thermophiles are similar with respect to the content of SSPs. Our analysis serves to expand the commonly held views on SSPs.

Project description: Owing to rapid growth in the elucidation of genome sequences of various organisms, deducing proteome sequences has become imperative, in order to have an improved understanding of biological processes. Since the traditional Edman method was unsuitable for high-throughput sequencing and also for N-terminus modified proteins, mass spectrometry (MS) based methods, mainly based on soft ionization modes (electrospray ionization and matrix-assisted laser desorption/ionization), began to gain significance. MS based methods were adaptable for high-throughput studies and applicable for sequencing N-terminus blocked proteins/peptides too. Consequently, over the last decade a new discipline called 'proteomics' has emerged, which encompasses the attributes necessary for high-throughput identification of proteins. 'Proteomics' may also be regarded as an offshoot of the classic field, 'biochemistry'. Many protein sequencing and proteomic investigations were successfully accomplished through MS dependent sequence elucidation of 'short' proteolytic peptides (typically 7-20 amino acid residues), which is called the 'shotgun' or 'bottom-up (BU)' approach. While the BU approach continues as a workhorse for proteomics/protein sequencing, attempts to sequence intact proteins without proteolysis, called the 'top-down (TD)' approach, started due to ambiguities in the BU approach, e.g., the protein inference problem, identification of proteoforms and the discovery of posttranslational modifications (PTMs). The high-throughput TD approach (TD proteomics) is still in its infancy. Nevertheless, TD characterization of purified intact proteins has been useful for detecting PTMs. With the hope of overcoming the pitfalls of BU and TD strategies, another concept called the 'middle-down (MD)' approach was put forward. Similar to BU, the MD approach also involves proteolysis, but in a restricted manner, to produce 'longer' proteolytic peptides than the ones usually obtained in BU studies, thereby providing better sequence coverage. In this regard, special proteases (OmpT, Sap9, IdeS) have been used, which can cleave proteins to produce longer proteolytic peptides. By reviewing the ample evidence currently in the literature, which is predominantly on PTM characterization of histones and antibodies, we highlight salient features of the MD approach. Consequently, we are inclined to claim that the MD concept might have widespread applications in the future for various research areas, such as clinical, biopharmaceuticals (including PTM analysis) and even for general/routine characterization of proteins including therapeutic proteins, not limited to analysis of histones or antibodies.

Project description: Facioscapulohumeral muscular dystrophy (FSHD) is a progressive muscle disorder linked to a contraction of the D4Z4 repeat array in the 4q35 subtelomeric region. This deletion induces epigenetic modifications that affect the expression of several genes located in the vicinity. In each D4Z4 element, we identified the double homeobox 4 (DUX4) gene. DUX4 expresses a transcription factor that plays a major role in the development of FSHD through the initiation of a large gene dysregulation cascade that causes myogenic differentiation defects, atrophy and reduced response to oxidative stress. Because miRNAs variably affect mRNA expression, proteomic approaches are required to define the dysregulated pathways in FSHD. In this study, we optimized a differential isotope protein labeling (ICPL) method combined with shotgun proteomic analysis using a gel-free system (2DLC-MS/MS) to study FSHD myotubes. Primary CD56(+) FSHD myoblasts were found to fuse into myotubes presenting various proportions of an atrophic or a disorganized phenotype. To better understand the FSHD myogenic defect, our improved proteomic procedure was used to compare predominantly atrophic or disorganized myotubes to the same matching healthy control. FSHD atrophic myotubes presented decreased structural and contractile muscle components. This phenotype suggests the occurrence of atrophy-associated proteolysis that likely results from the DUX4-mediated gene dysregulation cascade. The skeletal muscle myosin isoforms were decreased while non-muscle myosin complexes were more abundant. In FSHD disorganized myotubes, myosin isoforms were not reduced, and increased proteins were mostly involved in microtubule network organization and myofibrillar remodeling. A common feature of both FSHD myotube phenotypes was the disturbance of several caveolar proteins, such as PTRF and MURC. Taken together, our data suggest changes in trafficking and in the membrane microdomains of FSHD myotubes. Finally, the adjustment of a nuclear fractionation compatible with mass spectrometry allowed us to highlight alterations of proteins involved in mRNA processing and stability.

