Dataset Information

Efficient α, β-motif finder for identification of phenotype-related functional modules.

ABSTRACT: BACKGROUND: Microbial communities in their natural environments exhibit phenotypes that can directly cause particular diseases, convert biomass or wastewater to energy, or degrade various environmental contaminants. Understanding how these communities realize specific phenotypic traits (e.g., carbon fixation, hydrogen production) is critical for addressing health, bioremediation, or bioenergy problems. RESULTS: In this paper, we describe a graph-theoretical method for in silico prediction of the cellular subsystems that are related to the expression of a target phenotype. The proposed (α, β)-motif finder approach allows for identification of these phenotype-related subsystems that, in addition to metabolic subsystems, could include their regulators, sensors, transporters, and even uncharacterized proteins. By comparing dozens of genome-scale networks of functionally associated proteins, our method efficiently identifies those statistically significant functional modules that are in at least α networks of phenotype-expressing organisms but appear in no more than β networks of organisms that do not exhibit the target phenotype. Experiments with different target phenotypes, such as hydrogen production, motility, aerobic respiration, and acid tolerance, have shown that the enumerated modules are indeed related to phenotype expression. CONCLUSION: Thus, we have proposed a methodology that can identify potential statistically significant phenotype-related functional modules. The functional module is modeled as an (α, β)-clique, where α and β are two criteria introduced in this work. We also propose a novel network model, called the two-typed, divided network. The new network model and the criteria make the problem tractable even while very large networks are being compared. The code can be downloaded from http://www.freescience.org/cs/ABClique/
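
The α, β criterion in the abstract can be illustrated with a small counting check. The sketch below is not the authors' ABClique code: it skips module enumeration and the significance test, and it assumes that a module is a set of undirected edges and that proteins carry identifiers shared across networks. The function names and protein labels are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def normalize(edges: Iterable[Edge]) -> Set[FrozenSet[str]]:
    """Store undirected edges as frozensets so (a, b) equals (b, a)."""
    return {frozenset(edge) for edge in edges}


def passes_alpha_beta(
    module: Iterable[Edge],
    expressing: List[Iterable[Edge]],
    non_expressing: List[Iterable[Edge]],
    alpha: int,
    beta: int,
) -> bool:
    """Assumed reading of the criterion: the module must occur in at least
    `alpha` phenotype-expressing networks and in at most `beta`
    non-expressing networks. Occurrence means all module edges are present."""
    module_edges = normalize(module)
    positive_hits = sum(module_edges <= normalize(net) for net in expressing)
    negative_hits = sum(module_edges <= normalize(net) for net in non_expressing)
    return positive_hits >= alpha and negative_hits <= beta


# Hypothetical toy data: protein labels p1..p4 are placeholders.
module = [("p1", "p2"), ("p2", "p3")]
expressing = [
    [("p1", "p2"), ("p2", "p3"), ("p3", "p4")],  # contains the module
    [("p2", "p1"), ("p3", "p2")],                # contains it (edge order ignored)
    [("p1", "p2")],                              # missing one module edge
]
non_expressing = [
    [("p1", "p2"), ("p2", "p3")],                # contains the module
    [("p3", "p4")],                              # does not contain it
]
```

With this toy data the module occurs in two expressing networks and one non-expressing network, so it passes for α = 2, β = 1 and fails once α is raised to 3 or β is lowered to 0.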

SUBMITTER: Schmidt MC

PROVIDER: S-EPMC3287386 | biostudies-literature | 2011

REPOSITORIES: biostudies-literature

Publications

Efficient α, β-motif finder for identification of phenotype-related functional modules.

Schmidt MC, Rocha AM, Padmanabhan K, Chen Z, Scott K, Mihelcic JR, Samatova NF

BMC Bioinformatics, 2011-11-11

PMID: 22078292

Similar Datasets

Project description: BACKGROUND: Identifying cellular subsystems that are involved in the expression of a target phenotype has been a very active research area for the past several years. In this paper, cellular subsystem refers to a group of genes (or proteins) that interact and carry out a common function in the cell. Most studies identify genes associated with a phenotype on the basis of some statistical bias; others have extended these statistical methods to analyze functional modules and biological pathways for phenotype-relatedness. However, a biologist often has a specific question in mind while performing such an analysis, and most of the subsystems obtained by existing methods might be largely irrelevant to the question at hand. Arguably, it would be valuable to incorporate the biologist's knowledge about the phenotype into the algorithm. This way, the resulting subsystems are expected not only to be related to the target phenotype but also to contain information that the biologist is likely to be interested in. RESULTS: In this paper we introduce a fast and theoretically guaranteed method called DENSE (Dense and ENriched Subgraph Enumeration) that takes as input a biologist's prior knowledge as a set of query proteins and identifies all the dense functional modules in a biological network that contain some part of the query vertices. The density (in terms of the number of network edges) and the enrichment (the number of query proteins in the resulting functional module) can be controlled via two parameters, γ and μ, respectively. CONCLUSION: This algorithm has been applied to the protein functional association network of Clostridium acetobutylicum ATCC 824, a hydrogen-producing, acid-tolerant organism. The algorithm was able to verify relationships known to exist in the literature and also found some previously unknown relationships, including those with regulatory and signaling functions. Additionally, we were able to hypothesize that some uncharacterized proteins are likely associated with the target phenotype. The DENSE code can be downloaded from http://www.freescience.org/cs/DENSE/
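
The roles of the two DENSE parameters can be sketched as a simple acceptance test on a candidate module. The description above does not give the exact definitions, so the code assumes one plausible reading: density is the fraction of possible vertex pairs that are connected (threshold γ), and enrichment is the fraction of module vertices that are query proteins (threshold μ). This is not the DENSE implementation and performs no enumeration; all names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Set


def edge_density(module: Set[str], edges: Set[FrozenSet[str]]) -> float:
    """Fraction of vertex pairs in `module` joined by an edge (assumed definition)."""
    nodes = sorted(module)
    if len(nodes) < 2:
        return 0.0
    possible = len(nodes) * (len(nodes) - 1) // 2
    present = sum(frozenset(pair) in edges for pair in combinations(nodes, 2))
    return present / possible


def enrichment(module: Set[str], query: Set[str]) -> float:
    """Fraction of module vertices that are query proteins (assumed definition)."""
    return len(module & query) / len(module) if module else 0.0


def is_dense_and_enriched(module, edges, query, gamma: float, mu: float) -> bool:
    """Accept a module only if it meets both the density and enrichment thresholds."""
    return edge_density(module, edges) >= gamma and enrichment(module, query) >= mu


# Hypothetical toy network: 5 of the 6 possible edges among a, b, c, d exist.
edges = {frozenset(e) for e in [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]}
module = {"a", "b", "c", "d"}
query = {"a", "b"}
```

Here the module has density 5/6 and enrichment 1/2, so it is accepted for γ = 0.8, μ = 0.5 and rejected for γ = 0.9.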

Project description: MOTIVATION: Protein phosphorylation, driven by specific recognition of substrates by kinases and phosphatases, plays central roles in a variety of important cellular processes such as signaling and enzyme activation. Mass spectrometry enables the determination of phosphorylated peptides (and thereby proteins) in scenarios ranging from targeted in vitro studies to in vivo cell lysates under particular conditions. The characterization of commonalities among identified phosphopeptides provides insights into the specificities of the kinases involved in a study. Several algorithms have been developed to uncover linear motifs representing position-specific amino acid patterns in sets of phosphopeptides. To more fully capture the available information, reduce sensitivity to both parameter choices and natural experimental variation, and develop more precise characterizations of kinase specificities, it is necessary to determine all statistically significant motifs represented in a dataset. RESULTS: We have developed MMFPh (Maximal Motif Finder for Phosphoproteomics datasets), which extends the approach of the popular phosphorylation motif software Motif-X (Schwartz and Gygi, 2005) to identify all statistically significant motifs and return the maximal ones (those not subsumed by motifs with more fixed amino acids). In tests with both synthetic and experimental data, we show that MMFPh finds important motifs missed by the greedy approach of Motif-X, while also finding more motifs that are more characteristic of the dataset relative to the background proteome. Thus MMFPh is in some sense both more sensitive and more specific in characterizing the involved kinases. We also show that MMFPh compares favorably to other recent methods for finding phosphorylation motifs. Furthermore, MMFPh is less dependent on parameter choices. We support this powerful new approach with a web interface so that it may become a useful tool for studies of kinase specificity and phosphorylation site prediction. AVAILABILITY: A web server is at www.cs.dartmouth.edu/~cbk/.
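
The notion of a maximal motif (one not subsumed by a motif with more fixed amino acids) can be illustrated as a filtering step. The sketch assumes a hypothetical representation in which motifs are equal-length strings with "." marking unconstrained positions, and it assumes every input motif has already been judged statistically significant. It is not MMFPh's code.

```python
from typing import List


def subsumes(specific: str, general: str) -> bool:
    """True if `specific` fixes every position that `general` fixes, with the
    same residue, and fixes at least one additional position."""
    if len(specific) != len(general):
        return False
    agrees = all(g == "." or g == s for s, g in zip(specific, general))
    # If the motifs agree but differ, the difference must be a position that
    # is a wildcard in `general` and a fixed residue in `specific`.
    return agrees and specific != general


def maximal_motifs(motifs: List[str]) -> List[str]:
    """Keep only motifs that no other motif in the list subsumes."""
    return [m for m in motifs if not any(subsumes(other, m) for other in motifs)]


# Hypothetical motifs of width 5 centered on the phosphorylated residue.
motifs = ["..S.P", "R.S.P", "..S..", "..T.P"]
result = maximal_motifs(motifs)
```

In this toy list, "R.S.P" subsumes both "..S.P" and "..S..", while "..T.P" conflicts with it at the central position, so only "R.S.P" and "..T.P" remain.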

Project description:A central problem in the bioinformatics of gene regulation is to find the binding sites for regulatory proteins. One of the most promising approaches toward identifying these short and fuzzy sequence patterns is the comparative analysis of orthologous intergenic regions of related species. This analysis is complicated by various factors. First, one needs to take the phylogenetic relationship between the species into account in order to distinguish conservation that is due to the occurrence of functional sites from spurious conservation that is due to evolutionary proximity. Second, one has to deal with the complexities of multiple alignments of orthologous intergenic regions, and one has to consider the possibility that functional sites may occur outside of conserved segments. Here we present a new motif sampling algorithm, PhyloGibbs, that runs on arbitrary collections of multiple local sequence alignments of orthologous sequences. The algorithm searches over all ways in which an arbitrary number of binding sites for an arbitrary number of transcription factors (TFs) can be assigned to the multiple sequence alignments. These binding site configurations are scored by a Bayesian probabilistic model that treats aligned sequences by a model for the evolution of binding sites and "background" intergenic DNA. This model takes the phylogenetic relationship between the species in the alignment explicitly into account. The algorithm uses simulated annealing and Monte Carlo Markov-chain sampling to rigorously assign posterior probabilities to all the binding sites that it reports. In tests on synthetic data and real data from five Saccharomyces species our algorithm performs significantly better than four other motif-finding algorithms, including algorithms that also take phylogeny into account. Our results also show that, in contrast to the other algorithms, PhyloGibbs can make realistic estimates of the reliability of its predictions. 
Our tests suggest that, running on the five-species multiple alignment of a single gene's upstream region, PhyloGibbs on average recovers over 50% of all binding sites in S. cerevisiae at a specificity of about 50%, and 33% of all binding sites at a specificity of about 85%. We also tested PhyloGibbs on collections of multiple alignments of intergenic regions that were recently annotated, based on ChIP-on-chip data, to contain binding sites for the same TF. We compared PhyloGibbs's results with the previous analysis of these data using six other motif-finding algorithms. For 16 of 21 TFs for which all other motif-finding methods failed to find a significant motif, PhyloGibbs did recover a motif that matches the literature consensus. In 11 cases where there was disagreement in the results we compiled lists of known target genes from the literature, and found that running PhyloGibbs on their regulatory regions yielded a binding motif matching the literature consensus in all but one of the cases. Interestingly, these literature gene lists had little overlap with the targets annotated based on the ChIP-on-chip data. The PhyloGibbs code can be downloaded from http://www.biozentrum.unibas.ch/~nimwegen/cgi-bin/phylogibbs.cgi or http://www.imsc.res.in/~rsidd/phylogibbs. The full set of predicted sites from our tests on yeast is available at http://www.swissregulon.unibas.ch.

Project description: The use of solar photovoltaic (PV) systems is increasing as a clean and affordable source of electric energy. The PV cell is the main component of the PV system. To improve the performance, control, and evaluation of the PV system, it is necessary to provide an accurate design and to define the intrinsic parameters of the solar cells. There are many methods for optimizing the parameters of solar cells. The first class, the analytical methods, derives the model parameters from datasheet information or I-V curve data. The second class, the optimization-based methods, casts parameter estimation as an optimization problem whose objective is to minimize the error metrics, and solves it using metaheuristic optimization algorithms. The third class is a hybrid of the analytical and metaheuristic approaches: some parameters are computed analytically and the rest are found using metaheuristic optimization algorithms. Research in this area faces two challenges: (1) finding an optimal model for the parameters of the solar cells and (2) the lack of data about photovoltaic cells. This paper proposes an optimization-based algorithm for accurately estimating the parameters of solar cells. It uses the Improved Equilibrium Optimizer (IEO) algorithm, which applies Opposition-Based Learning (OBL) at the initialization phase of the Equilibrium Optimizer (EO) to improve population diversity in the search space. OBL is a new concept in machine learning inspired by the opposite relationship among entities. Two common models for solar cells, the single diode model (SDM) and the double diode model (DDM), have been used to demonstrate the capabilities of IEO in estimating the parameters of solar cells. The proposed methodology can find accurate solutions while reducing the computational cost. Compared to other existing techniques, the proposed algorithm yields a lower mean absolute error. The results were compared with seven optimization algorithms using data from different solar cells and PV panels. The experimental results revealed that IEO is superior to the most competitive algorithms in terms of the accuracy of the final solutions.
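
Opposition-based initialization can be sketched briefly. The code follows the commonly used OBL scheme, which is assumed here because the description does not spell out the IEO details: draw a random population, form each candidate's opposite point (lower + upper - x in every dimension), and keep the best half of the combined set. The fitness function, bounds, and seed are placeholders, and no solar-cell model is included.

```python
import random
from typing import Callable, List, Sequence


def opposite(x: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> List[float]:
    """Opposite point of x within the box [lower, upper]: lower + upper - x per dimension."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


def obl_initialize(
    n: int,
    lower: Sequence[float],
    upper: Sequence[float],
    fitness: Callable[[Sequence[float]], float],
    seed: int = 0,
) -> List[List[float]]:
    """Assumed OBL initialization: sample n random points, add their
    opposites, and keep the n candidates with the lowest fitness."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(n)]
    candidates = population + [opposite(x, lower, upper) for x in population]
    return sorted(candidates, key=fitness)[:n]


# Placeholder objective (sphere function) standing in for an error metric.
def sphere(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


lower, upper = [0.0, 0.0], [10.0, 5.0]
initial_population = obl_initialize(5, lower, upper, sphere)
```

Because opposite points stay inside the same bounds, the selected population needs no clipping before it is handed to the optimizer.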

Project description: BACKGROUND: Echinococcosis, caused by the larvae of Echinococcus, is prevalent all over the world. Although clinical experience shows that tapeworms cannot be found in liver lesions, repeated infection and aggravation of lesions still occur in the host. This study constructed a multifactor-driven disease-related dysfunction network to explore the potential molecular pathogenesis mechanism in different hosts after E. multilocularis infection. METHOD: First, iTRAQ sequencing was performed on human liver infected with E. multilocularis. Second, microRNA (miRNA) expression profiles of humans and canines infected with Echinococcus were obtained from the GEO database. In addition, we performed differential expression analysis, protein interaction network analysis, enrichment analysis, and crosstalk analysis to obtain genes and modules related to E. multilocularis infection. Pivot analysis was used to calculate the potential regulatory effects of multiple factors on the modules and to identify related non-coding RNAs (ncRNAs) and transcription factors (TFs). Finally, we screened the target genes of Echinococcus miRNAs to further explore the infection mechanism. RESULTS: A total of 267 differentially expressed proteins from humans and 3,635 differentially expressed genes from canines were obtained. They participated in 16 human-related dysfunction modules and five canine-related dysfunction modules, respectively. Both human and canine dysfunction modules are significantly involved in the BMP signaling pathway and the TGF-beta signaling pathway. In addition, pivot analysis found that 1,129 ncRNAs and 110 TFs significantly regulated human dysfunction modules, and that 158 ncRNAs and nine TFs significantly regulated canine dysfunction modules. Surprisingly, Echinococcus miR-184 plays a role in pathogenicity regulation by targeting nine TFs and one ncRNA in humans. Similarly, miR-184 can also cause physiological dysfunction by regulating two transcription factors in canines. CONCLUSION: The results show that miR-184 of Echinococcus can regulate the pathogenic process through various biological functions and pathways. The results lay a theoretical foundation for biologists to further explore the pathogenic mechanism of echinococcosis.

