Dataset Information

Implementation of homology based and non-homology based computational methods for the identification and annotation of orphan enzymes: using Mycobacterium tuberculosis H37Rv as a case study.

ABSTRACT:

Background

Homology-based methods are among the most important and widely used approaches for the functional annotation of high-throughput microbial genome data. A major limitation of these methods is the absence of well-characterized sequences for certain functions. Non-homology methods, based on the context and interactions of a protein, are very useful for identifying missing metabolic activities and for functional annotation in the absence of significant sequence similarity. In the current work, we employ both homology- and context-based methods, incrementally, to identify local holes and chokepoints: enzymes whose presence in the Mycobacterium tuberculosis genome is indicated by their interactions with known proteins in a metabolic network context, but which have not been annotated. Using network theory, we have developed two computational procedures: one to identify orphan enzymes ('Hole finding protocol') and one to identify candidate proteins for each predicted orphan enzyme ('Hole filling protocol'). We propose an integrated interaction score, based on scores from the STRING database, to identify the candidate protein sequences from M. tuberculosis, as a case study, that are most likely to perform the missing function of each orphan enzyme.
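The two kinds of gaps named above can be illustrated on a toy reaction network. This is a minimal sketch under stated assumptions, not the authors' 'Hole finding protocol': it treats a "local hole" as an unannotated reaction with at least one annotated upstream neighbour (producing one of its substrates) and at least one annotated downstream neighbour (consuming one of its products), and it uses the commonly cited chokepoint definition, a reaction that is the only consumer or the only producer of some metabolite. The reaction and metabolite identifiers and the functions `find_local_holes` and `find_chokepoints` are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def index_metabolites(reactions: dict[str, tuple[set[str], set[str]]]):
    """Map each metabolite to the reactions that produce and consume it."""
    producers = defaultdict(set)
    consumers = defaultdict(set)
    for rid, (substrates, products) in reactions.items():
        for m in substrates:
            consumers[m].add(rid)
        for m in products:
            producers[m].add(rid)
    return producers, consumers


def find_local_holes(reactions, annotated: set[str]) -> set[str]:
    """Hypothetical rule: unannotated reactions flanked by annotated neighbours."""
    producers, consumers = index_metabolites(reactions)
    holes = set()
    for rid, (substrates, products) in reactions.items():
        if rid in annotated:
            continue
        upstream = any(r in annotated for m in substrates for r in producers[m])
        downstream = any(r in annotated for m in products for r in consumers[m])
        if upstream and downstream:
            holes.add(rid)
    return holes


def find_chokepoints(reactions) -> set[str]:
    """Reactions that are the sole consumer or the sole producer of some metabolite."""
    producers, consumers = index_metabolites(reactions)
    chokepoints = set()
    for index in (producers, consumers):
        for rids in index.values():
            if len(rids) == 1:
                chokepoints |= rids
    return chokepoints


# Hypothetical toy network: reaction id -> (substrates, products).
# R1 and R5 are alternative routes from A to B.
REACTIONS = {
    "R1": ({"A"}, {"B"}),
    "R5": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"C"}, {"D"}),
    "R4": ({"B"}, {"E"}),
}
ANNOTATED = {"R1", "R3", "R4"}  # reactions with a known gene

holes = find_local_holes(REACTIONS, ANNOTATED)
orphan_chokepoints = find_chokepoints(REACTIONS) - ANNOTATED
print(sorted(holes), sorted(orphan_chokepoints))  # prints ['R2'] ['R2']
```

Here R2 is flagged both as a local hole and as an unannotated chokepoint (it is the only producer of C), while R5 is unannotated but not reported, because nothing upstream of it is annotated.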

Results

The application of an automated homology-based enzyme identification protocol, ModEnzA, to the M. tuberculosis genome yielded 56 novel enzyme predictions. Using the 'Hole finding protocol', we further predicted 74 putative local holes, 6 chokepoints and 3 high-confidence local holes in the genome. The 'Hole filling protocol' was validated on the E. coli genome using artificial in silico enzyme knockouts, where our method showed 25% higher accuracy than other methods in placing the correct sequence for the knocked-out enzyme among the top 10 ranks. The method was further validated on 8 additional genomes.
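The knockout validation described above can be sketched as follows, assuming a deliberately simple stand-in for the integrated interaction score: a candidate protein is scored by summing its STRING scores with the genes of the hole's annotated neighbouring enzymes. The paper's actual scoring formula is not given here, so `integrated_score`, the gene names and the score values are hypothetical. Each knockout pairs the removed (true) gene with the neighbour genes that remain annotated, and accuracy is the fraction of knockouts whose true gene appears among the top k ranked candidates (k = 10 in the text).

```python
from __future__ import annotations


def integrated_score(candidate: str, neighbour_genes: set[str], scores: dict) -> int:
    """Hypothetical stand-in: sum of the candidate's scores with the hole's neighbours."""
    return sum(scores.get(frozenset((candidate, g)), 0) for g in neighbour_genes)


def rank_candidates(candidates, neighbour_genes, scores) -> list[str]:
    """Rank non-neighbour candidates by descending score, ties broken by name."""
    pool = [c for c in candidates if c not in neighbour_genes]
    return sorted(pool, key=lambda c: (-integrated_score(c, neighbour_genes, scores), c))


def top_k_accuracy(knockouts, candidates, scores, k: int = 10) -> float:
    """Fraction of in silico knockouts whose true gene is ranked within the top k."""
    hits = 0
    for true_gene, neighbour_genes in knockouts:
        if true_gene in rank_candidates(candidates, neighbour_genes, scores)[:k]:
            hits += 1
    return hits / len(knockouts)


# Hypothetical STRING-style scores keyed by unordered protein pair
# (STRING distributes combined scores as integers from 0 to 1000).
SCORES = {
    frozenset(("gA", "gB")): 900,
    frozenset(("gB", "gC")): 800,
    frozenset(("gA", "gX")): 950,
    frozenset(("gX", "gC")): 100,
    frozenset(("gY", "gC")): 300,
}
CANDIDATES = ["gA", "gB", "gC", "gX", "gY"]
# Each knockout: (gene whose annotation was removed, genes of its annotated neighbours).
KNOCKOUTS = [("gB", {"gA", "gC"}), ("gY", {"gC"})]

print(top_k_accuracy(KNOCKOUTS, CANDIDATES, SCORES, k=1))  # 0.5
print(top_k_accuracy(KNOCKOUTS, CANDIDATES, SCORES, k=2))  # 1.0
```

Ties are broken alphabetically so that the ranking is deterministic; a fuller implementation would also need a policy for candidates with no STRING evidence at all, which this sketch simply scores as 0.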

Conclusions

We have developed methods that can be generalized to augment homology-based annotation, both to identify missing enzyme-coding genes and to predict candidate proteins for them. For pathogens such as M. tuberculosis, this work is significant because it expands the protein repertoire and, thereby, the potential for identifying novel drug targets.

SUBMITTER: Sinha S

PROVIDER: S-EPMC7574302 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature


Publications

Implementation of homology based and non-homology based computational methods for the identification and annotation of orphan enzymes: using Mycobacterium tuberculosis H37Rv as a case study.

Swati Sinha, Andrew M. Lynn, Dhwani K. Desai

BMC Bioinformatics, 19 Oct 2020, 1


PMID: 33076816

Similar Datasets

Project description: BACKGROUND: H. sapiens-M. tuberculosis H37Rv protein-protein interaction (PPI) data are essential for understanding the infection mechanism of the formidable pathogen M. tuberculosis H37Rv. Computational prediction is an important strategy to fill the gap in experimental H. sapiens-M. tuberculosis H37Rv PPI data. Homology-based prediction is frequently used to predict both intra-species and inter-species PPIs. However, several published works that predict eukaryote-prokaryote inter-species PPIs using intra-species template PPIs leave some limitations unresolved. RESULTS: We develop a stringent homology-based prediction approach that takes into account (i) differences between eukaryotic and prokaryotic proteins and (ii) differences between inter-species and intra-species PPI interfaces. We compare our stringent approach to a conventional homology-based approach for predicting host-pathogen PPIs, using cellular compartment distribution analysis, disease gene list enrichment analysis, pathway enrichment analysis and functional category enrichment analysis. These analyses support the validity of our prediction results and clearly show that our approach performs better at predicting H. sapiens-M. tuberculosis H37Rv PPIs. Using our stringent homology-based approach, we have predicted a set of highly plausible H. sapiens-M. tuberculosis H37Rv PPIs that might be useful for many related studies. From our analysis of the H. sapiens-M. tuberculosis H37Rv PPI network predicted by this approach, we have discovered several interesting properties, which are reported here for the first time. We find that both host proteins and pathogen proteins involved in host-pathogen PPIs tend to be hubs in their own intra-species PPI networks.
Both host and pathogen proteins involved in host-pathogen PPIs also tend to have longer primary sequences, more domains and greater hydrophilicity, among other properties, and the protein domains from both host and pathogen proteins involved in host-pathogen PPIs tend to have lower charge and to be more hydrophilic. CONCLUSIONS: Our stringent homology-based prediction approach provides a better strategy for predicting PPIs between eukaryotic hosts and prokaryotic pathogens than a conventional homology-based approach. The properties we observed in the predicted H. sapiens-M. tuberculosis H37Rv PPI network are useful for understanding inter-species host-pathogen PPI networks and provide novel insights for host-pathogen interaction studies.
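A common form of conventional homology-based (interolog) prediction, the kind of baseline the stringent approach above is compared against, can be sketched as: a host protein H and a pathogen protein P are predicted to interact if H is homologous to one partner and P to the other partner of some template PPI. This minimal sketch assumes homology hits have already been computed (for example with BLAST) and does not implement the extra eukaryote/prokaryote and interface filters of the stringent approach; `predict_interologs` and all protein identifiers are hypothetical.

```python
from __future__ import annotations


def predict_interologs(
    templates: set[tuple[str, str]],
    host_hits: dict[str, set[str]],
    pathogen_hits: dict[str, set[str]],
) -> set[tuple[str, str]]:
    """Transfer template PPIs to host-pathogen pairs via homology.

    templates: known PPIs between template proteins (pair order ignored).
    host_hits / pathogen_hits: query protein -> template proteins it is homologous to.
    """
    template_pairs = {frozenset(pair) for pair in templates}
    predicted = set()
    for host, host_templates in host_hits.items():
        for pathogen, pathogen_templates in pathogen_hits.items():
            # Predict an interaction if any homologous template pair is a known PPI.
            if any(
                frozenset((a, b)) in template_pairs
                for a in host_templates
                for b in pathogen_templates
            ):
                predicted.add((host, pathogen))
    return predicted


TEMPLATES = {("T1", "T2"), ("T3", "T4")}
HOST_HITS = {"H1": {"T1"}, "H2": {"T5"}}
PATHOGEN_HITS = {"P1": {"T2"}, "P2": {"T4"}}

print(predict_interologs(TEMPLATES, HOST_HITS, PATHOGEN_HITS))  # {('H1', 'P1')}
```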

Project description: Background: Mycobacterium tuberculosis (M.tb) is the causative agent of tuberculosis, killing ~1.7 million people annually. The remarkable capacity of this pathogen to escape the host immune system for decades and then cause active tuberculosis disease makes M.tb a successful pathogen. Currently available anti-mycobacterial therapy has poor compliance because of the prolonged treatment it requires, which accelerates the emergence of drug-resistant strains. Hence, there is an urgent need to identify new chemical entities with novel mechanisms of action and potent activity against drug-resistant strains. Results: This study describes novel computational models for predicting inhibitors of both the replicative and the non-replicative phase of drug-tolerant M.tb under carbon starvation. These models were trained on a highly diverse dataset of 2135 compounds using four classes of binary fingerprints, namely PubChem, MACCS, EState and SubStructure. The best performance, a Matthews correlation coefficient (MCC) of 0.45, was achieved by the model based on MACCS fingerprints for the replicative-phase inhibitor dataset. For the non-replicative phase, a hybrid model based on PubChem, MACCS, EState and SubStructure fingerprints performed better, with a maximum MCC of 0.28. We have also shown that the molecular weight, polar surface area and rotatable bond count of inhibitors (of both the replicating and non-replicating phases) are significantly different from those of non-inhibitors. Fragment analysis suggests that substructures such as hetero_N_nonbasic, heterocyclic, carboxylic_ester and hetero_N_basic_no_H are predominant in replicating-phase inhibitors, while hetero_O, ketone and secondary_mixed_amine are preferred in non-replicative-phase inhibitors. Nitro, alkyne and enamine substructures were observed to be important for molecules inhibiting bacilli residing in both phases.
In this study, we introduced a new feature-selection algorithm based on the Matthews correlation coefficient, called MCCA, and found that it performs better than or comparably to a frequency-based approach. Conclusion: In this study, we have developed computational models to predict phase-specific inhibitors against drug-resistant strains of M.tb grown under carbon starvation. Based on simple molecular properties, we have derived rules that could be useful for the robust identification of tuberculosis inhibitors. Based on these observations, we have developed a webserver for predicting inhibitors against drug-tolerant M.tb H37Rv, available at http://crdd.osdd.net/oscadd/mdri/.
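The Matthews correlation coefficient used above to report model performance is computed from confusion-matrix counts as MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), ranging from -1 to 1. The sketch below covers only the metric, not the MCCA feature-selection algorithm, whose details are not given here; returning 0 when any row or column total is zero is a common convention rather than part of the definition, and the function names are illustrative.

```python
from __future__ import annotations

import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any row/column total is zero
    return (tp * tn - fp * fn) / denom


def mcc_from_labels(y_true: list[int], y_pred: list[int]) -> float:
    """MCC for binary labels (1 = inhibitor, 0 = non-inhibitor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return mcc(tp, tn, fp, fn)


# TP = 2, FN = 1, TN = 2, FP = 1, so MCC = (4 - 1) / sqrt(3**4) = 1/3.
print(round(mcc_from_labels([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]), 3))  # 0.333
```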

Project description: The ability to adapt to different conditions is key for Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), to successfully infect human hosts. Adaptations allow the organism to evade the host immune responses during acute infections and persist for an extended period of time during the latent infectious stage. In latently infected individuals, estimated to include one-third of the human population, the organism exists in a variety of metabolic states, which impedes the development of a simple strategy for controlling or eradicating this disease. Direct knowledge of the metabolic states of M. tuberculosis in patients would aid in the management of the disease as well as in forming the basis for developing new drugs and designing more efficacious drug cocktails. Here, we propose an in silico approach to create state-specific models based on readily available gene expression data. The coupling of differential gene expression data with a metabolic network model allowed us to characterize the metabolic adaptations of M. tuberculosis H37Rv to hypoxia. Given the microarray data for the alterations in gene expression, our model predicted reduced oxygen uptake, changes in ATP production, and a global change from an oxidative to a reductive tricarboxylic acid (TCA) program. Alterations in the biomass composition indicated an increase in the cell wall metabolites required for cell-wall growth, as well as heightened accumulation of triacylglycerol in preparation for a low-nutrient, low-metabolic-activity lifestyle. In contrast, the gene expression program of the deletion mutant of dosR, which encodes the immediate hypoxic response regulator, failed to adapt to low-oxygen stress. Our predictions were compatible with recent experimental observations of M. tuberculosis activity under hypoxic and anaerobic conditions.
Importantly, alterations in the flow and accumulation of a particular metabolite were not necessarily directly linked to differential gene expression of the enzymes catalyzing the related metabolic reactions.


