Project description: The human sulfatase family has 17 members, 13 of which have been characterized biochemically. These enzymes specifically hydrolyze sulfate esters in glycosaminoglycans, sulfolipids, or steroid sulfates, thereby playing key roles in cellular degradation, cell signaling, and hormone regulation. Loss of sulfatase activity has been linked to severe pathophysiological conditions such as lysosomal storage disorders, developmental abnormalities, or cancer. A novel member of this family, arylsulfatase K (ARSK), was identified bioinformatically through its conserved sulfatase signature sequence, which directs posttranslational generation of the catalytic formylglycine residue in sulfatases. However, the overall sequence identity of ARSK with other human sulfatases is low (18-22%). Here we demonstrate that ARSK does show desulfation activity toward arylsulfate pseudosubstrates. When expressed in human cells, ARSK was detected as a 68-kDa glycoprotein carrying at least four N-glycans of both the complex and high-mannose types. Purified ARSK turned over p-nitrocatechol sulfate and p-nitrophenyl sulfate. This activity was dependent on cysteine 80, which was verified to undergo conversion to formylglycine. Kinetic parameters were similar to those of several lysosomal sulfatases involved in the degradation of sulfated glycosaminoglycans. An acidic pH optimum (~4.6) and colocalization with LAMP1 confirmed that ARSK functions in the lysosome. Further, ARSK carries mannose 6-phosphate, indicating lysosomal sorting via mannose 6-phosphate receptors. ARSK mRNA expression was found in all tissues tested, suggesting a ubiquitous physiological substrate and, in the case of ARSK deficiency, a so far unclassified lysosomal storage disorder, as has been shown for all other lysosomal sulfatases.
Project description: Background: Accurate computational identification of eukaryotic gene organization is a long-standing problem. Despite the fundamental importance of precise annotation of genes encoded in newly sequenced genomes, the accuracy of predicted gene structures has not been critically evaluated, mostly due to the scarcity of proper assessment methods. Results: We present a gene-structure-aware multiple sequence alignment method for gene prediction using amino acid sequences translated from homologous genes from many genomes. The approach provides rich information on the reliability of each predicted gene structure. We have also devised an iterative method that attempts to improve genes with suspect predicted structures, based on a spliced alignment algorithm that uses consensus sequences or reliable homologs as templates. Application of our methods to cytochrome P450 and ribosomal proteins from 47 plant genomes indicated that 50-60% of the annotated gene structures are likely to contain some defects. More than half of the defect-containing genes may be intrinsically broken (i.e., they are pseudogenes or gene fragments, are located in unfinished sequencing regions, or correspond to non-productive isoforms); however, the defects in a majority of the remaining gene candidates can be remedied by our iterative refinement method. Conclusions: Refinement of eukaryotic gene structures mediated by gene-structure-aware multiple protein sequence alignment is a useful strategy to dramatically improve the overall prediction quality of a set of homologous genes. Our method will be applicable to various families of protein-coding genes if their domain structures are evolutionarily stable. It is also feasible to apply our method to gene families from all kingdoms of life, not just plants.
Project description: OBJECTIVE: To describe the genetic variants in the ARSA gene in Sri Lankan patients with metachromatic leukodystrophy (MLD), as the variant profile of MLD in the Sri Lankan population is currently unknown. RESULTS: Twenty patients from eighteen Sri Lankan families were screened for ARSA gene mutations. We found 13 different genetic variants, three of which were novel: p.Asp281Asn, p.Asp283Asn and p.Ala344Asp. Seven of the 20 patients were also positive for the pseudodeficiency (PD) allele c.1049A>G (p.Asn350Ser). This is the first report to describe the molecular genetic variants of Sri Lankan patients with MLD.
Project description: Thermostability issues arising from protein point mutations are common in protein engineering. A tool that predicts the thermostability of mutants can help guide decision-making in protein design via mutagenesis. In silico point mutation scanning is frequently used to find "hot spots" in proteins for focused mutagenesis. ProTherm (http://gibk26.bio.kyutech.ac.jp/jouhou/Protherm/protherm.html) is a public database containing experimentally measured thermostability data for thousands of protein mutants. Two data sets, based on two different experimentally measured thermostability properties of single point mutations, namely the unfolding free energy change (ddG) and the melting temperature change (dTm), were obtained from this database. Folding free energy changes calculated with Rosetta, structural information about the point mutations, and physical properties of the amino acids were used as features to build thermostability prediction models with informatics modeling tools. Five supervised machine learning methods (support vector machine, random forests, artificial neural network, naïve Bayes classifier, and K nearest neighbor) and partial least squares regression were used to build the prediction models. Binary and ternary classification models as well as regression models were built and evaluated. Data set redundancy and balancing, the reverse mutations technique, feature selection, and comparison with other published methods are discussed. The Rosetta-calculated folding free energy change ranked as the most influential feature in all prediction models. Other descriptors also contributed significantly to the accuracy of the prediction models.
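The reverse mutations technique and the binary/ternary labeling mentioned in this description can be illustrated with a short Python sketch. It assumes that ddG is the unfolding free energy change with positive values meaning stabilizing, and it uses a hypothetical +/-1 kcal/mol cut-off for the ternary "neutral" class; the `Mutation` record and all function names are illustrative, not the authors' code, and the example values are made up rather than taken from ProTherm.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mutation:
    """A single point mutation (hypothetical record layout)."""
    protein_id: str
    position: int
    wild_type: str  # one-letter code of the original residue
    mutant: str     # one-letter code of the substituted residue
    ddg: float      # unfolding free energy change, kcal/mol; assumed > 0 = stabilizing


def reverse(m: Mutation) -> Mutation:
    """Reverse mutation: swap the residues and flip the sign of ddG.

    By thermodynamic antisymmetry, the mutant -> wild-type change is the
    negative of the wild-type -> mutant change.
    """
    return Mutation(m.protein_id, m.position, m.mutant, m.wild_type, -m.ddg)


def augment_with_reverse(data: List[Mutation]) -> List[Mutation]:
    """Append the reverse of every mutation, making the ddG distribution symmetric."""
    return list(data) + [reverse(m) for m in data]


def binary_label(ddg: float) -> str:
    """Two classes split at ddG = 0 (sign convention assumed)."""
    return "destabilizing" if ddg < 0 else "stabilizing"


def ternary_label(ddg: float, threshold: float = 1.0) -> str:
    """Three classes; the +/- threshold (kcal/mol) for 'neutral' is hypothetical."""
    if ddg > threshold:
        return "stabilizing"
    if ddg < -threshold:
        return "destabilizing"
    return "neutral"


if __name__ == "__main__":
    # Made-up example values, not ProTherm entries.
    data = [Mutation("P1", 42, "A", "G", -1.8), Mutation("P1", 77, "L", "I", 0.4)]
    for m in augment_with_reverse(data):
        print(m.wild_type, m.position, m.mutant, m.ddg, ternary_label(m.ddg))
```

Adding reverse mutations doubles the training set and removes the bias toward destabilizing mutations that experimental data sets usually show; the same idea would apply to dTm under the same sign assumption.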
Project description: The hydrophobic-polar (HP) model is a simplified representation of the protein structure prediction (PSP) problem. However, even with the HP model, the PSP problem remains NP-complete. This work proposes a systematic, problem-specific design of operators for an evolutionary program hybridized with hill-climbing local search, to efficiently explore the PSP search space and thereby obtain an optimum conformation. The proposed algorithm achieves this by incorporating the following novel features: (i) a new initialization method that generates only valid individuals with better (rather than random) fitness values; (ii) probability-based selection operators that limit local convergence; (iii) a secondary-structure-based mutation operator that brings the structure closer to the experimentally determined structure; and (iv) a complete two-tier framework that incorporates all of the above features. The framework builds the protein conformation on square and triangular lattices. It was tested on benchmark sequences and compared with various state-of-the-art algorithms. In addition to hypothetical test sequences, we tested protein sequences deposited in a protein database repository. The proposed framework showed superior performance in terms of accuracy (fitness value) and speed (number of generations needed to reach the final conformation). The concepts used to enhance performance are generic and can be applied to any other population-based search algorithm, such as genetic algorithms, ant colony optimization, and immune algorithms.
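In the standard HP formulation, the fitness that such a search minimizes is the negative count of topological H-H contacts, i.e. pairs of hydrophobic residues that are lattice neighbours but not consecutive in the chain. A minimal Python sketch for the square lattice follows; the absolute move encoding (U/D/L/R) and the function names are assumptions for illustration, not the authors' implementation, and the triangular lattice, evolutionary operators and hill climbing are not shown.

```python
from typing import Dict, List, Optional, Tuple

# Absolute moves on the 2D square lattice (assumed encoding).
MOVES: Dict[str, Tuple[int, int]] = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves: str) -> List[Tuple[int, int]]:
    """Place residue 0 at the origin and follow the moves to get lattice coordinates."""
    coords = [(0, 0)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def hp_energy(sequence: str, moves: str) -> Optional[int]:
    """HP energy: -1 per topological H-H contact (lattice neighbours that are
    not consecutive in the chain). Returns None for self-intersecting folds."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("a chain of n residues needs exactly n - 1 moves")
    coords = fold(moves)
    if len(set(coords)) != len(coords):
        return None  # two residues on the same site: not a self-avoiding walk
    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbours and counts each contact once
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


if __name__ == "__main__":
    print(hp_energy("HPPH", "RUL"))  # U-shaped fold, the two H residues touch: -1
    print(hp_energy("HPPH", "RRR"))  # straight chain, no contacts: 0
    print(hp_energy("HPPH", "RLR"))  # revisits a site, invalid: None
```

Returning None for invalid walks mirrors the abstract's point that initialization should generate only valid individuals; a population-based search would typically discard or repair such candidates.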
Project description: Three-dimensional structures of RNA-protein complexes are crucial for understanding their diverse functions. However, the number of RNA-protein complex structures solved experimentally is still limited. To address this problem, several computational protocols have been proposed to predict three-dimensional RNA-protein complex structures, but their prediction accuracy remains low, possibly because they do not fully incorporate the features of RNA-protein interfaces. Here we propose a novel computational protocol for three-dimensional RNA-protein complex structure prediction, 3dRPC, which applies new schemes for treating the discreteness of molecules and charges in the docking algorithm and for constructing the reference state in the scoring function, in order to take account of the features of RNA-protein interfaces. When tested on an RNA-protein docking benchmark, this protocol achieves high accuracy, comparable to that of well-developed algorithms for three-dimensional structure prediction of protein-protein complexes.
Project description: Background: Mucopolysaccharidoses (MPS) are monogenic metabolic disorders that significantly affect the skeleton. Eleven enzyme defects in the lysosomal degradation of glycosaminoglycans (GAGs) have been assigned to the known MPS subtypes (I-IX). Arylsulfatase K (ARSK) is a recently characterised lysosomal hydrolase involved in GAG degradation that removes the 2-O-sulfate group from 2-sulfoglucuronate. Knockout of Arsk in mice was consistent with mild storage pathology, but no human phenotype has yet been described. Methods: In this study, we report four affected individuals from two unrelated consanguineous families with the homozygous ARSK variants c.250C>T, p.(Arg84Cys) and c.560T>A, p.(Leu187Ter), respectively. The functional consequences of the two variants were assessed using mutation-specific ARSK constructs generated by site-directed mutagenesis and ectopically expressed in HT1080 cells. Urinary GAG excretion was analysed by dimethylmethylene blue assay and electrophoresis, as well as by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Results: The phenotypes of the affected individuals include MPS features such as short stature, coarse facial features and dysostosis multiplex. Reverse phenotyping in two of the four individuals revealed additional cardiac and ophthalmological abnormalities. Mild elevation of dermatan sulfate was detected in the two subjects investigated by LC-MS/MS. Human HT1080 cells expressing the ARSK-Leu187Ter construct showed no detectable protein by western blot, and cells expressing the ARSK-Arg84Cys construct showed markedly reduced enzyme activity in an ARSK-specific enzymatic assay against 2-O-sulfoglucuronate-containing disaccharides, analysed by C18 reversed-phase chromatography followed by MS. Conclusion: Our work provides a detailed clinical and molecular characterisation of a novel subtype of mucopolysaccharidosis, which we suggest designating subtype X.
Project description: Proteins have evolved to use water to help guide folding. A physically motivated, nonpairwise-additive model of water-mediated interactions added to a protein structure prediction Hamiltonian yields marked improvement in the quality of structure prediction for larger proteins. Free energy profile analysis suggests that long-range water-mediated potentials guide folding and smooth the underlying folding funnel. Analyzing simulation trajectories gives direct evidence that water-mediated interactions facilitate native-like packing of supersecondary structural elements. Long-range pairing of hydrophilic groups is an integral part of protein architecture. Specific water-mediated interactions are a universal feature of biomolecular recognition landscapes in both folding and binding.
Project description: Pyrazinamide plays an important role in tuberculosis treatment; however, its use is complicated by side effects and by challenges with reliable drug susceptibility testing. Resistance to pyrazinamide is largely driven by mutations in pncA, the gene encoding pyrazinamidase, which is responsible for drug activation, but genetic heterogeneity has hindered the development of a molecular diagnostic test. We proposed using information on how variants were likely to affect the 3D structure of pncA to identify variants likely to lead to pyrazinamide resistance. We curated 610 pncA mutations with high-confidence experimental and clinical information on pyrazinamide susceptibility. The molecular consequences of each mutation on protein stability, conformation, and interactions were computationally assessed using our comprehensive suite of graph-based signature methods, mCSM. These molecular consequences were used to train a classifier with an accuracy of 80%. Our model was tested against internationally curated clinical datasets, achieving up to 85% accuracy. Screening of 600 Victorian clinical isolates identified a set of previously unreported variants, for which our model's predictions agreed with drug susceptibility testing in 71% of cases. Here, we have shown that the 3D structure of pncA can be used to accurately identify pyrazinamide resistance mutations. SUSPECT-PZA is freely available at: http://biosig.unimelb.edu.au/suspect_pza/.
Project description: Hearing loss is a genetically heterogeneous disorder, which makes the identification of causative mutations demanding. In this study, we investigated the genetic cause of sensorineural hearing loss in patients with severe/profound deafness. After excluding GJB2-GJB6 mutations, we performed whole exome sequencing in 32 unrelated Argentinean families. Mutations were detected in 16 known deafness genes in 20 patients: ACTG1, ADGRV1 (GPR98), CDH23, COL4A3, COL4A5, DFNA5 (GSDME), EYA4, LARS2, LOXHD1, MITF, MYO6, MYO7A, TECTA, TMPRSS3, USH2A and WFS1. Notably, 11 variants affecting 9 different non-GJB2 genes were novel: c.12829C>T, p.(Arg4277*) in ADGRV1; c.337del, p.(Asp109*) and c.3352del, p.(Gly1118Alafs*7) in CDH23; c.3500G>A, p.(Gly1167Glu) in COL4A3; c.1183C>T, p.(Pro395Ser) and c.1759C>T, p.(Pro587Ser) in COL4A5; c.580+2T>C in EYA4; c.1481dup, p.(Leu495Profs*31) in LARS2; c.1939T>C, p.(Phe647Leu) in MYO6; c.733C>T, p.(Gln245*) in MYO7A; and c.242C>G, p.(Ser81*) in TMPRSS3. To predict the effects of these novel variants, protein modeling and protein stability analyses were performed. These results highlight the value of whole exome sequencing for identifying candidate variants, as well as of bioinformatic strategies for inferring their pathogenicity.