Project description:Animal toxins are of interest to a wide range of scientists because of their numerous applications in pharmacology, neurology, hematology, medicine, and drug research. This interest, together with (to a lesser extent) the development of more powerful tools in transcriptomics and proteomics, has led to an increase in toxin discovery. In this context, providing publicly available data on animal toxins has become essential. The UniProtKB/Swiss-Prot Tox-Prot program (http://www.uniprot.org/program/Toxins) plays a crucial role by providing access to venom protein sequences and functions from all venomous species. To date, this program has curated more than 5000 venom proteins to the high-quality standards of UniProtKB/Swiss-Prot (release 2012_02). Proteins targeted by these toxins are also available in the knowledgebase. This paper describes in detail the type of information provided by UniProtKB/Swiss-Prot for toxins, as well as the structured format of the knowledgebase.
Project description:Background: The accuracy of protein 3D structure prediction has improved dramatically thanks to advances in deep learning. In the recent CASP14, DeepMind demonstrated that their new version of AlphaFold (AF) produces highly accurate 3D models that are close to experimental structures. The success of AF shows that the multiple sequence alignment (MSA) of a sequence contains rich evolutionary information, leading to accurate 3D models. Despite the success of AF, only the prediction code is open, and training a similar model requires a vast amount of computational resources. Thus, developing a lighter prediction model is still necessary. Results: In this study, we propose a new protein 3D structure modeling method, A-Prot, using MSA Transformer, one of the state-of-the-art protein language models. For a given MSA, an MSA feature tensor and row attention maps are extracted and converted into 2D residue-residue distance and dihedral angle predictions. We demonstrated that A-Prot predicts long-range contacts better than the existing methods. Additionally, we modeled the 3D structures of the free modeling and hard template-based modeling targets of CASP14. The assessment shows that the A-Prot models are more accurate than those of most top server groups of CASP14. Conclusion: These results imply that A-Prot accurately captures the evolutionary and structural information of proteins at relatively low computational cost. Thus, A-Prot can serve as a starting point for the development of other protein property prediction methods.
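As an illustration of how a claim such as "predicts long-range contacts better" is commonly quantified, the sketch below computes the precision of the top-ranked long-range contact predictions from a predicted residue-residue score matrix. It is not taken from A-Prot: the function name `long_range_precision`, the default separation cutoff of 24 residues, and the top-L ranking convention are assumptions based on common practice in contact-prediction benchmarks.

```python
def long_range_precision(pred, true_contacts, min_sep=24, top_factor=1.0):
    """Precision of the top-ranked long-range contact predictions.

    pred          -- square list of lists (L x L) of predicted contact scores
    true_contacts -- set of (i, j) index pairs with i < j that are in contact
    min_sep       -- minimum sequence separation |i - j| for "long-range"
                     (24 is a common convention, assumed here)
    top_factor    -- evaluate the top (top_factor * L) ranked pairs
    """
    n = len(pred)
    candidates = []
    for i in range(n):
        for j in range(i + min_sep, n):
            # Average both triangles so an asymmetric map is handled too.
            score = 0.5 * (pred[i][j] + pred[j][i])
            candidates.append((score, i, j))

    # Rank residue pairs from most to least confident.
    candidates.sort(key=lambda item: item[0], reverse=True)
    k = max(1, int(top_factor * n))
    top = candidates[:k]
    if not top:
        return 0.0  # sequence too short to contain any long-range pair

    hits = sum(1 for _, i, j in top if (i, j) in true_contacts)
    return hits / len(top)


if __name__ == "__main__":
    # Toy 4-residue example; min_sep is lowered to 2 so that pairs exist.
    toy = [
        [0.0, 0.0, 0.9, 0.1],
        [0.0, 0.0, 0.0, 0.8],
        [0.9, 0.0, 0.0, 0.0],
        [0.1, 0.8, 0.0, 0.0],
    ]
    print(long_range_precision(toy, {(0, 2)}, min_sep=2, top_factor=0.5))
```

Comparing two methods under this kind of metric means ranking each method's predicted pairs separately and checking them against the same set of true contacts.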
Project description:Using integrated proteomic and RNA sequencing analysis of COPD and control lung tissues, we identified molecular signatures in COPD.
Project description:<p>Recently, significant progress has been made in characterizing and sequencing the genomic alterations in statistically robust numbers of samples from several types of cancer. For example, The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), and other similar efforts are identifying genomic alterations associated with specific cancers (e.g., copy number aberrations, rearrangements, point mutations, and epigenomic changes). The availability of these multi-dimensional data to the scientific community sets the stage for the development of new molecularly targeted cancer interventions. Understanding the comprehensive functional changes in cancer proteomes arising from genomic alterations and other factors is the next logical step in the development of high-value candidate protein biomarkers. Hence, proteomics can greatly advance the understanding of molecular mechanisms of disease pathology via the analysis of changes in protein expression, protein modifications and variations, as well as protein-protein interactions, signaling pathways, and networks responsible for cellular functions such as apoptosis and oncogenesis.</p> <p>Realizing this great potential, the NCI launched the second phase of the CPTC initiative in September 2011. Renamed the Clinical Proteomic Tumor Analysis Consortium, CPTAC is beginning to leverage its analytical outputs from Phase I to define cancer proteomes on genomically characterized biospecimens. The purpose of this integrative approach is to provide the broad scientific community with knowledge that links genotype to proteotype and ultimately phenotype.</p> <p>The data contained in this dataset are derived from samples designed to confirm CPTAC findings from the TCGA samples. These confirmatory samples contain breast, ovarian, colon, and lung tumors collected via a protocol optimized for proteomics.
Specifically, ischemic time of the sample was controlled and restricted to less than 30 minutes.</p> <p>ACGT, Inc. produced whole exome, mRNAseq, and miRNAseq data for these samples. Corresponding proteomic data are available at: <a href="https://cptac-data-portal.georgetown.edu/cptacPublic/">https://cptac-data-portal.georgetown.edu/cptacPublic/</a></p> <p>The study design was to profile colon, breast, ovarian, and lung tumors both genomically and proteomically. Germline DNA was obtained from blood. Normal control samples for proteomics varied by organ site: adjacent colon tissue for colon cases, contralateral breast tissue for some breast cases, and Fallopian tube fimbria for some ovarian cases. Lung cases had no normal control for proteomic analysis. All cancer samples were derived from primary and untreated tumors.</p>
Project description:Clinical FFPE tissue proteomic analyses were performed for early lung adenocarcinomas, including adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and lepidic predominant invasive adenocarcinoma (LPA).
Project description:The ever-growing global health threat of antibiotic resistance is compelling researchers to explore alternatives to conventional antibiotics. Antimicrobial peptides (AMPs) are emerging as a promising solution to fill this need. Naturally occurring AMPs are produced by all forms of life as part of the innate immune system. High-throughput bioinformatics tools have enabled fast and large-scale discovery of AMPs from genomic, transcriptomic, and proteomic resources of selected organisms. Public protein sequence databases, comprising over 200 million records and growing, serve as comprehensive compendia of sequences from a broad range of source organisms. Yet, large-scale in silico probing of those databases for novel AMP discovery using modern deep learning techniques has rarely been reported. In the present study, we propose an AMP mining workflow to predict novel AMPs from the UniProtKB/Swiss-Prot database using the AMP prediction tool, AMPlify, as its discovery engine. Using this workflow, we identified 8008 novel putative AMPs from all eukaryotic sequences in the database. Focusing on the practical use of AMPs as antimicrobial agents with applications in the poultry industry, we prioritized 40 of those AMPs based on the similarity of their predicted structures to those of known chicken AMPs. In our tests, 13 of the 38 successfully synthesized peptides showed antimicrobial activity against Escherichia coli and/or Staphylococcus aureus. AMPlify and the companion scripts supporting the AMP mining workflow presented herein are publicly available at https://github.com/bcgsc/AMPlify.
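The general shape of a database mining workflow like the one described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline. The function `mine_candidates`, the length cap, the score threshold, and the restriction to the 20 standard amino acids are all assumptions. The `score_fn` argument is a hypothetical stand-in for a trained predictor and does not reflect the actual AMPlify interface.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def mine_candidates(records, score_fn, known_amps, max_len=200, threshold=0.5):
    """Return novel putative AMPs as (identifier, sequence, score) tuples.

    records    -- iterable of (identifier, sequence) pairs from a database
    score_fn   -- hypothetical callable mapping a sequence to a score in [0, 1]
    known_amps -- sequences already annotated as AMPs (excluded as not novel)
    max_len    -- assumed upper bound on peptide length
    threshold  -- assumed minimum score for a positive prediction
    """
    known = {seq.upper() for seq in known_amps}
    hits = []
    for identifier, sequence in records:
        seq = sequence.upper()
        if not seq or len(seq) > max_len:
            continue  # skip empty or overly long sequences
        if not set(seq) <= STANDARD_AA:
            continue  # skip sequences with non-standard residues
        if seq in known:
            continue  # already known, so not a novel candidate
        score = score_fn(seq)
        if score >= threshold:
            hits.append((identifier, seq, score))
    # Highest-scoring candidates first, ready for downstream prioritization.
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits


if __name__ == "__main__":
    # Toy scorer (fraction of K/R residues); purely illustrative.
    def toy_score(seq):
        return sum(seq.count(aa) for aa in "KR") / len(seq)

    records = [("a", "KKRR"), ("b", "AAAA"), ("c", "KKXK"), ("d", "KRKR")]
    print(mine_candidates(records, toy_score, known_amps={"KRKR"}))
```

In a real run the scorer would be the trained model, and the ranked list would feed a later prioritization step such as the structure-based comparison mentioned in the description.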