Project description:The Microbiome sequencing model is a Named Entity Recognition (NER) model that identifies and annotates microbiome nucleic acid sequencing methods and platforms in text. This is the final model version used to annotate metagenomics publications in Europe PMC and to enrich metagenomics studies in MGnify with sequencing metadata from the literature. For more information, please refer to the following blog posts: http://blog.europepmc.org/2020/11/europe-pmc-publications-metagenomics-annotations.html https://www.ebi.ac.uk/about/news/service-news/enriched-metadata-fields-mgnify-based-text-mining-associated-publications
Project description:NGPS is a method for high-throughput, de novo sequencing of full-length proteins. The method is based on cleavage of the protein at semi-random sites by microwave-assisted acid hydrolysis (MAAH), enrichment of LC-MS/MS-amenable peptides from the hydrolysate by solid-phase extraction, LC-MS/MS analysis, de novo sequencing of long peptide tags from the resulting peptides, and assembly of the peptide tags into consensus contigs.
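The final assembly step described above can be illustrated with a minimal sketch. This is not the NGPS implementation: it assumes error-free peptide tags and uses a simple greedy suffix-prefix overlap merge, whereas a real consensus assembly would also need to tolerate sequencing errors and ambiguities. The function names `overlap` and `assemble`, the `min_len` threshold, and the example tags are all hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, provided it is at least `min_len` residues long; otherwise 0."""
    max_k = min(len(a), len(b))
    for k in range(max_k, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(tags: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge peptide tags into contigs by exact overlap.

    Assumption: tags are error-free, so exact string matching suffices.
    """
    # Remove duplicates and tags fully contained in a longer tag.
    contigs = sorted(set(tags), key=len, reverse=True)
    contigs = [
        t for i, t in enumerate(contigs)
        if not any(t in longer for longer in contigs[:i])
    ]
    while True:
        # Find the pair of contigs with the longest overlap.
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining overlaps of sufficient length
        # Merge the best pair and replace both contigs with the result.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [
            c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)
        ] + [merged]


# Hypothetical tags covering one made-up protein sequence.
example_tags = ["MKTAYIAK", "YIAKQRQ", "QRQISFVK", "TAYI"]
example_contigs = assemble(example_tags)
```

Greedy merging is chosen here only for brevity; tags that share no overlap of at least `min_len` residues are left as separate contigs.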
Project description:p53 mutation is closely associated with the occurrence and progression of colon cancer. In this project, we performed crotonylomics sequencing on a matched pair of human colon cancer cell lines: HCT116+/+ (wild-type p53) and HCT116-/- (p53-null). The crotonylomics sequencing results showed that p53 deficiency regulated crotonylation of non-histone proteins.
Project description:We selected human intervertebral disc samples for proteomics analysis. According to the Pfirrmann classification, there were 1 case of grade I, 1 case of grade II, 3 cases of grade III, and 3 cases of grade IV. RNA sequencing and single-cell RNA sequencing analyses were integrated with the proteomics data to identify hub genes for intervertebral disc degeneration using bioinformatic methods.
Project description:Dependent on concise, pre-defined protein sequence databases, traditional search algorithms perform poorly when analyzing mass spectra derived from wholly uncharacterized protein products. Conversely, de novo peptide sequencing algorithms can interpret mass spectra without relying on reference databases. However, such algorithms have been difficult to apply to complex protein mixtures, in part due to a lack of methods for automatically validating de novo sequencing results. Here, we present novel metrics for benchmarking de novo sequencing algorithm performance on large-scale proteomics datasets, and a method for accurately calibrating false discovery rates on de novo results. We also present a novel algorithm (LADS) that leverages experimentally disambiguated fragmentation spectra to boost sequencing accuracy and sensitivity. LADS improves sequencing accuracy on longer peptides relative to other algorithms and improves discrimination between correct and incorrect sequences. Using these advancements, we demonstrate accurate de novo identification of peptide sequences not identifiable using database search-based approaches.