Project description: We performed a proteogenomics meta-analysis of rice (Oryza sativa), using data sets deposited in ProteomeXchange (PXD000265, PXD000313, PXD000923, PXD001030, PXD001058, PXD002291, PXD002739, PXD002740 and PXD003156) together with 29 RNA-Seq data sets. We created a search database comprising translated reads that had been mapped onto the rice genome, as well as the officially annotated rice protein sequences. The RNA-Seq database was pre-processed to identify “novel transcripts”, i.e. reads not mapping fully to an existing exon, and “novel junctions”, i.e. reads mapped with a gap, implying a potential splice site that was not annotated in the official gene set. Confidently identified “novel peptides”, i.e. those mapping to a novel junction or novel transcript, were post-processed to ensure that there was no better explanation for the corresponding spectra, e.g. a peptide from a canonical gene with a modification or amino acid substitution. Data were exported from the pipeline in PSI mzIdentML 1.2 format, containing chromosomal coordinates, and further converted to PSI proBed format for genome visualisation. Novel peptides were searched against other plant databases using BLAST to see whether they had been predicted in genes from other species. A total of 1584 novel peptides were identified, mapping to ~700 genomic loci in which either new genes (~100) or updates to existing gene models (~600) have been predicted.
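
The distinction between “novel transcripts” and “novel junctions” described above can be illustrated with a minimal sketch. This is not the pipeline used in the project; it only assumes that each mapped read is given as a list of aligned blocks on one chromosome and strand, and that annotated exons and introns are available as half-open coordinate intervals. The function name `classify_read`, the interval representation, and the label strings are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) on one chromosome and strand.
Interval = Tuple[int, int]


def classify_read(blocks: List[Interval],
                  exons: List[Interval],
                  introns: List[Interval]) -> str:
    """Label a mapped read as 'novel_junction', 'novel_transcript' or 'annotated'.

    blocks  -- aligned segments of the read; a gap between two blocks
               implies a splice junction
    exons   -- annotated exon intervals
    introns -- annotated intron intervals (i.e. known junctions)
    """
    blocks = sorted(blocks)

    # Each gap between consecutive aligned blocks is an implied intron.
    gaps = [(a_end, b_start)
            for (_, a_end), (b_start, _) in zip(blocks, blocks[1:])]

    # A gapped read whose implied intron is not annotated suggests a new splice site.
    known_introns = set(introns)
    if any(gap not in known_introns for gap in gaps):
        return "novel_junction"

    def inside_exon(block: Interval) -> bool:
        # True if the block lies entirely within some annotated exon.
        return any(ex_start <= block[0] and block[1] <= ex_end
                   for ex_start, ex_end in exons)

    # A read that does not map fully to existing exons suggests a new transcript.
    if not all(inside_exon(block) for block in blocks):
        return "novel_transcript"

    return "annotated"
```

In this sketch a read is checked for unannotated junctions first, so a gapped read with a new splice site is labelled as a novel junction even if its blocks also extend beyond annotated exons; a real pipeline might record both properties.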