Project description: MudPIT datasets for NaOH-extracted and salt-and-detergent (SD)-extracted rat nuclear membranes were obtained from skeletal muscle and liver tissue, as described in the following references:
Wilkie GS, Korfali N, Swanson SK, Malik P, Srsen V, Batrakou DG, de las Heras J, Zuleger N, Kerr AR, Florens L, Schirmer EC. Several novel nuclear envelope transmembrane proteins identified in skeletal muscle have cytoskeletal associations. Mol Cell Proteomics 2011 Jan;10(1):M110.003129.
Korfali N, Wilkie GS, Swanson SK, Srsen V, de las Heras J, Batrakou DG, Malik P, Zuleger N, Kerr AR, Florens L, Schirmer EC. The nuclear envelope proteome differs notably between tissues. Nucleus 2012 Nov-Dec;3(6):552-64.
The trypsin-digested MS datasets from these studies were searched again using ProLuCID against a database containing tissue-specific junction sequences, generated as follows:
We analyzed RNA-Seq data from five rat tissues (heart, skeletal muscle, liver, brain, and testes) produced by two separate research groups, hereafter referred to as GSE4 (Yu, Y., J.C. Fuscoe, C. Zhao, C. Guo, M. Jia, T. Qing, et al., A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages. Nat Commun, 2014; 5: 3230.) and GSE5 (Merkin, J., C. Russell, P. Chen, and C.B. Burge, Evolutionary dynamics of gene and isoform regulation in Mammalian tissues. Science, 2012; 338(6114): 1593-9.).
GSE4 has higher read coverage over junctions but only one suitable sample per tissue, whereas GSE5 has lower coverage but many replicates. Novel junctions identified in both datasets were considered to be high confidence. Differentially spliced transcripts were determined and quantified using the Modeling Alternative Junction Inclusion Quantification (MAJIQ) software. Splice junctions rather than whole isoforms were quantified because of the known limitations in reconstructing whole transcripts from short-read data. The software builds splice graphs from RNA-Seq reads spanning spliced junctions, incorporates unannotated junctions, and uses information from replicate samples to build a Bayesian posterior distribution. It outputs the expected percent spliced in (E[PSI]) value, corresponding to the percentage of transcripts expected to contain a given splice junction, or, in the case of a comparison, the expected change in PSI (E[dPSI]). In accordance with previous literature, we considered an alternative splicing event to occur between tissues when E[dPSI] was greater than 0.2.
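The two selection rules in this paragraph can be illustrated with a minimal Python sketch. It is not the MAJIQ interface: the function names and the (chromosome, start, end, strand) junction tuples are hypothetical, and applying the 0.2 cut-off to the absolute value of E[dPSI] is an assumption about how a signed change would be thresholded.

```python
def high_confidence_novel(junctions_gse4, junctions_gse5, annotated):
    """Novel junctions seen in both datasets (hypothetical helper).

    Each junction is a (chrom, start, end, strand) tuple; this layout is
    an assumption made for illustration only.
    """
    shared = set(junctions_gse4) & set(junctions_gse5)
    return shared - set(annotated)


def is_alternative_event(e_dpsi, threshold=0.2):
    """Flag a splicing change between tissues.

    Assumption: the threshold is applied to the magnitude of E[dPSI],
    so a change in either direction counts.
    """
    return abs(e_dpsi) > threshold


gse4 = {("chr1", 100, 500, "+"), ("chr1", 900, 1500, "+"), ("chr2", 50, 400, "-")}
gse5 = {("chr1", 100, 500, "+"), ("chr2", 50, 400, "-"), ("chr3", 10, 300, "+")}
known = {("chr1", 100, 500, "+")}

novel = high_confidence_novel(gse4, gse5, known)
```

In this toy input only the chr2 junction is both unannotated and present in the two datasets, so it is the only one retained.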
Tissue-specific junction sequence databases were generated for rat muscle and liver tissues. In each case, the union of all splice junctions detected by the STAR aligner in each replicate from GSE4 and GSE5 was taken. Junctions supported by fewer than 6 reads and junctions predicting an intron length of less than 60 nucleotides were removed (a standard cut-off, as there are very few smaller annotated junctions). Junction coordinates were extended by 66 nucleotides in both directions and then translated in three frames according to the directionality of the gene. This produced peptide sequences of about 44 amino acids in length. Following standard practice, sequences were removed if the translation produced a stop codon before the junction. Novel exons and intron retentions predicted by MAJIQ were similarly translated in three frames and added to the database, but were removed if the translated sequence was less than 7 amino acids long. All novel and annotated junctions were combined with novel exons, intron retentions, and Ensembl rat protein database sequences to produce a final protein sequence database. Finally, coordinates of junction peptides were checked against the original FASTA sequence to ensure that each peptide crossed the junction, with a start less than or equal to 22 amino acids away and an end greater than or equal to 22 amino acids away.
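The filtering and translation steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline that was actually run: all function names are hypothetical, the flanks are assumed to be supplied as plus-strand genomic sequence, and the final check reads the "22 amino acids" rule as requiring the peptide to start at or before position 22 and end at or after it.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def passes_filters(read_count, intron_length, min_reads=6, min_intron=60):
    """Keep junctions with at least 6 reads and an intron of at least 60 nt."""
    return read_count >= min_reads and intron_length >= min_intron


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def junction_peptides(left_flank, right_flank, strand="+"):
    """Three-frame translation of the two exonic flanks joined at a junction.

    Assumption: both flanks are plus-strand genomic sequence (66 nt each in
    the text). Returns (frame, peptide, junction_aa) for every frame with no
    stop codon before the junction. Stops after the junction are left in
    place, since the text does not say how they were handled.
    """
    seq = left_flank + right_flank
    junction_nt = len(left_flank)
    if strand == "-":
        seq = reverse_complement(seq)
        junction_nt = len(right_flank)
    kept = []
    for frame in range(3):
        peptide = translate(seq[frame:])
        # Index of the first codon that touches or follows the junction.
        junction_aa = (junction_nt - frame) // 3
        if "*" not in peptide[:junction_aa]:
            kept.append((frame, peptide, junction_aa))
    return kept


def spans_junction(pep_start, pep_end, junction_aa=22):
    """Assumed reading of the final check: the peptide must cover the junction."""
    return pep_start <= junction_aa <= pep_end


peptides = junction_peptides("GCT" * 22, "GGT" * 22)
```

With two 66-nucleotide flanks, frame 0 yields a 44-residue sequence with the junction after residue 22, which matches the lengths quoted in the text.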
2018-01-18 | MSV000081951 | MassIVE