Project description:Purpose: To understand the functional significance of the sperm transcriptome in stallion fertility, this study aimed to generate a detailed body of knowledge about the sperm RNA profile that defines a normal fertile stallion. Methods: The 50 bp single-end ABI SOLiD raw reads were aligned directly to the horse reference genome EquCab2 using NovoalignCS (version 1.00.09, novocraft.com), an aligner for ABI SOLiD reads, which uses multiple indexes into the reference genome to identify candidate alignment locations for each primary read and then completes the alignment. Results: Next-generation sequencing (NGS) of total RNA from the sperm of two reproductively normal stallions generated about 70 million raw reads and more than 3 Gb of sequence per sample; over half of these reads aligned to the EquCab2 reference genome. Altogether, 19,257 sequence tags with average coverage ≥1 (normalized number of transcripts) were mapped in the horse genome. Conclusion: The stallion sperm transcriptome sequence is an important foundation for discovering transcripts of known and novel genes and non-coding RNAs, thus improving the annotation of the draft horse genome sequence and providing markers for evaluating stallion fertility.
Reproductively fertile Stallion sperm transcriptome as revealed by RNA sequencing
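As a rough arithmetic check of the reported per-sample yield, assuming every raw read contributes its full 50 bp and taking "about 70 million" reads at face value:

$$70 \times 10^{6}\ \text{reads} \times 50\ \tfrac{\text{bp}}{\text{read}} = 3.5 \times 10^{9}\ \text{bp} = 3.5\ \text{Gb}$$

This is consistent with the "more than 3 Gb of sequence per sample" stated in the description.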
Project description:Mass spectrometry-based shotgun proteomics currently relies on matching the mass spectra of protein fragments produced by protease digestion to amino acid sequences predicted from nucleic acid sequences. However, the method lacks reliability in identifying every single amino acid residue proteome-wide. We proposed interpreting shotgun proteomics results, specifically in data-dependent acquisition mode, as protein sequence coverage by multiple reads, just as is done in nucleic acid sequencing for calling single nucleotide variants. Multiple reads for each letter of the proteome can be provided by overlapping distinct peptides, which confirm the presence of certain amino acid residues in the overlapping stretch at a much lower false discovery rate than the conventional 1%. These overlapping distinct peptides were, first, miscleaved tryptic peptides together with their properly cleaved counterparts and, second, peptides generated by several proteases with different specificities, each digesting the same specimen and analyzed separately. We illustrated this approach using publicly available multiprotease proteomic datasets and in-house data for HEK-293 cell line subproteomes obtained using trypsin, LysC and GluC proteases. The overall proteome coverage in the exemplary datasets, even with a single read, was 20-30% with 5,000-8,000 protein groups identified. Within this, 5-7% of the whole proteome was covered at least twofold and thus identified with increased reliability. Of 36 single amino acid variants identified in the HEK-293 cell line, seven were covered at least twofold. Sequence coverage by multiple reads may be further increased with greater proteome depth and a larger number of proteases.
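The idea of per-residue coverage by multiple reads can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: peptides are placed on a single protein by exact substring matching, each distinct peptide sequence counts as one read, and the function names and toy sequences are hypothetical.

```python
from collections.abc import Iterable


def residue_coverage(protein: str, peptides: Iterable[str]) -> list[int]:
    """Return, for each residue of `protein`, the number of distinct peptides covering it."""
    depth = [0] * len(protein)
    for pep in set(peptides):  # identical peptide sequences count as one read
        start = protein.find(pep)
        while start != -1:  # every exact occurrence in the protein is counted
            for i in range(start, start + len(pep)):
                depth[i] += 1
            start = protein.find(pep, start + 1)
    return depth


def fraction_covered(depth: list[int], min_reads: int) -> float:
    """Fraction of residues covered by at least `min_reads` distinct peptides."""
    if not depth:
        return 0.0
    return sum(d >= min_reads for d in depth) / len(depth)


# Hypothetical toy protein: "AGR" is a properly cleaved tryptic peptide,
# "AGRLLEK" its miscleaved counterpart, and "KTDE" stands in for a GluC peptide.
protein = "MAGRLLEKTDE"
depth = residue_coverage(protein, ["AGR", "AGRLLEK", "KTDE", "AGR"])
print(depth)                       # [0, 2, 2, 2, 1, 1, 1, 2, 1, 1, 1]
print(fraction_covered(depth, 2))  # residues confirmed by >= 2 distinct peptides
```

Residues where the miscleaved and properly cleaved tryptic peptides overlap, or where peptides from different proteases meet, reach a depth of 2; a variant position would be checked the same way by testing `depth[pos] >= 2`.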
2021-12-07 | MSV000088536 | MassIVE
Project description:Low-coverage whole-genome shotgun (WGS) sequencing of Myotis
Project description:Intent of the experiment: to evaluate whether copy number gains and losses occur during passaging, i.e., to test the genomic stability of the patient-derived xenograft models. DNA was extracted from frozen xenograft samples of different passages. The KAPA DNA Library Preparation Kit was used to prepare DNA libraries, which were sequenced at low coverage on a HiSeq 2000 (Illumina) with a V3 flow cell, generating 50 bp reads. Raw reads were aligned to the human reference genome hg19 with the Burrows-Wheeler Aligner (BWA) software package and, after duplicate removal, further analyzed with QDNAseq to exclude known regions with low mapping quality, correct for the genomic wave, and count the reads per bin. The binned data were then segmented with the ASCAT (Allele-Specific Copy number Analysis of Tumours) algorithm.
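The duplicate-removal and read-counting steps can be pictured with a small Python sketch. It is not QDNAseq (an R/Bioconductor package) and it omits the genomic-wave correction and the ASCAT segmentation; the read tuple layout, the duplicate rule, the bin size and the function names `bin_counts` and `log2_ratios` are hypothetical choices for illustration.

```python
import math
from collections import Counter
from statistics import median

# Hypothetical read layout: (chromosome, 0-based alignment position, strand).
Read = tuple[str, int, str]
Bin = tuple[str, int]


def bin_counts(reads: list[Read], bin_size: int,
               excluded_bins: frozenset[Bin] = frozenset()) -> Counter:
    """Count deduplicated reads per fixed-size bin, skipping excluded bins."""
    # Assumed duplicate rule for single-end data: reads with the same
    # chromosome, position and strand are duplicates; keep one copy.
    unique_reads = set(reads)
    counts: Counter = Counter()
    for chrom, pos, _strand in unique_reads:
        b = (chrom, pos // bin_size)
        if b not in excluded_bins:  # e.g. bins in low-mapping-quality regions
            counts[b] += 1
    return counts


def log2_ratios(counts: Counter) -> dict[Bin, float]:
    """Median-normalize non-empty bins and return log2 ratios (0.0 = median depth)."""
    med = median(counts.values())
    return {b: math.log2(c / med) for b, c in counts.items()}


reads = [("chr1", 10, "+"), ("chr1", 10, "+"), ("chr1", 20, "-"),
         ("chr1", 150, "+"), ("chr1", 160, "+"), ("chr1", 170, "+"),
         ("chr1", 175, "+"), ("chr2", 5, "+"), ("chr2", 120, "+"),
         ("chr2", 130, "-"), ("chr2", 140, "+")]
counts = bin_counts(reads, bin_size=100, excluded_bins=frozenset({("chr2", 0)}))
print(log2_ratios(counts))
```

Bins with no reads never enter the `Counter`, so they are simply absent from the ratios; a real pipeline would keep them and handle the zero-count case explicitly.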