Project description:Construction of a comprehensive spectral library for the coral reef fish, Acanthochromis polyacanthus, from both DIA and DDA MS runs. The spectral library was then used to quantify proteomes of individual fish exposed to different environmental conditions including ocean acidification and ocean warming. Proteomes were measured for both liver and brain tissue and differential expression between environmental conditions was analyzed.
Project description:Objective: To characterize the transcriptome of CCl4-induced liver fibrosis in mice. Methods: C57BL/6 mice were used to establish a CCl4-induced liver fibrosis model. After the model was successfully established, mouse RNA was extracted for subsequent experiments. The basic principles of RNA extraction are to prevent RNA degradation during extraction, maximize extraction efficiency, and ensure that high-quality RNA of good integrity and purity is obtained from the target samples. The appropriate extraction reagent and method were selected according to the species of the sample (e.g. animal or plant) and its state (e.g. animal tissue or cultured cells). Obtaining high-quality RNA is the basis for the success of the whole project; to ensure the accuracy of the sequencing data, samples underwent quality inspection, and libraries were constructed only after the results met the requirements for sequencing library construction. Sample test results are given in the quality inspection report. After the samples passed quality control, eukaryotic mRNA was enriched with Oligo(dT) magnetic beads. Fragmentation buffer was then added to randomly fragment the mRNA. First-strand cDNA was synthesized from the mRNA template with random hexamer primers, after which buffer, dNTPs, and DNA polymerase I were added to synthesize the second strand; the double-stranded cDNA was purified with AMPure XP beads. The purified double-stranded cDNA was then end-repaired, A-tailed, and ligated to sequencing adapters, followed by fragment size selection with AMPure XP beads and a final PCR enrichment to obtain the cDNA library. After library construction, the insert size and effective concentration of each library were measured to ensure library quality. Libraries that passed these checks were pooled according to the target data amount and sequenced. Results: RNA-seq data from the CCl4-induced mouse liver fibrosis model were obtained, providing a molecular basis for our future studies of liver fibrosis in mice.
Project description:BACKGROUND: Discovering single nucleotide polymorphisms (SNPs) from agricultural crop genome sequences has been a widely used strategy for developing genetic markers for applications including marker-assisted breeding, population diversity studies of eco-geographical adaptation, genotyping of crop germplasm collections, and others. Accurately detecting SNPs in large polyploid crop genomes such as wheat is crucial and challenging. A few variant calling methods have been developed previously, but they show low concordance between their variant calls. A gold-standard variant set generated from one human individual has been established for evaluating variant calling tools; however, no comparable gold-standard variant set has hitherto been available for wheat. The intent of this study was to evaluate seven SNP variant calling tools (FreeBayes, GATK, Platypus, Samtools/mpileup, SNVer, VarScan, VarDict) combined with the two most popular mapping tools (BWA-mem and Bowtie2) on whole exome capture (WEC) re-sequencing data from allohexaploid wheat. RESULTS: We found that BWA-mem had both a higher mapping rate and a higher accuracy rate than Bowtie2. At the same mapping quality (MQ) cutoff, BWA-mem detected more variant bases in mapped reads than Bowtie2. Preprocessing the reads with quality trimming or duplicate removal did not significantly affect the final mapping performance in terms of mapped reads. Based on concordance and receiver operating characteristic (ROC) analysis, Samtools/mpileup variant calling with BWA-mem mapping of raw sequence reads outperformed the other combinations in terms of specificity and sensitivity, followed by FreeBayes and GATK. VarDict and VarScan were the poorest-performing variant calling tools on the wheat WEC sequence data.
CONCLUSION: The BWA-mem and Samtools/mpileup pipeline, which requires no preprocessing of the raw read data before mapping onto the reference genome, was found to be optimal for SNP calling in complex wheat genome re-sequencing. These results also provide useful guidelines for reliable variant identification from deep sequencing of other large polyploid crop genomes.
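The specificity and sensitivity ranking described above can be illustrated with a minimal sketch. This is not the study's actual evaluation script; the call-set representation (sets of chromosome/position/allele tuples) and the function name are assumptions made for illustration:

```python
# Hypothetical sketch: scoring one caller's variant set against a
# gold-standard set, as in the ROC-based comparison described above.
# Variants are modeled as (chrom, pos, alt) tuples; total_sites is the
# number of assessable positions (needed to count true negatives).

def evaluate_calls(called, truth, total_sites):
    """Return (sensitivity, specificity) for one caller's variant set."""
    tp = len(called & truth)          # true variants recovered
    fp = len(called - truth)          # calls absent from the standard
    fn = len(truth - called)          # variants the caller missed
    tn = total_sites - tp - fp - fn   # correctly uncalled sites
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

# Invented toy data: three true variants, one false positive, one miss.
truth = {("1A", 100, "T"), ("1A", 250, "G"), ("2B", 40, "C")}
calls = {("1A", 100, "T"), ("2B", 40, "C"), ("2B", 90, "A")}
sens, spec = evaluate_calls(calls, truth, total_sites=1000)
```

Sweeping a quality threshold over the call set and recomputing these two numbers at each cutoff yields the ROC curve used to rank the caller/mapper combinations.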
Project description:BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data poses new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology. RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. CONCLUSION: We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.
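The two ideas above, coding ambiguous bases with IUPAC symbols and scoring sub-tags by information content, can be sketched as follows. This is an illustrative model, not the published R package's code: the per-base posterior probabilities, the 0.25 inclusion threshold, and the function names are assumptions for the sketch.

```python
import math

# Hedged sketch (not the package's actual implementation): a base whose
# intensity clusters support several calls is coded with the IUPAC symbol
# covering all plausible bases, and a Shannon-information score is used
# to identify uncertain bases toward the ends of reads.

IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H",
    frozenset("AGT"): "D", frozenset("CGT"): "B",
    frozenset("ACGT"): "N",
}

def ambiguity_code(probs, threshold=0.25):
    """IUPAC symbol covering every base whose posterior exceeds threshold."""
    plausible = frozenset(b for b, p in probs.items() if p >= threshold)
    return IUPAC[plausible]

def information_content(probs):
    """2 minus Shannon entropy (bits): 2 for a certain call, 0 for uniform."""
    h = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return 2.0 - h

# Invented posteriors for one confident and one ambiguous cycle.
certain = {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01}
murky = {"A": 0.45, "C": 0.45, "G": 0.05, "T": 0.05}
```

An optimal sub-tag can then be chosen as the prefix maximizing cumulative information content, trimming the low-scoring 3' cycles.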
Project description:Genotype calling plays important roles in population-genomic studies, which have been greatly accelerated by sequencing technologies. To take full advantage of the resultant information, we have developed maximum-likelihood (ML) methods for calling genotypes from high-throughput sequencing data. As the statistical uncertainties associated with sequencing data depend on depth of coverage, we have developed two types of genotype callers. One approach is appropriate for low-coverage sequencing data, and incorporates population-level information on genotype frequencies and error rates pre-estimated by an ML method. Performance evaluation using computer simulations and human data shows that the proposed framework yields less biased estimates of allele frequencies and more accurate genotype calls than current widely used methods. The other type of genotype caller applies to high-coverage sequencing data, requires no prior genotype-frequency estimates, and makes no assumption about the number of alleles at a polymorphic site. Using computer simulations, we determine the depth of coverage necessary to accurately characterize polymorphisms using this second method. We applied the proposed method to high-coverage (mean 18x) sequencing data of 83 clones from a population of Daphnia pulex. The results show that the proposed method enables conservative and reasonably powerful detection of polymorphisms with arbitrary numbers of alleles. We have extended the proposed method to the analysis of genomic data for polyploid organisms, showing that calling accurate polyploid genotypes requires much higher coverage than diploid genotypes.
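The core of ML genotype calling at high coverage can be sketched in a few lines. This is a simplified stand-in for the authors' method, under an assumed error model (a read base matches a sampled allele with probability 1 - e, and is one of the three other bases with probability e/3 each); the function names and the diploid restriction are illustrative choices:

```python
import math

# Minimal sketch (assumed model, not the published implementation) of
# maximum-likelihood diploid genotype calling from per-site base counts.

def base_prob(base, genotype, e):
    """P(read base | a chromosome drawn uniformly from the genotype)."""
    return sum((1 - e) if base == a else e / 3 for a in genotype) / 2

def genotype_log_likelihood(counts, genotype, e=0.01):
    """Log-likelihood of the observed base counts under one genotype."""
    return sum(n * math.log(base_prob(b, genotype, e))
               for b, n in counts.items() if n)

def call_genotype(counts, e=0.01):
    """Return the ML genotype among the ten unordered diploid genotypes."""
    genotypes = [(a, b) for i, a in enumerate("ACGT") for b in "ACGT"[i:]]
    return max(genotypes, key=lambda g: genotype_log_likelihood(counts, g, e))

# Invented site: 12 A reads and 9 G reads; the heterozygote should win,
# since a homozygote would have to explain 9 reads as errors.
site = {"A": 12, "C": 0, "G": 9, "T": 0}
```

At low coverage, the same likelihoods would instead be combined with pre-estimated population genotype frequencies as a prior, which is the distinction the abstract draws between its two callers.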
Project description:To optimize the genome annotation, four tissue RNA libraries (i.e. heart, liver, lung and kidney) were constructed using the Illumina mRNA-Seq Prep Kit.
Project description:To optimize the genome annotation, nine tissue and one pool RNA libraries (i.e. heart, liver, spleen, lung, kidney, muscle, fat, ovary, and a pooled sample) were constructed using the Illumina mRNA-Seq Prep Kit.
Project description:We present a statistical framework for estimation and application of sample allele frequency spectra from New-Generation Sequencing (NGS) data. In this method, we first estimate the allele frequency spectrum using maximum likelihood. In contrast to previous methods, the likelihood function is calculated using a dynamic programming algorithm and numerically optimized using analytical derivatives. We then use a Bayesian method for estimating the sample allele frequency at a single site, and show how the method can be used for genotype calling and SNP calling. We also show how the method can be extended to various other cases, including cases with deviations from Hardy-Weinberg equilibrium. We evaluate the statistical properties of the methods using simulations and by application to a real data set.
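The per-site Bayesian step can be illustrated with a deliberately simplified sketch. The framework above uses a dynamic-programming likelihood over sample allele frequencies; here, as an assumption for brevity, that is replaced by a coarse grid over a population frequency f under Hardy-Weinberg equilibrium, and the per-individual genotype likelihoods, grid, and prior are all invented:

```python
import math

# Hedged sketch (a simplified stand-in for the method described above):
# P(data | f) = prod_i sum_g L_i(g) * HWE(g | f), combined with a prior
# over f to give a posterior allele frequency and a SNP call at one site.

def hwe(g, f):
    """P(genotype with g derived alleles | derived-allele frequency f)."""
    return [(1 - f) ** 2, 2 * f * (1 - f), f * f][g]

def site_posterior(likelihoods, grid, prior):
    """Posterior over candidate allele frequencies at one site."""
    log_post = []
    for f, pr in zip(grid, prior):
        log_l = sum(math.log(sum(l[g] * hwe(g, f) for g in range(3)))
                    for l in likelihoods)
        log_post.append(math.log(pr) + log_l)
    m = max(log_post)                       # stabilize before exponentiating
    w = [math.exp(p - m) for p in log_post]
    s = sum(w)
    return [x / s for x in w]

# Invented data: two confidently homozygous-ancestral individuals and one
# noisy heterozygote; genotype likelihoods are [L(0), L(1), L(2)].
liks = [[0.98, 0.02, 0.0001], [0.98, 0.02, 0.0001], [0.05, 0.9, 0.05]]
grid = [0.0, 0.1, 0.3, 0.5]
prior = [0.7, 0.1, 0.1, 0.1]   # mass at f = 0 encodes "most sites invariant"
post = site_posterior(liks, grid, prior)
p_variable = 1 - post[0]       # posterior probability the site is a SNP
```

A SNP call then reduces to thresholding `p_variable`, and a genotype call to the analogous per-individual posterior; the real method's dynamic program plays the role of the inner likelihood here.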
Project description:Whole-exome sequencing is an attractive alternative to microarray analysis because of its low cost and potential ability to detect copy number variations (CNV) of various sizes (from 1-2 exons to several Mb). A previous comparison of the most popular CNV calling tools showed a high proportion of false-positive calls. Moreover, for lack of a gold-standard CNV set, the results were limited and incomparable. Here, we aimed to perform a comprehensive analysis of currently available tools capable of germline CNV calling, using a single CNV standard and reference sample set. Compiling variants from previous studies with a Bayesian estimation approach, we constructed an internal standard for the NA12878 sample (pilot National Institute of Standards and Technology Reference Material) comprising 110,050 CNV or non-CNV exons. The standard was used to evaluate the performance of 16 germline CNV calling tools on the NA12878 sample and 10 correlated exomes as a reference set with respect to length distribution, concordance, and efficiency. Each algorithm had a certain range of detected lengths and showed low concordance with other tools. Most tools focus on detecting a limited number of CNVs one to seven exons long with a false-positive rate below 50%. EXCAVATOR2, exomeCopy, and FishingCNV detected a wide range of variations but showed low precision. Upon unified comparison, the tools were not equivalent. The analysis performed allows choosing algorithms, or ensembles of algorithms, most suitable for a specific goal, e.g. population studies or medical genetics.
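The two evaluation ideas above, precision against the exon-level internal standard and pairwise concordance between tools, can be sketched as follows. The exon ids, call sets, and function names are invented for illustration; this is not the study's benchmarking code:

```python
# Illustrative sketch of exon-level CNV evaluation: precision of one
# tool's calls against a labeled standard, and Jaccard concordance
# between two tools' call sets.

def precision(calls, standard_cnv, standard_normal):
    """Fraction of called CNV exons that are CNV in the standard."""
    tp = len(calls & standard_cnv)
    fp = len(calls & standard_normal)
    return tp / (tp + fp) if tp + fp else 0.0

def concordance(calls_a, calls_b):
    """Jaccard concordance between two tools' CNV-exon call sets."""
    union = calls_a | calls_b
    return len(calls_a & calls_b) / len(union) if union else 1.0

# Invented miniature standard: three CNV exons, two non-CNV exons.
standard_cnv = {"ex1", "ex2", "ex5"}
standard_normal = {"ex3", "ex4"}
tool_a = {"ex1", "ex3", "ex5"}   # one false positive (ex3)
tool_b = {"ex1", "ex2"}
```

Running these two scores over all 16 tools and all tool pairs reproduces, in miniature, the kind of unified comparison the study describes, and makes the low pairwise concordance directly quantifiable.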