Project description:Genome graphs, including the recently released draft human pangenome graph, can represent the breadth of genetic diversity and thus transcend the limits of traditional linear reference genomes. However, there are no genome-graph-compatible tools for analyzing whole genome bisulfite sequencing (WGBS) data. To close this gap, we introduce methylGrapher, a tool tailored for accurate DNA methylation analysis by mapping WGBS data to a genome graph. Notably, methylGrapher can reconstruct methylation patterns along haplotype paths precisely and efficiently. To demonstrate the utility of methylGrapher, we analyzed the WGBS data derived from five individuals whose genomes were included in the first Human Pangenome draft as well as WGBS data from ENCODE (EN-TEx). Along with standard performance benchmarking, we show that methylGrapher fully recapitulates DNA methylation patterns defined by classic linear genome analysis approaches. Importantly, methylGrapher captures a substantial number of CpG sites that are missed by linear methods, and improves overall genome coverage while reducing alignment reference bias. Thus, methylGrapher is a first step towards unlocking the full potential of Human Pangenome graphs in genomic DNA methylation analysis.
Project description:The draft genome of L. sativa (lettuce) cv. Tizian was sequenced in two Illumina sequencing runs, mate pair and shotgun. This entry contains the RAW sequencing data.
Project description:We present a draft genome assembly that includes 200 Gb of Illumina reads, 4 Gb of Moleculo synthetic long-reads and 108 Gb of Chicago libraries, with a final size matching the estimated genome size of 2.7 Gb, and a scaffold N50 of 4.8 Mb. We also present an alternative assembly including 27 Gb raw reads generated using the Pacific Biosciences platform. In addition, we sequenced the proteome of the same individual and RNA from three different tissue types from three other species of squid species (Onychoteuthis banksii, Dosidicus gigas, and Sthenoteuthis oualaniensis) to assist genome annotation. We annotated 33,406 protein coding genes supported by evidence and the genome completeness estimated by BUSCO reached 92%. Repetitive regions cover 49.17% of the genome.
Project description:We sequenced and analyzed the genome of a highly inbred miniature Chinese pig strain, the Banna Minipig Inbred Line (BMI). we conducted whole genome screening using next generation sequencing (NGS) technology and performed SNP calling using Sus Scrofa genome assembly Sscrofa11.1.
Project description:Although a standard genome-wide significance level has been accepted for the testing of association between common genetic variants and disease, the era of whole-genome sequencing (WGS) requires a new threshold. The allele frequency spectrum of sequence-identified variants is very different from common variants, and the identified rare genetic variation is usually jointly analyzed in a series of genomic windows or regions. In nearby or overlapping windows, these test statistics will be correlated, and the degree of correlation is likely to depend on the choice of window size, overlap, and the test statistic. Furthermore, multiple analyses may be performed using different windows or test statistics. Here we propose an empirical approach for estimating genome-wide significance thresholds for data arising from WGS studies, and we demonstrate that the empirical threshold can be efficiently estimated by extrapolating from calculations performed on a small genomic region. Because analysis of WGS may need to be repeated with different choices of test statistics or windows, this prediction approach makes it computationally feasible to estimate genome-wide significance thresholds for different analysis choices. Based on UK10K whole-genome sequence data, we derive genome-wide significance thresholds ranging between 2.5 × 10(-8) and 8 × 10(-8) for our analytic choices in window-based testing, and thresholds of 0.6 × 10(-8) -1.5 × 10(-8) for a combined analytic strategy of testing common variants using single-SNP tests together with rare variants analyzed with our sliding-window test strategy.