Project description:BackgroundStructural Variations (SVs) are genomic rearrangements derived from duplication, deletion, insertion, inversion, and translocation events. In the past, SVs detection was limited to cytological approaches, then to Next-Generation Sequencing (NGS) short reads and partitioned assemblies. Nowadays, technologies such as DNA long read sequencing and optical mapping have revolutionized the understanding of SVs in genomes, due to the enhancement of the power of SVs detection. This study aims to investigate performance of two techniques, 1) long-read sequencing obtained with the MinION device (Oxford Nanopore Technologies) and 2) optical mapping obtained with Saphyr device (Bionano Genomics) to detect and characterize SVs in the genomes of the two ecotypes of Arabidopsis thaliana, Columbia-0 (Col-0) and Landsberg erecta 1 (Ler-1).ResultsWe described the SVs detected from the alignment of the best ONT assembly and DLE-1 optical maps of A. thaliana Ler-1 against the public reference genome Col-0 TAIR10.1. After filtering (SV > 1 kb), 1184 and 591 Ler-1 SVs were retained from ONT and Bionano technologies respectively. A total of 948 Ler-1 ONT SVs (80.1%) corresponded to 563 Bionano SVs (95.3%) leading to 563 common locations. The specific locations were scrutinized to assess improvement in SV detection by either technology. The ONT SVs were mostly detected near TE and gene features, and resistance genes seemed particularly impacted.ConclusionsStructural variations linked to ONT sequencing error were removed and false positives limited, with high quality Bionano SVs being conserved. When compared with the Col-0 TAIR10.1 reference genome, most of the detected SVs discovered by both technologies were found in the same locations. ONT assembly sequence leads to more specific SVs than Bionano one, the latter being more efficient to characterize large SVs. Even if both technologies are complementary approaches, ONT data appears to be more adapted to large scale populations studies, while Bionano performs better in improving assembly and describing specificity of a genome compared to a reference.
Project description:The Oxford Nanopore Technologies (ONT) MinION is a new sequencing technology that potentially offers read lengths of tens of kilobases (kb) limited only by the length of DNA molecules presented to it. The device has a low capital cost, is by far the most portable DNA sequencer available, and can produce data in real-time. It has numerous prospective applications including improving genome sequence assemblies and resolution of repeat-rich regions. Before such a technology is widely adopted, it is important to assess its performance and limitations in respect of throughput and accuracy. In this study we assessed the performance of the MinION by re-sequencing three bacterial genomes, with very different nucleotide compositions ranging from 28.6% to 70.7%; the high G + C strain was underrepresented in the sequencing reads. We estimate the error rate of the MinION (after base calling) to be 38.2%. Mean and median read lengths were 2 kb and 1 kb respectively, while the longest single read was 98 kb. The whole length of a 5 kb rRNA operon was covered by a single read. As the first nanopore-based single molecule sequencer available to researchers, the MinION is an exciting prospect; however, the current error rate limits its ability to compete with existing sequencing technologies, though we do show that MinION sequence reads can enhance contiguity of de novo assembly when used in conjunction with Illumina MiSeq data.
Project description:Pharmacogenomics (PGx) studies the impact of interindividual genomic variation on drug response, allowing the opportunity to tailor the dosing regimen for each patient. Current targeted PGx testing platforms are mainly based on microarray, polymerase chain reaction, or short-read sequencing. Despite demonstrating great value for the identification of single nucleotide variants (SNVs) and insertion/deletions (INDELs), these assays do not permit identification of large structural variants, nor do they allow unambiguous haplotype phasing for star-allele assignment. Here, we used Oxford Nanopore Technologies' adaptive sampling to enrich a panel of 1,036 genes with well-documented PGx relevance extracted from the Pharmacogenomics Knowledge Base (PharmGKB). By evaluating concordance with existing truth sets, we demonstrate accurate variant and star-allele calling for five Genome in a Bottle reference samples. We show that up to three samples can be multiplexed on one PromethION flow cell without a significant drop in variant calling performance, resulting in 99.35% and 99.84% recall and precision for the targeted variants, respectively. This work advances the use of nanopore sequencing in clinical PGx settings.
Project description:High quality reference genome sequences are the core of modern genomics. Oxford Nanopore Technologies (ONT) produces inexpensive DNA sequences, but has high error rates, which make sequence assembly and analysis difficult as genome size and complexity increases. Robust experimental design is necessary for ONT genome sequencing and assembly, but few studies have addressed eukaryotic organisms. Here, we present novel results using simulated and empirical ONT and DNA libraries to identify best practices for sequencing and assembly for several model species. We find that the unique error structure of ONT libraries causes errors to accumulate and assembly statistics plateau as sequence depth increases. High-quality assembled eukaryotic sequences require high-molecular-weight DNA extractions that increase sequence read length, and computational protocols that reduce error through pre-assembly correction and read selection. Our quantitative results will be helpful for researchers seeking guidance for de novo assembly projects.
Project description:Flax (Linum usitatissimum L.) is attacked by numerous devastating fungal pathogens, including Colletotrichum lini, Aureobasidium pullulans, and Fusarium verticillioides (Fusarium moniliforme). The effective control of flax diseases follows the paradigm of extensive molecular research on pathogenicity. However, such studies require quality genome sequences of the studied organisms. This article reports on the approaches to assembling a high-quality fungal genome from the Oxford Nanopore Technologies data. We sequenced the genomes of C. lini, A. pullulans, and F. verticillioides (F. moniliforme) and received different volumes of sequencing data: 1.7 Gb, 3.9 Gb, and 11.1 Gb, respectively. To obtain the optimal genome sequences, we studied the effect of input data quality and genome coverage on assembly statistics and tested the performance of different assembling and polishing software. For C. lini, the most contiguous and complete assembly was obtained by the Flye assembler and the Homopolish polisher. The genome coverage had more effect than data quality on assembly statistics, likely due to the relatively low amount of sequencing data obtained for C. lini. The final assembly was 53.4 Mb long and 96.4% complete (according to the glomerellales_odb10 BUSCO dataset), consisted of 42 contigs, and had an N50 of 4.4 Mb. For A. pullulans and F. verticillioides (F. moniliforme), the best assemblies were produced by Canu-Medaka and Canu-Homopolish, respectively. The final assembly of A. pullulans had a length of 29.5 Mb, 99.4% completeness (dothideomycetes_odb10), an N50 of 2.4 Mb and consisted of 32 contigs. F. verticillioides (F. moniliforme) assembly was 44.1 Mb long, 97.8% complete (hypocreales_odb10), consisted of 54 contigs, and had an N50 of 4.4 Mb. The obtained results can serve as a guideline for assembling a de novo genome of a fungus. In addition, our data can be used in genomic studies of fungal pathogens or plant-pathogen interactions and assist in the management of flax diseases.
Project description:Structural variation (SV) is a major cause of genetic disorders. In this paper, we show that low-depth (specifically, 4×) whole-genome sequencing using a single Oxford Nanopore MinION flow cell suffices to support sensitive detection of SV, particularly pathogenic SV for supporting clinical diagnosis. When using 4× ONT WGS data, existing SV calling software often fails to detect pathogenic SV, especially in the form of long deletion, terminal deletion, duplication, and unbalanced translocation. Our new SV calling software SENSV can achieve high sensitivity for all types of SV and a breakpoint precision typically ± 100 bp; both features are important for clinical concerns. The improvement achieved by SENSV stems from several new algorithms. We evaluated SENSV and other software using both real and simulated data. The former was based on 24 patient samples, each diagnosed with a genetic disorder. SENSV found the pathogenic SV in 22 out of 24 cases (all heterozygous, size from hundreds of kbp to a few Mbp), reporting breakpoints within 100 bp of the true answers. On the other hand, no existing software can detect the pathogenic SV in more than 10 out of 24 cases, even when the breakpoint requirement is relaxed to ± 2000 bp.