Project description:The ligation step in RNA sequencing library generation is a known source of bias. We present the first comparison of the standard duplex adaptor protocol supplied by Life Technologies for use on the Ion Torrent PGM with an alternate single adaptor approach involving CircLigase (CircLig). We also investigate whether using the thermostable ligase Methanobacterium thermoautotrophicum RNA ligase K97A (Mth K97A) for the initial ligation step in the CircLigase protocol reduces bias. A pool of small RNA fragments of known composition was converted into a sequencing library using one of three protocols and sequenced on an Ion Torrent PGM. The single adaptor CircLigase-based approach significantly reduces, but does not eliminate, bias in Ion Torrent data. Using Mth K97A as part of the CircLig method does not further reduce bias.
Project description:Mutators represent a successful strategy in rapidly adapting asexual populations, but theory predicts their eventual extinction due to their unsustainably large deleterious load. While antimutator invasions have been documented experimentally, important discrepancies among studies remain currently unexplained. Here we show that a largely neglected factor, the mutational idiosyncrasy displayed by different mutators, can play a major role in this process. Analysing phylogenetically diverse bacteria, we find marked and systematic differences in the protein-disruptive effects of mutations caused by different mutators in species with different GC compositions. Computer simulations show that these differences can account for order-of-magnitude changes in antimutator fitness for a realistic range of parameters. Overall, our results suggest that antimutator dynamics may be highly dependent on the specific genetic, ecological and evolutionary history of a given population. This context-dependency further complicates our understanding of mutators in clinical settings, as well as their role in shaping bacterial genome size and composition.
Project description:Background: Next-generation sequencing does not yield fully unbiased estimates of read abundance, which may affect the conclusions that can be drawn from sequencing data. The ligation step in RNA sequencing library generation is a known source of bias, motivating developments in enzyme technology and library construction protocols. We present the first comparison of the standard duplex adaptor protocol supplied by Life Technologies for use on the Ion Torrent PGM with an alternate single adaptor approach involving CircLigase (CircLig protocol). A correlation between over-representation in sequenced libraries and degree of secondary structure has been reported previously; we therefore also investigated whether bias could be reduced by ligation with an enzyme that functions at a temperature not permissive for such structure. Results: A pool of small RNA fragments of known composition was converted into a sequencing library using one of three protocols and sequenced on an Ion Torrent PGM. The CircLig protocol resulted in less over-representation of specific sequences than the standard protocol. Over-represented sequences are more likely to be predicted to have secondary structure and to co-fold with adaptor sequences. However, use of the thermostable ligase Methanobacterium thermoautotrophicum RNA ligase K97A (Mth K97A) was not sufficient to reduce bias. Conclusions: The single adaptor CircLigase-based approach significantly reduces, but does not eliminate, bias in Ion Torrent data. Ligases that function at temperatures that remove the possible influence of secondary structure on library generation may be of value, although Mth K97A is not effective in this case.
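One way to quantify over-representation in a pool of known composition is to compare each sequence's observed read fraction with its expected fraction. The following is a minimal sketch of that idea, not the metric used in the study: the function name `representation_bias`, the log2 ratio, and the input dictionaries are all assumptions made for illustration.

```python
from math import log2


def representation_bias(observed_counts, expected_fractions):
    """Return log2(observed fraction / expected fraction) per sequence.

    observed_counts: read counts per sequence ID (hypothetical input).
    expected_fractions: known pool composition, as fractions summing to 1.
    Positive values mean over-representation, negative values mean
    under-representation, and 0 means no bias.
    """
    total = sum(observed_counts.values())
    if total == 0:
        raise ValueError("no reads observed")
    bias = {}
    for seq_id, expected in expected_fractions.items():
        observed = observed_counts.get(seq_id, 0) / total
        # A sequence with zero reads has no finite log ratio; report None.
        bias[seq_id] = log2(observed / expected) if observed > 0 else None
    return bias
```

Applied to the libraries from each protocol, a narrower spread of these values would indicate less ligation bias.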
Project description:GC content bias describes the dependence between fragment count (read coverage) and GC content found in Illumina sequencing data. This bias can dominate the signal of interest for analyses that focus on measuring fragment abundance within a genome, such as copy number estimation (DNA-seq). The bias is not consistent between samples; and there is no consensus as to the best methods to remove it in a single sample. We analyze regularities in the GC bias patterns, and find a compact description for this unimodal curve family. It is the GC content of the full DNA fragment, not only the sequenced read, that most influences fragment count. This GC effect is unimodal: both GC-rich fragments and AT-rich fragments are underrepresented in the sequencing results. This empirical evidence strengthens the hypothesis that PCR is the most important cause of the GC bias. We propose a model that produces predictions at the base pair level, allowing strand-specific GC-effect correction regardless of the downstream smoothing or binning. These GC modeling considerations can inform other high-throughput sequencing analyses such as ChIP-seq and RNA-seq.
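The unimodal GC curve described above can be illustrated by computing the GC fraction of each full fragment and averaging fragment counts within GC bins. This is a rough sketch only, not the base-pair-level model the authors propose: the function names, the equal-width binning, and the input format are assumptions.

```python
from collections import defaultdict


def gc_fraction(fragment):
    """Fraction of G/C among unambiguous bases (A, C, G, T) of a fragment."""
    fragment = fragment.upper()
    gc = fragment.count("G") + fragment.count("C")
    at = fragment.count("A") + fragment.count("T")
    if gc + at == 0:
        raise ValueError("fragment has no unambiguous bases")
    return gc / (gc + at)


def mean_count_by_gc(fragments, counts, n_bins=10):
    """Mean fragment count per equal-width GC bin (bin index -> mean).

    fragments: full fragment sequences, not only the sequenced reads.
    counts: observed count for each fragment, in the same order.
    """
    sums = defaultdict(float)
    sizes = defaultdict(int)
    for fragment, count in zip(fragments, counts):
        # A GC fraction of exactly 1.0 is placed in the last bin.
        bin_index = min(int(gc_fraction(fragment) * n_bins), n_bins - 1)
        sums[bin_index] += count
        sizes[bin_index] += 1
    return {b: sums[b] / sizes[b] for b in sorted(sums)}
```

With real data, plotting the returned means against bin index would be expected to show the unimodal shape, with lower counts at both the AT-rich and GC-rich ends.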
Project description:Background: Genes of conserved order in bacterial genomes tend to evolve more slowly than genes whose order is not conserved. In addition, genes with a GC content lower than the GC content of the resident genome are known to be selectively silenced by the histone-like nucleoid structuring protein (H-NS) in Salmonella. Results: In this study, we use a comparative genomics approach to demonstrate that in Salmonella, genes whose order is not conserved (or genes without homologs) in closely related bacteria possess a significantly lower average GC content than genes that preserve their relative position in the genome. Moreover, these genes are more frequently targeted by H-NS than genes that have conserved their genomic neighborhood. We also observed that duplicated genes that do not preserve their genomic neighborhood are, on average, under less selective pressure. Conclusions: We establish a strong association between gene order, GC content and gene silencing in a model bacterial species. This analysis suggests that genes that are not under strong selective pressure (and so evolve faster than others) in Salmonella tend to accumulate more AT-rich mutations and are eventually silenced by H-NS. Our findings may establish new approaches for a better understanding of bacterial genome evolution and function, using information from functional and comparative genomics.
Project description:Standard Illumina libraries are biased toward sequences of intermediate GC-content. This results in an underrepresentation of GC-rich regions in sequencing projects of genomes with heterogeneous base composition, such as those of mammals and birds. We developed a simple, cost-effective protocol to enrich sheared genomic DNA in its GC-rich fraction by subtracting AT-rich DNA. This was achieved by heating DNA up to 90 °C before applying Illumina library preparation. We tested the new approach on chicken DNA and found that heating the DNA increased average coverage in the GC-richest chromosomes by a factor of up to six. Using a Taq polymerase supposedly appropriate for PCR amplification of GC-rich sequences had a much weaker effect. Our protocol should greatly facilitate sequencing and resequencing of the GC-richest regions of heterogeneous genomes, in combination with standard short-read and long-read technologies.
Project description:The variation of GC content is a key genome feature because it is associated with fundamental elements of genome organization. However, the reason for this variation is still an open question. Several hypotheses have been proposed to explain the variation of GC content during genome evolution, but they have not been explicitly investigated in whole plastome sequences. Dendrobium is one of the largest genera in the orchid family. Evolutionary studies of the plastomic organization and base composition are limited in this genus. In this study, we obtained high-quality plastome sequences of D. loddigesii and D. devonianum. Comparison showed a nearly identical organization in Dendrobium plastomes, indicating that plastomic organization is highly conserved in the genus. Furthermore, the impact of three evolutionary forces (selection, mutational biases, and GC-biased gene conversion, gBGC) on the variation of GC content in Dendrobium plastomes was evaluated. Our results revealed: (1) consistent GC content evolution trends and mutational biases in single-copy (SC) and inverted repeat (IR) regions; and (2) that gBGC has influenced the plastome-wide GC content evolution. These results suggest that both mutational biases and gBGC affect GC content in the plastomes of the genus Dendrobium.
Project description:Next-generation sequencing (NGS) is increasingly recognized for its ability to overcome allele ambiguity and deliver high-resolution typing in the HLA system. Using this technology, non-uniform read distribution can impede the reliability of variant detection, which renders high-confidence genotype calling particularly difficult to achieve in the polymorphic HLA complex. Recently, library construction has been implicated as the dominant factor in instigating coverage bias. To study the impact of this phenomenon on HLA genotyping, we performed long-range PCR on 12 samples to amplify HLA-A, -B, -C, -DRB1, and -DQB1, and compared the relative contribution of three Illumina library construction methods (TruSeq Nano, Nextera, Nextera XT) in generating downstream bias. Here, we show high GC% to be a good predictor of low sequencing depth. Compared to standard TruSeq Nano, GC bias was more prominent in transposase-based protocols, particularly Nextera XT, likely through a combination of transposase insertion bias being coupled with a high number of PCR enrichment cycles. Importantly, our findings demonstrate non-uniform read depth can have a direct and negative impact on the robustness of HLA genotyping, which has clinical implications for users when choosing a library construction strategy that aims to balance cost and throughput with data quality.