Dataset Information

In-depth comparative analysis of Illumina^® MiSeq run metrics: Development of a wet-lab quality assessment tool.

ABSTRACT: Whole genome sequencing of bacterial isolates has become a daily task in many laboratories, generating incredible amounts of data. However, data acquisition is not an end in itself; the goal is to acquire high-quality data useful for understanding genetic relationships. Having a method that could rapidly determine which of the many available run metrics are the most important indicators of overall run quality and having a way to monitor these during a given sequencing run would be extremely helpful to this effect. Therefore, we compared various run metrics across 486 MiSeq runs, from five different machines. By performing a statistical analysis using principal components analysis and a K-means clustering algorithm of the metrics, we were able to validate metric comparisons among instruments, allowing for the development of a predictive algorithm, which permits one to observe whether a given MiSeq run has performed adequately. This algorithm is available in an Excel spreadsheet: that is, MiSeq Instrument & Run (In-Run) Forecast. Our tool can help verify that the quantity/quality of the generated sequencing data consistently meets or exceeds recommended manufacturer expectations. Patterns of deviation from those expectations can be used to assess potential run problems and plan preventative maintenance, which can save valuable time and funding resources.

SUBMITTER: Kastanis GJ

PROVIDER: S-EPMC6487961 | biostudies-literature | 2019 Mar

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

In-depth comparative analysis of Illumina<sup>®</sup> MiSeq run metrics: Development of a wet-lab quality assessment tool.

Kastanis George John GJ Santana-Quintero Luis V LV Sanchez-Leon Maria M Lomonaco Sara S Brown Eric W EW Allard Marc W MW

Molecular ecology resources 20190117 2

Whole genome sequencing of bacterial isolates has become a daily task in many laboratories, generating incredible amounts of data. However, data acquisition is not an end in itself; the goal is to acquire high-quality data useful for understanding genetic relationships. Having a method that could rapidly determine which of the many available run metrics are the most important indicators of overall run quality and having a way to monitor these during a given sequencing run would be extremely help ...[more]

PMID: 30506954

Similar Datasets

Project description:Illumina's MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1-3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined. These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility.

Dataset Information

In-depth comparative analysis of Illumina^® MiSeq run metrics: Development of a wet-lab quality assessment tool.

Publications

In-depth comparative analysis of Illumina<sup>®</sup> MiSeq run metrics: Development of a wet-lab quality assessment tool.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets

Dataset Information

In-depth comparative analysis of Illumina® MiSeq run metrics: Development of a wet-lab quality assessment tool.

Publications

In-depth comparative analysis of Illumina<sup>®</sup> MiSeq run metrics: Development of a wet-lab quality assessment tool.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets

In-depth comparative analysis of Illumina^® MiSeq run metrics: Development of a wet-lab quality assessment tool.