Dataset Information

Identification of errors introduced during high throughput sequencing of the T cell receptor repertoire.

ABSTRACT:

Background

Recent advances in massively parallel sequencing have increased the depth at which T cell receptor (TCR) repertoires can be probed by >3 log10, allowing for saturation sequencing of immune repertoires. The resolution of this sequencing depends on its accuracy, and direct assessments of the errors introduced during high throughput repertoire analyses are limited.

Results

We analyzed 3 monoclonal TCRs from TCR transgenic, Rag-/- mice using Illumina® sequencing. A total of 27 sequencing reactions were performed for each TCR using a trifurcating design in which samples were divided into 3 at significant processing junctures. More than 20 million complementarity determining region (CDR) 3 sequences were analyzed. Filtering for lower quality sequences diminished but did not eliminate sequence errors, which occurred within 1-6% of sequences. Erroneous sequences were predominantly of correct length and contained single nucleotide substitutions. Rates of specific substitutions varied dramatically in a position-dependent manner. Four substitutions, all purine-pyrimidine transversions, predominated. Solid phase amplification and sequencing, rather than liquid sample amplification and preparation, appeared to be the primary sources of error. Analysis of polyclonal repertoires demonstrated the impact of error accumulation on data parameters.
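The error profile described above (single nucleotide substitutions at correct length, position-dependent rates, a predominance of purine-pyrimidine transversions) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy parental sequence, and the toy reads are all hypothetical, and reads of a different length are simply skipped rather than aligned.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref_base, read_base):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    valid = PURINES | PYRIMIDINES
    if ref_base not in valid or read_base not in valid:
        raise ValueError("only A, C, G and T are supported in this sketch")
    if ref_base == read_base:
        raise ValueError("bases are identical; there is no substitution")
    pair = {ref_base, read_base}
    # Purine<->purine or pyrimidine<->pyrimidine is a transition;
    # any purine<->pyrimidine change is a transversion.
    same_family = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_family else "transversion"


def tally_substitutions(parental, reads):
    """Count (position, ref_base, read_base) over reads of parental length."""
    counts = Counter()
    for read in reads:
        if len(read) != len(parental):
            continue  # length variants (indels) are outside this sketch
        for pos, (ref, obs) in enumerate(zip(parental, read)):
            if ref != obs:
                counts[(pos, ref, obs)] += 1
    return counts


# Hypothetical toy data, not taken from the dataset.
parental = "TGTGCC"
reads = ["TGTGCC", "TGTTCC", "TGTTCC", "AGTGCC", "TGTGC"]
counts = tally_substitutions(parental, reads)
for (pos, ref, obs), n in sorted(counts.items()):
    print(pos, f"{ref}>{obs}", substitution_class(ref, obs), n)
```

Tallying by position as well as by base change is what makes a position-dependent substitution rate visible; collapsing over positions would hide it.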

Conclusions

Caution is needed in interpreting repertoire data due to potential contamination with mis-sequenced reads. However, a high association of errors with phred score, high relatedness of erroneous sequences with the parental sequence, dominance of specific nucleotide substitutions, and a skewed ratio of forward to reverse reads among erroneous sequences indicate approaches to filter erroneous sequences from repertoire data sets.
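Two of the signals named above, phred score and relatedness to the parental sequence, could be combined into a simple read filter along the following lines. This is a sketch under stated assumptions, not the filtering procedure used in the study: the thresholds (`min_mean_phred=30.0`, `max_distance=1`), the Phred+33 quality encoding, and all function names are illustrative choices.

```python
def hamming_distance(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_phred(quality_string, offset=33):
    """Mean phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality_string:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def is_suspect(seq, qual, parental, min_mean_phred=30.0, max_distance=1):
    """Flag a read as a likely sequencing error (hypothetical thresholds).

    A read is flagged if its mean quality is low, or if it has the parental
    length and differs from the parental sequence at only a few positions.
    """
    if mean_phred(qual) < min_mean_phred:
        return True
    if len(seq) == len(parental):
        return 0 < hamming_distance(seq, parental) <= max_distance
    return False


# Hypothetical toy reads: (sequence, quality string).
parental = "TGTGCC"
reads = [
    ("TGTGCC", "IIIIII"),  # identical to parental, high quality
    ("TGTTCC", "IIIIII"),  # one substitution away from parental
    ("TGTGCC", "555555"),  # correct sequence but low quality
]
kept = [seq for seq, qual in reads if not is_suspect(seq, qual, parental)]
print(kept)
```

In a polyclonal repertoire there is no single known parental sequence, so a filter of this kind would need a reference such as a much more abundant neighboring clonotype; that choice is left open here.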

SUBMITTER: Nguyen P

PROVIDER: S-EPMC3045962 | biostudies-literature | 2011 Feb

REPOSITORIES: biostudies-literature


Publications

Identification of errors introduced during high throughput sequencing of the T cell receptor repertoire.

Nguyen P, Ma J, Pei D, Obert C, Cheng C, Geiger TL

BMC Genomics, 2011 Feb 11


PMID: 21310087

Similar Datasets

Project description:

Background

As one of the "γδ-high" species, the chicken is an excellent model for the study of γδ T cells in non-mammalian animals. However, a comprehensive characterization of the TCRγδ repertoire in chicken is still missing. The objective of this study was to characterize the expressed TCRγ repertoire in chicken thymus using high-throughput sequencing.

Methods

In this study, we first obtained the detailed genomic organization of the chicken TCRγ locus based on the latest assembly of the red jungle fowl genome sequences (GRCg6a) and then characterized the TCRγ repertoire in the thymus of four chickens by using 5' Rapid Amplification of cDNA Ends (5' RACE) along with high-throughput sequencing (HTS).

Results

The chicken TCRγ locus contains a single Cγ gene, three functional Jγ segments and 44 Vγ segments that could be classified into six subgroups containing six, nineteen, nine, four, three and three members, respectively. Dot-plot analysis of the chicken TCRγ locus against itself showed that almost the entire region containing Vγ segments arose through tandem duplication events, and that the main homology unit, containing 9 or 10 Vγ gene segments, has been tandemly duplicated four times. For the analysis of the chicken TCRγ repertoire, more than 100,000 unique Vγ-region nucleotide sequences were obtained from the thymus of each chicken. After alignment to the germline Vγ and Jγ segments identified above, we found that the four chickens had similar TCRγ repertoire profiles. In brief, four Vγ segments (Vγ3.7, Vγ2.13, Vγ1.6 and Vγ1.3) and six Vγ-Jγ pairs (Vγ3.7-Jγ3, Vγ2.13-Jγ1, Vγ2.13-Jγ3, Vγ1.6-Jγ3, Vγ3.7-Jγ1 and Vγ1.6-Jγ1) were preferentially utilized by all four individuals, and the vast majority of the unique CDR3γ sequences encoded 4 to 22 amino acids, with a mean of 12.90 amino acids. This represents a wider length distribution and/or a longer mean length than the CDR3γ of humans, mice and other animal species.

Conclusions

In this study, we present the first in-depth characterization of the TCRγ repertoire in chicken thymus. We believe that these data will facilitate studies of adaptive immunology in birds.
