Dataset Information

Long-read sequencing and de novo assembly of a Chinese genome.


ABSTRACT: Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.
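
The contig and scaffold N50 values quoted in the abstract are a standard contiguity statistic: the length L such that sequences of length at least L together cover at least half of the assembled bases. A minimal Python sketch of that calculation follows. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from the HX1 assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")

    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2

    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths (arbitrary units), not taken from the paper.
toy_lengths = [2, 3, 4, 5, 6, 10]
print(n50(toy_lengths))  # total is 30; 10 + 6 = 16 covers half, so N50 is 6
```

On a real assembly, the lengths would typically be read from a FASTA index or an assembler's summary report rather than typed in by hand.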

SUBMITTER: Shi L 

PROVIDER: S-EPMC4931320 | biostudies-literature | 2016 Jun

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC7493909 | biostudies-literature
| S-EPMC6048559 | biostudies-literature
| S-EPMC11294445 | biostudies-literature
| S-EPMC7140821 | biostudies-literature
| S-EPMC8741817 | biostudies-literature
| S-EPMC8590762 | biostudies-literature
| S-EPMC7934570 | biostudies-literature
| S-EPMC9300133 | biostudies-literature
| S-EPMC7202035 | biostudies-literature
| S-EPMC9069071 | biostudies-literature