Genomics

Dataset Information


Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions.


ABSTRACT: The high level of human genome structural variation among individuals suggests that there must be portions of the genome that have yet to be discovered, annotated and characterized at the sequence level. Using clone resources developed as part of the Human Genome Structural Variation Sequencing Project, we focused on the characterization of 2,363 novel sequence contigs not present in the human reference genome. We determined that these contigs corresponded to 720 distinct loci of which 400 now have an anchored position in the reference genome. We investigated the sequence properties of these loci and determined that 37% of these novel insertions are copy-number polymorphic. We find that they are significantly enriched within the last 5 Mb of chromosomes (a 2.9-fold enrichment, p=1.0e-18, binomial test) and that most arose as a result of deletions in the human lineage after separation from the African great apes. A subset of these sites shows evidence of marked population stratification among Asian, African and European populations, including a 3.9-kb insertion within the first intron of the lactase gene. Complete sequencing of clones from 192 genomic loci, including 156 completely spanned insertions, provides a detailed and contextual view of 1.67 Mb of inserted sequence. Analysis of this sequence identified 477 elements that show evidence of sequence constraint over evolutionary time, as well as matches to 22 RefSeq gene segments. Twenty-six of the insertions contain matches against mRNA-seq data indicating the potential presence of functionally important, unannotated human sequences. Taking advantage of this high-quality sequence, we develop a method to accurately genotype these novel insertions using next-generation whole-genome sequencing datasets.
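The enrichment statistic quoted in the abstract (a fold enrichment with a binomial-test p-value) can be illustrated with a minimal sketch. Only the figure of 400 anchored loci comes from the abstract. The background fraction of the genome lying within the last 5 Mb of chromosomes and the observed count are hypothetical values chosen for illustration, and the function names are invented here, so the output is not expected to reproduce the reported 2.9-fold enrichment or p-value, nor is this necessarily the authors' exact procedure.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def fold_enrichment(k: int, n: int, p: float) -> float:
    """Observed fraction of loci in the region divided by the expected fraction."""
    return (k / n) / p


# From the abstract: 400 loci have an anchored position in the reference genome.
n_loci = 400
# ASSUMPTION (hypothetical): fraction of the genome within the last 5 Mb of chromosomes.
subtelomeric_fraction = 0.08
# ASSUMPTION (hypothetical): number of anchored loci observed in those regions.
observed_in_region = 90

print("fold enrichment:", fold_enrichment(observed_in_region, n_loci, subtelomeric_fraction))
print("one-sided binomial p-value:", binomial_upper_tail(observed_in_region, n_loci, subtelomeric_fraction))
```

A one-sided upper tail is used because the question is whether insertions occur near chromosome ends more often than expected by chance. Replacing the hypothetical inputs with the actual counts and region size would be needed to check the published values.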

ORGANISM(S): Homo sapiens

PROVIDER: GSE20634 | GEO | 2010/04/20

SECONDARY ACCESSION(S): PRJNA124851

REPOSITORIES: GEO


Similar Datasets

2022-05-18 | MTBLS3657 | MetaboLights
2007-02-14 | GSE7005 | GEO
2011-09-08 | E-GEOD-29215 | biostudies-arrayexpress
2011-09-08 | GSE29215 | GEO
2010-05-15 | GSE21040 | GEO
2023-07-03 | GSE234089 | GEO
2018-03-16 | GSE108401 | GEO
2011-02-18 | E-GEOD-27381 | biostudies-arrayexpress
2021-08-20 | GSE169198 | GEO
2015-09-15 | GSE61565 | GEO