
Dataset Information


Genomic leftovers: identifying novel microsatellites, over-represented motifs and functional elements in the human genome.


ABSTRACT: The human genome is 99% complete. This study contributes to filling the 1% gap by enriching previously unknown repeat regions called microsatellites (MST). We devised a Global MST Enrichment (GME) kit to enrich and next-generation sequence 2 colorectal cell lines and 16 normal human samples, illustrating its utility in identifying contigs from reads that do not map to the genome reference. The analysis of these samples yielded 790 novel extra-referential concordant contigs, that is, contigs observed in more than one sample. We searched for evidence of functional elements in the concordant contigs in two ways: (1) BLASTing each contig against normal RNA-Seq samples, and (2) checking for predicted functional elements using GlimmerHMM. Of the 790 concordant contigs, 37 had an exact match to at least one RNA-Seq read; 15 aligned to more than 100 RNA-Seq reads. Of the 249 concordant contigs predicted by GlimmerHMM to have functional elements, 6 had at least one exact RNA-Seq match. BLASTing these novel contigs against all publicly available sequences confirmed that they were found in human and chimpanzee BAC and fosmid clones sequenced as part of the original human genome project. These extra-referential contigs predominantly contained pentameric repeats, especially two motifs: AATGG and GTGGA.
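To make the idea of an over-represented pentameric motif concrete, here is a minimal sketch of how tandem 5-mer repeats could be tallied in a contig. It is not the authors' pipeline: the function names, the `min_repeats` threshold, and the choice to merge rotations and reverse complements of a motif into one canonical form are all assumptions made for illustration.

```python
import re
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif: str) -> str:
    """Return the alphabetically smallest string among all rotations of the
    motif and of its reverse complement (an assumed convention, so that
    AATGG, ATGGA and CCATT are counted as the same repeat)."""
    motif = motif.upper()
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    candidates = []
    for m in (motif, revcomp):
        candidates.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(candidates)


def pentamer_repeats(seq: str, min_repeats: int = 3) -> Counter:
    """Count copies of each perfect tandem 5-mer repeat in seq.

    min_repeats is a hypothetical threshold: a run is reported only if the
    5-mer unit occurs at least that many times back to back.
    """
    # One 5-mer followed by at least (min_repeats - 1) further exact copies.
    pattern = re.compile(r"([ACGT]{5})\1{%d,}" % (min_repeats - 1))
    counts = Counter()
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # a homopolymer run is a mononucleotide repeat, not a pentamer
        copies = len(match.group(0)) // 5
        counts[canonical_motif(unit)] += copies
    return counts


if __name__ == "__main__":
    # Toy sequence, not real data: two short pentameric runs with flanking bases.
    toy = "CC" + "AATGG" * 4 + "TT" + "GTGGA" * 3 + "C"
    for motif, copies in pentamer_repeats(toy).items():
        print(motif, copies)
```

The sketch only finds perfect repeats; real microsatellite callers typically tolerate mismatches and indels, which this does not attempt.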

SUBMITTER: Fonville NC 

PROVIDER: S-EPMC4899811 | biostudies-literature | 2016 Jun

REPOSITORIES: biostudies-literature


Publications

Genomic leftovers: identifying novel microsatellites, over-represented motifs and functional elements in the human genome.

Fonville NC, Velmurugan KR, Tae H, Vaksman Z, McIver LJ, Garner HR

Scientific Reports, 2016-06-09



Similar Datasets

| S-EPMC2762409 | biostudies-literature
| S-EPMC152803 | biostudies-literature
2014-09-11 | GSE58217 | GEO
| S-EPMC4204604 | biostudies-literature
| S-EPMC1538840 | biostudies-literature
| S-EPMC4551918 | biostudies-literature
2014-09-11 | E-GEOD-58217 | biostudies-arrayexpress
| S-EPMC6678329 | biostudies-literature