Dataset Information

Complement Genome Annotation Lift Over Using a Weighted Sequence Alignment Strategy.


ABSTRACT: With the broad application of high-throughput sequencing, more whole-genome resequencing data and de novo assemblies of natural populations are becoming available. For a given species, generally only the reference genome is well established and annotated. Computational tools based on sequence alignment have been developed to investigate the gene models of individuals belonging to the same or closely related species. During this process, inconsistent alignment often obscures genome annotation lift over and leads to improper functional impact prediction for a genomic variant, especially in plant species. Here, we propose the zebraic striped dynamic programming algorithm, which assigns different weights to genetic features to refine genome annotation lift over. Testing of the zebraic striped dynamic programming algorithm on both plant and animal genomic data showed that it complements the standard sequence alignment approach for highly diverse individuals. Using the lifted-over genome annotation as anchors, a base-pair resolution genome-wide sequence alignment and variant calling pipeline for de novo assemblies has been implemented in the GEAN software. GEAN can be used to compare haplotype diversity, refine the functional annotation of genetic variants, annotate de novo assembled genome sequences, detect homologous syntenic blocks, improve the quantification of gene expression levels using RNA-seq data, and unify genomic variants for population genetic analysis. We expect that GEAN will be a standard tool for the coming of age of de novo assembly population genetics.
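
The idea of giving different weights to genetic features during alignment can be illustrated with a small sketch. The code below is not the zebraic striped dynamic programming algorithm and is not taken from GEAN; it is a plain Needleman-Wunsch-style global alignment score in which each reference base scales its match, mismatch, and gap scores by a weight looked up from its feature label. The function name, the feature labels ("intron", "exon", "splice"), and the weight values are all assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence

# Hypothetical feature weights, chosen only for illustration.
# They are NOT the parameters used by GEAN.
DEFAULT_WEIGHTS: Dict[str, int] = {"intron": 1, "exon": 2, "splice": 5}


def weighted_alignment_score(
    ref: str,
    query: str,
    ref_features: Sequence[str],
    weights: Optional[Dict[str, int]] = None,
    match: int = 1,
    mismatch: int = -1,
    gap: int = -1,
) -> int:
    """Global alignment score in which each reference base scales its own
    match, mismatch, and gap scores by the weight of its feature class."""
    if len(ref) != len(ref_features):
        raise ValueError("ref_features needs one label per reference base")
    weights = DEFAULT_WEIGHTS if weights is None else weights
    n, m = len(ref), len(query)
    # score[i][j] is the best score for aligning ref[:i] with query[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        # Deleting a reference base costs the gap penalty times its weight.
        score[i][0] = score[i - 1][0] + gap * weights[ref_features[i - 1]]
    for j in range(1, m + 1):
        # Query bases placed before the first reference base are unweighted.
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        w = weights[ref_features[i - 1]]
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub * w,  # align ref[i-1] with query[j-1]
                score[i - 1][j] + gap * w,      # ref[i-1] aligned to a gap
                score[i][j - 1] + gap * w,      # query base inserted after ref[i-1]
            )
    return score[n][m]
```

Under these assumed weights, a gap or mismatch at a base labelled "splice" is penalized five times as heavily as one in an intron, so the optimal alignment tends to shift indels away from highly weighted features. This is the general effect a weighted strategy aims for when lifting annotation over to a diverged genome.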

SUBMITTER: Song B 

PROVIDER: S-EPMC6902276 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Complement Genome Annotation Lift Over Using a Weighted Sequence Alignment Strategy.

Song Baoxing, Sang Qing, Wang Hai, Pei Huimin, Gan XiangChao, Wang Fen

Frontiers in Genetics, 2019-11-13


Similar Datasets

| S-EPMC3142524 | biostudies-literature
| S-EPMC4611657 | biostudies-literature
| S-EPMC4422528 | biostudies-literature
| S-EPMC8412297 | biostudies-literature
| S-EPMC4688996 | biostudies-literature
| S-EPMC4920119 | biostudies-literature
| S-EPMC10759460 | biostudies-literature
| S-EPMC5853703 | biostudies-literature
2023-08-04 | PXD038112 | Pride
2023-08-04 | PXD038175 | Pride