Dataset Information

Assessing genome assembly quality using the LTR Assembly Index (LAI).


ABSTRACT: Assembling a plant genome is challenging due to the abundance of repetitive sequences, yet no standard is available to evaluate the assembly of repeat space. LTR retrotransposons (LTR-RTs) are the predominant interspersed repeat that is poorly assembled in draft genomes. Here, we propose a reference-free genome metric called LTR Assembly Index (LAI) that evaluates assembly continuity using LTR-RTs. After correcting for LTR-RT amplification dynamics, we show that LAI is independent of genome size, genomic LTR-RT content, and gene space evaluation metrics (i.e., BUSCO and CEGMA). By comparing genomic sequences produced by various sequencing techniques, we reveal the significant gain of assembly continuity by using long-read-based techniques over short-read-based methods. Moreover, LAI can facilitate iterative assembly improvement with assembler selection and identify low-quality genomic regions. To apply LAI, intact LTR-RTs and total LTR-RTs should contribute at least 0.1% and 5% to the genome size, respectively. The LAI program is freely available on GitHub: https://github.com/oushujun/LTR_retriever.
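The applicability condition stated at the end of the abstract can be written as a simple check. The following is a minimal sketch, not part of the LAI program itself: the function name `lai_applicable` and its parameters are hypothetical, and it assumes the intact and total LTR-RT lengths (in base pairs) have already been obtained from an annotation.

```python
def lai_applicable(genome_size_bp, intact_ltr_bp, total_ltr_bp,
                   min_intact_pct=0.1, min_total_pct=5.0):
    """Return True if a genome meets the thresholds quoted in the abstract.

    Hypothetical helper for illustration only. The defaults follow the
    abstract: intact LTR-RTs >= 0.1% and total LTR-RTs >= 5% of genome size.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")

    # Express both LTR-RT lengths as a percentage of the genome size.
    intact_pct = 100.0 * intact_ltr_bp / genome_size_bp
    total_pct = 100.0 * total_ltr_bp / genome_size_bp

    return intact_pct >= min_intact_pct and total_pct >= min_total_pct


if __name__ == "__main__":
    # Made-up numbers: a 1 Gb genome with 2 Mb intact and 80 Mb total LTR-RTs.
    print(lai_applicable(1_000_000_000, 2_000_000, 80_000_000))
```

The numbers in the usage line are invented for illustration and do not describe any real assembly.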

SUBMITTER: Ou S 

PROVIDER: S-EPMC6265445 | biostudies-literature | 2018 Nov

REPOSITORIES: biostudies-literature

Publications

Assessing genome assembly quality using the LTR Assembly Index (LAI).

Ou Shujun, Chen Jinfeng, Jiang Ning

Nucleic Acids Research, 2018 Nov 1, issue 21


Similar Datasets

| S-EPMC10184434 | biostudies-literature
| S-EPMC4374685 | biostudies-other
| S-EPMC4284522 | biostudies-literature
| S-EPMC5913808 | biostudies-other
| S-EPMC8557608 | biostudies-literature
| S-EPMC5913339 | biostudies-literature
| S-EPMC8002733 | biostudies-literature
| S-EPMC2387172 | biostudies-literature
| S-EPMC8881204 | biostudies-literature
| S-EPMC6357164 | biostudies-literature