Dataset Information

Aquila_stLFR: diploid genome assembly based structural variant calling package for stLFR linked-reads.


ABSTRACT:

Motivation

Identifying structural variants (SVs) is critical in health and disease; however, detecting them remains a challenge. Several linked-read sequencing technologies, including 10X Genomics, TELL-Seq and single tube long fragment read (stLFR), have recently been developed as cost-effective approaches to reconstruct multi-megabase haplotypes (phase blocks) from the sequence data of a single sample. These technologies provide an optimal sequencing platform to characterize SVs, though few computational algorithms can utilize them. We therefore developed Aquila_stLFR, an approach that resolves SVs through haplotype-based assembly of stLFR linked-reads.

Results

Aquila_stLFR first partitions long fragment reads into two haplotype-specific blocks, using a high-quality reference genome and the phasing potential of the linked-reads themselves. Each haplotype is then assembled independently to produce a complete diploid assembly, from which genome-wide SVs are reconstructed. We benchmarked Aquila_stLFR on a well-studied sample, NA24385, and showed that it can detect medium to large deletions (50 bp-10 kb) with high sensitivity and medium-size insertions (50 bp-1 kb) with high specificity.
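
The haplotype partitioning step can be illustrated with a toy sketch. This is not the Aquila_stLFR implementation: it assumes each barcoded fragment has already been reduced to the alleles it carries at phased heterozygous SNP positions, and it assigns the fragment to a haplotype by simple majority vote. The function names (`assign_haplotype`, `partition_fragments`), the `min_margin` parameter and the input layout are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def assign_haplotype(fragment_alleles, phased_snps, min_margin=1):
    """Vote a fragment into haplotype 1 or 2, or None if ambiguous.

    fragment_alleles: {position: allele observed on this fragment}
    phased_snps:      {position: (haplotype-1 allele, haplotype-2 allele)}
    Hypothetical helper; real pipelines also weigh base and mapping quality.
    """
    votes = Counter()
    for pos, allele in fragment_alleles.items():
        if pos not in phased_snps:
            continue  # position is not a phased heterozygous site
        hap1_allele, hap2_allele = phased_snps[pos]
        if allele == hap1_allele:
            votes[1] += 1
        elif allele == hap2_allele:
            votes[2] += 1
    if abs(votes[1] - votes[2]) < min_margin:
        return None  # no informative sites, or a tie
    return 1 if votes[1] > votes[2] else 2

def partition_fragments(fragments, phased_snps):
    """Group fragment barcodes by assigned haplotype (None = unassigned)."""
    bins = {1: [], 2: [], None: []}
    for barcode, alleles in fragments.items():
        bins[assign_haplotype(alleles, phased_snps)].append(barcode)
    return bins

# Made-up example data: three phased sites and four barcoded fragments.
phased = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
fragments = {
    "bc1": {100: "A", 250: "C"},   # both sites agree with haplotype 1
    "bc2": {100: "G", 400: "A"},   # both sites agree with haplotype 2
    "bc3": {100: "A", 250: "T"},   # one vote each, so ambiguous
    "bc4": {999: "A"},             # covers no phased site
}
bins = partition_fragments(fragments, phased)
print(bins)
```

In this sketch, the reads behind the barcodes in `bins[1]` and `bins[2]` would each go to a separate assembly, mirroring the independent per-haplotype assembly described above; unassigned fragments are simply set aside.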

Availability and implementation

Source code and documentation are available at https://github.com/maiziex/Aquila_stLFR.

Supplementary information

Supplementary data are available at Bioinformatics Advances online.

SUBMITTER: Liu YH 

PROVIDER: S-EPMC9710574 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

Aquila_stLFR: diploid genome assembly based structural variant calling package for stLFR linked-reads.

Yichen Henry Liu, Griffin L. Grubbs, Lu Zhang, Xiaodong Fang, David L. Dill, Arend Sidow, Xin Zhou

Bioinformatics Advances, 2021-06-16; 1


Similar Datasets

| S-EPMC7889865 | biostudies-literature
| S-EPMC6879002 | biostudies-literature
| S-EPMC10045170 | biostudies-literature
| S-EPMC6609817 | biostudies-literature
| S-EPMC9122538 | biostudies-literature
| S-EPMC10997618 | biostudies-literature
| S-EPMC7671403 | biostudies-literature
| S-EPMC7792008 | biostudies-literature
| S-EPMC11916464 | biostudies-literature
| S-EPMC6341484 | biostudies-literature