
Dataset Information


IterCluster: a barcode clustering algorithm for long fragment read analysis.


ABSTRACT: Recent advances in long fragment read (LFR) technologies, also known as linked-read or read-cloud technologies, such as single tube long fragment reads (stLFR), 10X Genomics Chromium reads, and TruSeq synthetic long-reads, have enabled efficient haplotyping and genome assembly. However, in the case of stLFR and 10X Genomics Chromium reads, the long fragments of a genome are covered sparsely by reads in each barcode, and most barcodes are shared by multiple long fragments from different regions, which results in inefficient assembly when using long-range information. Thus, methods to address these shortcomings are vital for capitalizing on the additional information obtained using these technologies. We therefore designed IterCluster, a novel, alignment-free clustering algorithm that can cluster barcodes from the same target region of a genome, using k-mer frequency-based features and a Markov Cluster (MCL) approach to identify enough reads in a target region of a genome to ensure sufficient target genome sequence depth. The IterCluster method was validated using BGI stLFR and 10X Genomics Chromium read datasets. IterCluster achieved higher precision and recall on BGI stLFR data than on 10X Genomics Chromium read data. In addition, we demonstrated how IterCluster improves de novo assembly results when using a divide-and-conquer strategy on a human genome data set (scaffold/contig N50 = 13.2 kbp/7.1 kbp vs. 17.1 kbp/11.9 kbp before and after IterCluster, respectively). IterCluster provides a new way of determining LFR barcode enrichment and a novel approach for de novo assembly using LFR data. IterCluster is open source and available at https://github.com/JianCong-WENG/IterCluster.
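
The abstract describes the approach only at a high level, so the following is a minimal illustrative sketch of the general idea (alignment-free barcode similarity from k-mers, followed by Markov clustering), not the IterCluster implementation. Everything in it is an assumption made for illustration: the function names (`kmers`, `jaccard`, `mcl`, `cluster_barcodes`), the use of Jaccard similarity over k-mer sets as a stand-in for the k-mer frequency-based features, the value `k=4`, and the MCL parameters.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of k-mers contained in one read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(sim, inflation=2.0, iterations=50, eps=1e-6):
    """Basic Markov clustering on a symmetric similarity matrix (hypothetical parameters)."""
    n = len(sim)
    # Self-loops on the diagonal keep every column sum positive.
    m = normalize_columns(
        [[1.0 if i == j else sim[i][j] for j in range(n)] for i in range(n)]
    )
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][t] * m[t][j] for t in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then renormalize each column.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Rows with a non-zero diagonal are attractors; the non-zero columns
    # of an attractor row form one cluster.
    return {frozenset(j for j in range(n) if m[i][j] > eps)
            for i in range(n) if m[i][i] > eps}


def cluster_barcodes(reads_by_barcode, k=4):
    """Group barcodes whose reads share k-mers; returns a set of frozensets of barcode names."""
    names = sorted(reads_by_barcode)
    sets = [set().union(*(kmers(r, k) for r in reads_by_barcode[b])) for b in names]
    n = len(names)
    sim = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = jaccard(sets[i], sets[j])
    return {frozenset(names[i] for i in c) for c in mcl(sim)}
```

Calling `cluster_barcodes` on a dictionary that maps barcode names to lists of read strings returns groups of barcodes whose reads overlap in k-mer content. A real pipeline would need a larger `k`, error-tolerant k-mer filtering, and a sparse matrix representation; none of those details are taken from the source.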

SUBMITTER: Weng J 

PROVIDER: S-EPMC7100596 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature


Publications

IterCluster: a barcode clustering algorithm for long fragment read analysis.

Weng J, Chen T, Xie Y, Xu X, Zhang G, Peters BA, Drmanac R

PeerJ, 2020-03-24



Similar Datasets

| S-EPMC6049041 | biostudies-literature
| S-EPMC6526642 | biostudies-literature
| S-EPMC7863402 | biostudies-literature
| S-EPMC5382505 | biostudies-literature
| S-EPMC10881092 | biostudies-literature
| S-EPMC4748558 | biostudies-literature
| S-EPMC10589539 | biostudies-literature
| S-EPMC4908357 | biostudies-literature
2024-07-19 | E-MTAB-14238 | biostudies-arrayexpress
| PRJEB12651 | ENA