Genomics

Dataset Information

Benchmarking for alignment and variant calling


ABSTRACT: Background: Lung cancer is a heterogeneous disease and the primary cause of cancer-related mortality worldwide. Somatic mutations, including large structural variants, are important biomarkers in lung cancer for selecting targeted therapy. Genomic studies in lung cancer have been conducted using short-read sequencing. Emerging long-read sequencing technologies are a promising alternative for studying somatic structural variants; however, there is currently no consensus on how to process the data and call somatic events. In this study, we performed whole genome sequencing of lung cancer and matched non-tumour samples using long-read and short-read sequencing to comprehensively benchmark three sequence aligners and six structural variant callers, comprising four generic callers (Sniffles2, cuteSV, SVIM and DELLY in generic mode) and two somatic callers (nanomonsv and DELLY in somatic mode). Results: Different combinations of aligners and variant callers influenced somatic structural variant detection. The choice of caller had a significant influence on somatic structural variant detection in terms of variant type, size, sensitivity, and accuracy. The performance of each variant caller was assessed against somatic structural variants identified by short-read sequencing. More events were detected with long-read sequencing than with short-read sequencing. The mean recall of somatic variant events identified by short-read sequencing was higher for the somatic callers (64%) than for the generic callers (52%), with nanomonsv (69%) showing the best performance. Conclusion: Long-read sequencing can identify somatic structural variants. The longer reads have the potential to improve our understanding of cancer development and inform personalized cancer treatment.
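
The recall figures in the abstract measure the fraction of short-read somatic structural variants that a long-read caller also reports. The sketch below illustrates that calculation only; it is not the study's pipeline. The `SV` record, the `is_match` rule (same chromosome, same variant type, breakpoints within a 500 bp tolerance) and the toy coordinates are all assumptions made for illustration, and the matching criteria actually used in the study are not stated in this record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal structural variant record."""
    chrom: str
    pos: int
    svtype: str


def is_match(a: SV, b: SV, tolerance: int = 500) -> bool:
    # Assumed matching rule: same chromosome and type,
    # breakpoints no more than `tolerance` bp apart.
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.pos - b.pos) <= tolerance
    )


def recall(truth: list[SV], calls: list[SV], tolerance: int = 500) -> float:
    # Fraction of reference events recovered by at least one call.
    if not truth:
        raise ValueError("reference set is empty; recall is undefined")
    found = sum(any(is_match(t, c, tolerance) for c in calls) for t in truth)
    return found / len(truth)


# Toy data, invented for illustration only.
short_read = [
    SV("chr1", 10_000, "DEL"),
    SV("chr2", 50_000, "INS"),
    SV("chr3", 70_000, "DUP"),
    SV("chr5", 1_000, "INV"),
]
long_read = [
    SV("chr1", 10_120, "DEL"),   # within tolerance of the chr1 deletion
    SV("chr2", 50_030, "INS"),   # within tolerance of the chr2 insertion
    SV("chr3", 90_000, "DUP"),   # too far from the chr3 duplication
    SV("chr7", 5_000, "DEL"),    # long-read-only event, does not affect recall
]

print(recall(short_read, long_read))  # 0.5
```

Note that events found only by long-read sequencing, such as the chr7 deletion in the toy data, leave recall unchanged, which is why a caller can detect more events overall and still have a recall well below 100%.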

PROVIDER: EGAS00001007819 | EGA

REPOSITORIES: EGA

Similar Datasets

2023-09-20 | E-MTAB-13306 | biostudies-arrayexpress
2024-08-05 | GSE213338 | GEO
2024-08-05 | GSE213503 | GEO
| EGAD00001015399 | EGA
| EGAD00001015400 | EGA
| phs001255 | dbGaP
2024-10-15 | GSE270257 | GEO
2020-09-09 | GSE133361 | GEO
| EGAS00001004266 | EGA
| PRJNA636886 | ENA