
Dataset Information


Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing.


ABSTRACT: The lack of samples for generating standardized DNA datasets, whether for setting up a sequencing pipeline or for benchmarking the performance of different algorithms, limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, has an aneuploid genome and is enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) on different sequencing platforms and via targeted sequencing at >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, they not only minimize potential biases from technologies, assays and informatics when setting up a sequencing pipeline, but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.
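Reference call sets like these are typically used as a truth set: a pipeline's calls are compared against them, usually only inside the regions where the reference calls are high-confidence, and precision, recall and F1 are reported. The sketch below illustrates that comparison under simplifying assumptions that are not from the source: variants match only on exact (chrom, pos, ref, alt), high-confidence regions are given as BED-style 0-based half-open intervals, and no variant normalization (e.g., left-alignment of indels) is performed. The names `Variant`, `in_regions` and `benchmark` are hypothetical; dedicated benchmarking tools handle representation differences that this example ignores.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    # Hypothetical minimal variant record; pos is 1-based, as in VCF.
    chrom: str
    pos: int
    ref: str
    alt: str


# Hypothetical region format: chrom -> list of BED-style (start, end),
# 0-based and half-open, i.e. covering 1-based positions start+1..end.
Regions = Dict[str, List[Tuple[int, int]]]


def in_regions(v: Variant, regions: Regions) -> bool:
    """True if the variant's 1-based position lies inside any interval."""
    return any(start < v.pos <= end for start, end in regions.get(v.chrom, []))


def benchmark(calls: Iterable[Variant], truth: Iterable[Variant],
              regions: Regions) -> dict:
    """Compare calls to a truth set by exact match, restricted to regions."""
    calls_hc = {v for v in calls if in_regions(v, regions)}
    truth_hc = {v for v in truth if in_regions(v, regions)}
    tp = len(calls_hc & truth_hc)   # called and in the truth set
    fp = len(calls_hc - truth_hc)   # called but absent from the truth set
    fn = len(truth_hc - calls_hc)   # in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy usage with made-up variants (not data from this study).
truth = {Variant("chr1", 100, "A", "T"), Variant("chr1", 200, "G", "C"),
         Variant("chr2", 50, "C", "G")}
calls = {Variant("chr1", 100, "A", "T"), Variant("chr1", 300, "T", "A"),
         Variant("chr2", 50, "C", "G"),
         Variant("chr3", 10, "A", "G")}  # outside regions, so ignored
regions = {"chr1": [(0, 1000)], "chr2": [(0, 100)]}
result = benchmark(calls, truth, regions)
print(result)
```

Restricting both sets to the same regions before counting keeps calls outside the high-confidence space from being scored as false positives, since the reference makes no claim about those positions.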

SUBMITTER: Fang LT 

PROVIDER: S-EPMC8532138 | biostudies-literature | 2021 Sep

REPOSITORIES: biostudies-literature


Publications

Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing.

Fang LT, Zhu B, Zhao Y, Chen W, Yang Z, Kerrigan L, Langenbach K, de Mars M, Lu C, Idler K, Jacob H, Zheng Y, Ren L, Yu Y, Jaeger E, Schroth GP, Abaan OD, Talsania K, Lack J, Shen TW, Chen Z, Stanbouly S, Tran B, Shetty J, Kriga Y, Meerzaman D, Nguyen C, Petitjean V, Sultan M, Cam M, Mehta M, Hung T, Peters E, Kalamegham R, Sahraeian SME, Mohiyuddin M, Guo Y, Yao L, Song L, Lam HYK, Drabek J, Vojta P, Maestro R, Gasparotto D, Kõks S, Reimann E, Scherer A, Nordlund J, Liljedahl U, Jensen RV, Pirooznia M, Li Z, Xiao C, Sherry ST, Kusko R, Moos M, Donaldson E, Tezak Z, Ning B, Tong W, Li J, Duerken-Hughes P, Catalanotti C, Maheshwari S, Shuga J, Liang WS, Keats J, Adkins J, Tassone E, Zismann V, McDaniel T, Trent J, Foox J, Butler D, Mason CE, Hong H, Shi L, Wang C, Xiao W

Nature Biotechnology, 2021-09-09, issue 9



Similar Datasets

S-EPMC11245320 | biostudies-literature
S-EPMC8294598 | biostudies-literature
S-EPMC10350007 | biostudies-literature
S-EPMC3383317 | biostudies-literature
S-EPMC6966772 | biostudies-literature
S-EPMC5408850 | biostudies-literature
S-EPMC10762021 | biostudies-literature
S-EPMC5017645 | biostudies-literature
S-EPMC4230234 | biostudies-literature
S-EPMC9354439 | biostudies-literature