Dataset Information

Pan-Lineage Mycobacterium tuberculosis Reference Genome for Enhanced Molecular Diagnosis.


ABSTRACT: In Mycobacterium tuberculosis (MTB) control, whole genome sequencing-based molecular drug susceptibility testing (molDST-WGS) has emerged as a pivotal tool. However, the current reliance on a single-strain reference limits molDST-WGS's true potential. To address this, we introduce a new pan-lineage reference genome, "MtbRf". We assembled "unmapped" reads from 3,614 MTB genomes (751 L1; 881 L2; 1,700 L3; and 282 L4) into 35 shared, annotated contigs (54 coding sequences [CDSs]). We constructed MtbRf by: 1) searching genome databases for contig homologs, which returned matches exclusively within the genus Mycobacterium; 2) comparing genomes with H37Rv ("lift-over") to define 18 insertions; and 3) filling the gaps in H37Rv with these insertions. MtbRf adds 1.18% more sequence to H37Rv, salvaging >60% of previously unmapped reads. Transcriptomics confirmed expression of the new CDSs. The new variants provided moderate DST predictive value (AUROC 0.60-0.75). MtbRf thus unveils previously hidden genomic information and lays the foundation for lineage-specific molDST-WGS.
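As a rough illustration of step 3 in the abstract (filling a reference with insertions), the following Python sketch splices sequences into a toy reference and reports the fraction of sequence added. It is not the authors' pipeline: the function names `splice_insertions` and `added_fraction`, the 0-based offset convention, and the toy sequences are all assumptions made for this example.

```python
from typing import List, Tuple


def splice_insertions(reference: str, insertions: List[Tuple[int, str]]) -> str:
    """Return the reference with each sequence inserted at its 0-based offset.

    Offsets refer to the ORIGINAL reference coordinates (hypothetical
    convention for this sketch), so insertions are applied in sorted order
    while copying the untouched reference segments between them.
    """
    pieces = []
    prev = 0
    for pos, seq in sorted(insertions):
        if not 0 <= pos <= len(reference):
            raise ValueError(f"insertion offset {pos} is outside the reference")
        pieces.append(reference[prev:pos])  # unchanged reference segment
        pieces.append(seq)                  # inserted sequence
        prev = pos
    pieces.append(reference[prev:])         # remainder of the reference
    return "".join(pieces)


def added_fraction(reference: str, augmented: str) -> float:
    """Fraction of sequence added relative to the original reference length."""
    return (len(augmented) - len(reference)) / len(reference)


# Toy data only; real inputs would be the H37Rv sequence and the 18 insertions.
toy_reference = "ACGTACGTAC"
toy_insertions = [(8, "G"), (4, "TT")]
toy_augmented = splice_insertions(toy_reference, toy_insertions)
print(toy_augmented, added_fraction(toy_reference, toy_augmented))
```

On real data, `added_fraction` would correspond to the 1.18% figure quoted in the abstract, provided the same definition (added length over original reference length) is used.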

SUBMITTER: Bahk K 

PROVIDER: S-EPMC11339604 | biostudies-literature | 2024 Aug

REPOSITORIES: biostudies-literature

Publications

Pan-lineage Mycobacterium tuberculosis reference genome for enhanced molecular diagnosis.

Bahk Kunhyung, Sung Joohon, Seki Mitsuko, Kim Kyungjong, Kim Jina, Choi Hongjo, Whang Jake, Mitarai Satoshi

DNA Research: an international journal for rapid publication of reports on genes and genomes, 2024 Aug 1, issue 4

Similar Datasets

| S-EPMC7058165 | biostudies-literature
| S-EPMC9673877 | biostudies-literature
| S-EPMC5814500 | biostudies-literature
| S-EPMC9729191 | biostudies-literature
| PRJEB66375 | ENA
| S-EPMC4457063 | biostudies-literature
| S-EPMC7582865 | biostudies-literature
| S-EPMC6615170 | biostudies-literature
| S-EPMC6440112 | biostudies-literature
| S-EPMC6291124 | biostudies-literature