Dataset Information

Random forest classifiers trained on simulated data enable accurate short read-based genotyping of structural variants in the alpha globin region at Chr16p13.3.


ABSTRACT: In regions where reads don't align well to a reference, it is generally difficult to characterize structural variation using short read sequencing. Here, we utilize machine learning classifiers and short sequence reads to genotype structural variants in the alpha globin locus on chromosome 16, a medically relevant region that is challenging to genotype in individuals. Using models trained only with simulated data, we accurately genotype two hard-to-distinguish deletions in two separate human cohorts. Furthermore, population allele frequencies produced by our methods across a wide set of ancestries agree more closely with previously determined frequencies than those obtained using currently available genotyping software.

SUBMITTER: Hansen NF 

PROVIDER: S-EPMC10705532 | biostudies-literature | 2023 Nov

REPOSITORIES: biostudies-literature

Publications

Random forest classifiers trained on simulated data enable accurate short read-based genotyping of structural variants in the alpha globin region at Chr16p13.3.

Hansen NF, Wang X, Tegegn MB, Liu Z, Gouveia MH, Hill G, Lin JC, Okulosubo T, Shriner D, Thein SL, Mullikin JC

bioRxiv: the preprint server for biology, 2023-11-27


Similar Datasets

| S-EPMC10126698 | biostudies-literature
| S-EPMC5536784 | biostudies-literature
| S-EPMC6044046 | biostudies-literature
| S-EPMC10497997 | biostudies-literature
| S-EPMC8963323 | biostudies-literature
| S-EPMC7671382 | biostudies-literature
| S-EPMC11791018 | biostudies-literature
| S-EPMC479118 | biostudies-literature
| S-EPMC5600879 | biostudies-literature
| S-EPMC2739274 | biostudies-literature