Project description: Constructing high-quality haplotype-resolved genome assemblies has substantially improved the ability to detect and characterize genetic variants. A targeted approach that provides ready access to the rich information in haplotype-resolved genome assemblies will appeal to basic researchers and medical scientists focused on specific genomic regions. Here, using the 4.5 megabase, notoriously difficult-to-assemble major histocompatibility complex (MHC) region as an example, we demonstrated an approach to constructing a haplotype-resolved assembly of a targeted genomic region using CRISPR-based enrichment. Compared with the results from haplotype-resolved whole-genome assembly, our targeted approach achieved comparable completeness and accuracy while reducing computational complexity, sequencing cost, and the amount of starting material. Moreover, using the personal MHC haplotypes from the targeted assembly as the reference both improves quantification accuracy for sequencing data and enables allele-specific functional genomics analyses of the MHC region. Given its highly efficient use of resources, our approach can greatly facilitate population genetic studies of targeted regions and may open a new path to elucidating the molecular mechanisms of disease etiology.
Project description: Aegilops tauschii is the donor of the wheat D subgenome and an important genetic resource for wheat. The assembly of the Ae. tauschii acc. AL8/78 reference genome sequence Aet v4.0 was therefore an important milestone for wheat biology and breeding. The combination of the >4.2 Gb size of the Ae. tauschii genome and its >84% content of recently evolved repeated sequences makes sequencing this genome challenging. Here, we report further advances in the development of the Ae. tauschii acc. AL8/78 genome sequence. Two new genome-wide optical maps were constructed and employed in the revision of pseudomolecules and the estimation of gap lengths. Gaps were closed with contigs of single-molecule Pacific Biosciences reads. The number of gaps in Aet v5.0 decreased by 38,899 compared to Aet v4.0. Transposable elements and protein-coding genes were reannotated. The number of high-confidence genes was reduced from 38,886 in Aet v4.0 to 32,980 in Aet v5.0. A nonredundant set of 478 biologically important genes, including many of known function in wheat, was manually annotated. Sixty-one microRNA precursor loci and 60 phasiRNA loci were discovered and annotated, and their expression was characterized. The expression of other small RNAs, such as hc-siRNAs and tRFs, was also characterized. This upgraded genome sequence will facilitate the use of Ae. tauschii in wheat breeding and biological research.