Project description:Constructing high-quality haplotype-resolved genome assemblies has substantially improved the ability to detect and characterize genetic variants. A targeted approach providing ready access to the rich information in haplotype-resolved genome assemblies will appeal to basic researchers and medical scientists focused on specific genomic regions. Here, using the 4.5 megabase, notoriously difficult-to-assemble major histocompatibility complex (MHC) region as an example, we demonstrate an approach to construct a haplotype-resolved assembly of a targeted genomic region with CRISPR-based enrichment. Compared to the results from haplotype-resolved whole-genome assembly, our targeted approach achieved comparable completeness and accuracy with reduced computational complexity, sequencing cost, and amount of starting material. Moreover, using the targeted, assembled personal MHC haplotypes as the reference both improves quantification accuracy for sequencing data and enables allele-specific functional genomics analyses of the MHC region. Given its highly efficient use of resources, our approach can greatly facilitate population genetic studies of targeted regions and may pave a new way to elucidate the molecular mechanisms of disease etiology.
Project description:BmN4 cells are cultured cells derived from Bombyx mori ovaries and are widely used to study transposon silencing by PIWI-interacting RNAs (piRNAs). A highly accurate genome sequence of BmN4 cells is required to analyze the piRNA pathway using RNA-seq. The genome sequence of BmN4 cells was assembled using Pacific Biosciences (PacBio) HiFi and Oxford Nanopore Technologies ultra-long (ONT-UL) reads. Microscopic observation and image analysis showed that BmN4 cells were octoploid on average and that the number of chromosomes per cell was highly variable. We concluded that haplotype-resolved assembly of such a complex genome would be difficult; therefore, we assembled a consensus genome sequence. RNA-seq analysis of Siwi knockdown cells also revealed that Siwi-piRISC may target Countdown (Cd), an LTR retrotransposon. By comparing the consensus genome sequence with the reads, we identified differences between haplotypes, particularly structural variants, suggesting that some transposons, including Countdown, increased their copy number in BmN4 cells.
Project description:Trans-homolog interactions encompass potent regulatory functions, which have been studied extensively in Drosophila, where homologs are paired in somatic cells and pairing-dependent gene regulation, or transvection, is well-documented. Nevertheless, the structure of pairing and whether its functional impact is genome-wide have eluded analysis. Accordingly, we generated a diploid cell line from divergent parents and applied haplotype-resolved Hi-C, discovering that homologs pair relatively precisely genome-wide in addition to establishing trans-homolog domains and compartments. We also elucidated the structure of pairing with unprecedented detail, documenting significant variation across the genome. In particular, we characterized two forms: tight pairing, consisting of contiguous small domains, and loose pairing, consisting of single larger domains. Strikingly, active genomic regions (A-type compartments, active chromatin, expressed genes) correlated with tight pairing, suggesting that pairing has a functional role genome-wide. Finally, using RNAi and haplotype-resolved Hi-C, we show that disruption of pairing-promoting factors results in global changes in pairing.
Project description:<p>The section <em>Oleifera</em> (Theaceae) has attracted attention for the high levels of unsaturated fatty acids found in its seeds. Here, we report the chromosome-scale genome of the sect. <em>Oleifera</em> using diploid wild <em>Camellia lanceoleosa</em> with a final size of 3.00 Gb and an N50 scaffold size of 186.43 Mb. Repetitive sequences accounted for 80.63% and were distributed unevenly across the genome. <em>Camellia lanceoleosa</em> underwent a whole-genome duplication event approximately 65 million years ago (65 Mya), prior to the divergence of <em>C</em>. <em>lanceoleosa</em> and <em>Camellia sinensis</em> (approx. 6-7 Mya). Syntenic comparisons of these two species revealed genomic rearrangements that appear to be driven in part by the activity of transposable elements. The expanded and positively selected genes in <em>C</em>. <em>lanceoleosa</em> were significantly enriched in oil biosynthesis, and the expansion of homomeric <em>acetyl-coenzyme A carboxylase</em> (<em>ACCase</em>) genes and the seed-biased expression of genes encoding heteromeric ACCase, diacylglycerol acyltransferase, glyceraldehyde-3-phosphate dehydrogenase and stearoyl-ACP desaturase could be of primary importance for the high oil and oleic acid content found in <em>C. lanceoleosa</em>. Theanine and catechins were present in the leaves of <em>C</em>. <em>lanceoleosa</em>. However, caffeine could not be detected in the leaves but was abundant in the seeds and roots. The functional and transcriptional divergence of genes encoding SAM-dependent <em>N</em>-methyltransferases may be associated with caffeine accumulation and distribution. Gene expression profiles, structural composition and chromosomal location suggest that the late-acting self-incompatibility of <em>C. lanceoleosa</em> is likely to have favoured a novel mechanism co-occurring with gametophytic self-incompatibility. 
This study provides valuable resources for quantitative and qualitative improvements and genome assembly of polyploid plants in sect. <em>Oleifera</em>.</p>
Project description:Two complementary protein extraction methodologies coupled with an automated proteomic platform were employed to analyze tissue-specific proteomes and characterize biological and metabolic processes in sweet potato. A total of 74,255 peptides corresponding to 4,321 nonredundant proteins were successfully identified. Data were compared to predicted protein accessions for Ipomoea species and mapped on the sweet potato transcriptome and haplotype-resolved genome. A proteogenomics analysis successfully mapped 12,902 peptides against the transcriptome or genome, representing 90.4% of the total 14,275 uniquely identified peptides, predicted 741 new protein-coding genes, and specified 2,726 loci where annotations can be further improved. Overall, 39,916 peptides mapped to 3,143 unique proteins in leaves, and 34,339 peptides mapped to 2,928 unique proteins in roots; 32% and 27% of the uniquely identified proteins were leaf- and root-specific, respectively.
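As a quick arithmetic check on the counts reported in the sweet potato description, the following minimal Python sketch recomputes the mapped-peptide percentage and confirms that the leaf and root peptide counts add up to the reported total. The variable names are illustrative assumptions, not part of the original study; only the numbers come from the text.

```python
# Counts copied from the description above; variable names are illustrative only.
mapped_peptides = 12_902   # peptides mapped to the transcriptome or genome
unique_peptides = 14_275   # uniquely identified peptides
leaf_peptides = 39_916     # peptides identified in leaves
root_peptides = 34_339     # peptides identified in roots
total_peptides = 74_255    # total peptides reported

# Share of uniquely identified peptides that were mapped, to one decimal place.
mapped_pct = round(100 * mapped_peptides / unique_peptides, 1)

# Leaf and root peptide counts should sum to the reported total.
tissue_sum = leaf_peptides + root_peptides

print(f"mapped: {mapped_pct}%")
print(f"leaf + root peptides: {tissue_sum:,}")
```

Both figures agree with the values stated in the description (90.4% and 74,255).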