Project description:Genomic rearrangement, often driven by insertion sequence (IS) elements, is one of the major processes in the evolution of prokaryotes. Sequence analysis of the 16S rRNA of Lactobacillus helveticus, an organism that evolved in a dairy environment, and Lactobacillus acidophilus, an organism that evolved in association with the gastrointestinal tract (GIT), demonstrated 98.4% identity, suggesting that they diverged from a common ancestor. Moreover, complete genome sequence analysis of both organisms has demonstrated a remarkable degree of gene synteny (75% homologous genes) despite the presence of an exceptionally high number and diversity of IS elements in the Lb. helveticus genome. Array-based comparative genomic hybridization (aCGH) performed on nine strains of Lb. helveticus revealed sixteen clusters of open reading frames (ORFs) flanked by IS elements. Four of these ORFs are associated with restriction/modification, which may have played a role in the accelerated evolution of strains in a commercially intensive ecosystem undoubtedly challenged by successive phage attack. Furthermore, analysis of the IS-flanked clusters demonstrated that the most frequently encountered IS elements were also those most abundant in the genome (IS1201, ISL2, ISLhe1, ISLhe2, ISLhe65 and ISLhe63). These findings add to the view that IS elements are versatile and may play a role in bacterial genome plasticity.
Project description:The ideal genome sequence for medical interpretation is complete and diploid, capturing the full spectrum of genetic variation. Toward this end, there has been progress in the discovery of single nucleotide polymorphisms (SNPs) and small (<10 bp) insertions/deletions (indels), but annotation of larger structural variation (SV), including copy number variation (CNV), has been less comprehensive, even with available diploid sequence assemblies. We applied a multi-step sequence and microarray-based analysis to identify numerous previously unknown SVs within the first genome sequence reported from an individual.
Project description:The skin commensal yeast Malassezia is associated with several skin disorders. To establish a reference resource, we sought to determine the complete genome sequence of Malassezia sympodialis and identify its protein-coding genes. A novel genome annotation workflow combining RNA sequencing, proteomics, and manual curation was developed to determine gene structures with high accuracy.
Project description:<p><strong>BACKGROUND:</strong> Manchurian walnut (Juglans mandshurica Maxim.) is a tree with multiple industrial uses and medicinal properties in the Juglandaceae family (walnuts and hickories). J. mandshurica produces juglone, which is a toxic allelopathic agent and has potential utilization value. Furthermore, the seed of J. mandshurica is rich in various unsaturated fatty acids and has high nutritive value.</p><p><strong>FINDINGS:</strong> Here, we present a high-quality chromosome-scale reference genome assembly and annotation for J. mandshurica (n = 16) with a contig N50 of 21.4 Mb by combining PacBio high-fidelity reads with high-throughput chromosome conformation capture data. The assembled genome has an estimated sequence size of 548.7 Mb and consists of 657 contigs, 623 scaffolds and 40,453 protein-coding genes. In total, 60.99% of the assembled genome consists of repetitive sequences. Sixteen super-scaffolds corresponding to the 16 chromosomes were assembled, with a scaffold N50 length of 33.7 Mb and a BUSCO complete gene percentage of 98.3%. J. mandshurica displays a close sequence relationship with Juglans cathayensis, with a divergence time of 13.8 million years ago. Combining the high-quality genome, transcriptome and metabolomics data, we constructed a gene-to-metabolite network and identified 566 core and conserved differentially expressed genes, which may be involved in juglone biosynthesis. Five CYP450 genes were found that may contribute to juglone accumulation. NAC, bZip, NF-YA and NF-YC are positively correlated with the juglone content. 
Some candidate regulators (e.g., FUS3, ABI3, LEC2 and WRI1 transcription factors) involved in the regulation of lipid biosynthesis were also identified.</p><p><strong>CONCLUSIONS:</strong> Our genomic data provide new insights into the evolution of the walnut genome and create a new platform for accelerating molecular breeding and improving the comprehensive utilization of these economically important tree species.</p>