Project description:Targeted DNA sequencing approaches will improve how the size of short tandem repeats is measured for diagnostic tests and pre-clinical studies. The expansion of these sequences causes dozens of disorders, with longer tracts generally leading to a more severe disease. In addition, interruptions are sometimes present within repeats and can alter disease manifestation. Despite advances in methodologies, determining repeat size and identifying interruptions in targeted sequencing datasets remains a major challenge. This is because standard alignment tools are ill-suited for the repetitive nature of these sequences. To address this, we have developed Repeat Detector (RD), a deterministic profile weighting algorithm for counting repeats in targeted sequencing data. We tested RD using blood-derived DNA samples from Huntington’s disease (HD) and Fuchs endothelial corneal dystrophy patients sequenced using either Illumina MiSeq or Pacific Biosciences single-molecule, real-time sequencing platforms. RD was highly accurate in determining repeat sizes of 609 HD blood-derived samples and did not require prior knowledge of the flanking sequences or their polymorphisms within the patient population. We demonstrate that RD can be used to identify individuals with repeat interruptions and may provide a measure of repeat instability within an individual. RD is therefore highly versatile and may find applications in the diagnosis of expanded repeat disorders and the development of novel therapies.
2022-03-22 | GSE199005 | GEO
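To illustrate the counting task RD addresses, the sketch below is a naive, exact-match regex tract counter. It is not the Repeat Detector profile-weighting algorithm, whose internals are not described here. It assumes the read is already oriented on the strand that carries the repeat unit, contains no sequencing errors, and uses an interrupting unit of the same length as the repeat unit (for example, a CAA codon within a CAG tract, as in the HTT repeat). The function name `summarize_tract` and the `TractSummary` fields are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class TractSummary:
    total_units: int       # repeat units plus interruptions in the longest tract
    interruptions: int     # interrupting units (e.g. CAA) within that tract
    longest_pure_run: int  # longest run of the uninterrupted repeat unit


def summarize_tract(read: str, unit: str = "CAG", interruption: str = "CAA") -> TractSummary:
    """Hypothetical helper: naive exact-match counting, NOT the RD algorithm."""
    if len(unit) != len(interruption):
        raise ValueError("unit and interruption must have the same length")
    k = len(unit)
    seq = read.upper()
    # Find maximal stretches made only of the repeat unit and the interruption.
    pattern = f"(?:{re.escape(unit)}|{re.escape(interruption)})+"
    tracts = [m.group() for m in re.finditer(pattern, seq)]
    if not tracts:
        return TractSummary(0, 0, 0)
    longest = max(tracts, key=len)
    # Split the longest tract into consecutive units of length k.
    units = [longest[i:i + k] for i in range(0, len(longest), k)]
    pure = run = 0
    for u in units:
        run = run + 1 if u == unit else 0
        pure = max(pure, run)
    return TractSummary(len(units), units.count(interruption), pure)


if __name__ == "__main__":
    # Synthetic read: 17 CAG, one CAA interruption, one CAG, then flanking sequence.
    read = "ATG" + "CAG" * 17 + "CAA" + "CAG" + "CCGCCA"
    print(summarize_tract(read))
```

Because it requires exact matches, a single sequencing error inside the tract splits it into shorter stretches; tolerance to such errors is the kind of problem a dedicated method like RD is meant to handle.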
Project description:Capture Sequencing of Tandem Repeats
| PRJNA422490 | ENA
Project description:Genome Scanning for Short Tandem Repeats
Project description:In plants, RNA polymerase II (Pol II) transcription of inverted DNA repeats produces hairpin RNAs that are processed by several DICER-LIKE enzymes into siRNAs that are 21-24-nt in length. When targeted to transcriptional regulatory regions, the 24-nt size class can induce RNA-directed DNA methylation (RdDM) and transcriptional gene silencing (TGS). In a forward genetic screen to identify mutants defective in RdDM of a target enhancer leading to TGS of a downstream GFP reporter gene in Arabidopsis thaliana, we recovered a structurally mutated silencer locus, named SΔ35S, in which the 35S promoter driving transcription of an inverted repeat of target enhancer sequences had been specifically deleted. Although Pol II-dependent, hairpin-derived 21-24-nt siRNAs were no longer generated at the newly created SΔ35S locus, the GFP reporter gene was nevertheless still partially silenced. Silencing was associated with methylation in a short tandem repeat in the upstream target enhancer and with low levels of 24-nt tandem repeat siRNAs. Introducing an nrpd1 mutation into the SΔ35S line fully released GFP silencing and eliminated both the tandem repeat methylation and associated 24-nt siRNAs, demonstrating their dependence on Pol IV. Deletion of the 35S promoter thus revealed a Pol IV-dependent pathway of 24-nt siRNA biogenesis that was previously inhibited or masked by the Pol II-dependent pathway in wild-type plants. Both Pol II- and Pol IV-dependent siRNAs accrued predominantly from cytosine (C)-containing segments of the tandem repeat monomer, suggesting that the local base composition influenced siRNA accumulation. Preferential accumulation of siRNAs at C-containing sequences was also observed at an endogenous tandem repeat comprising discrete C-rich and AT-rich sections.
Our studies illuminate the potential complexity of siRNA generation at repeat-containing loci and show that Pol IV can act in siRNA biogenesis in the absence of a conventional Pol II promoter. Examination of whole-genome DNA methylation status in transgenic T+S Arabidopsis plant
Project description:The biomarker CA125, a peptide epitope located in several tandem repeats of the mucin MUC16, is the gold standard for monitoring regression and recurrence of high-grade serous ovarian cancer in response to therapy. However, the CA125 epitope and several structural features of the MUC16 molecule remain ill-defined. One central aspect still unresolved is the number of tandem repeats in MUC16 and how many of these contain the CA125 epitope. Studies from the early 2000s assembled short DNA reads to estimate that MUC16 contained 63 repeats. Here, we conduct Nanopore long-read sequencing of MUC16 transcripts from three primary ovarian tumors and three established cell lines (OVCAR3, OVCAR5, and Kuramochi) for a more exhaustive and accurate estimation and sequencing of the MUC16 tandem repeats. The consensus sequence derived from these six sources was confirmed by proteomics validation and agrees with recent additions to the NCBI database. We propose a model of MUC16 containing 19, not 63, tandem repeats. Additionally, we predict the structure of the tandem repeat domain using the deep-learning algorithm AlphaFold. The predicted structure displays an SEA domain and an unstructured linker region rich in proline, serine, and threonine residues in all 19 tandem repeats. Our studies now pave the way for a detailed characterization of the CA125 epitope. Sequencing and modeling of the MUC16 tandem repeats along with their glycoproteomic characterization, currently underway in our laboratories, will help identify novel epitopes in the MUC16 molecule that improve on the sensitivity and clinical utility of the current CA125 assay.
Project description:We here describe the first successful construction of a targeted tandem duplication of a large chromosomal segment in Aspergillus oryzae. The targeted tandem chromosomal duplication was achieved by using strains that had 5’ΔpyrG upstream of the region targeted for tandem chromosomal duplication and 3’ΔpyrG downstream of the target region. Consequently, strains bearing a 210-kb targeted tandem chromosomal duplication near the centromeric region of chromosome 8 and strains bearing a targeted tandem chromosomal duplication of a 700-kb region of chromosome 2 were successfully constructed. The strains bearing the tandem chromosomal duplication were efficiently obtained from the regenerated protoplasts of the parental strains. However, the generation of the chromosomal duplication did not depend on the introduction of double-stranded breaks (DSBs) by I-SceI. The chromosomal duplications of these strains were stably maintained after five generations of culture under non-selective conditions. The strains bearing the tandem chromosomal duplication in the 700-kb region of chromosome 2 showed highly increased protease activity in solid-state culture, indicating that the duplication of large chromosomal segments could be a useful new breeding technology and gene analysis method.
Project description:Centromeres are the chromosomal sites of assembly for kinetochores, the protein complexes that attach to spindle fibers and mediate separation of chromosomes to daughter cells in mitosis and meiosis. In most multicellular organisms, centromeres comprise a single specific family of tandem repeats, often 100-400 bp in length, found on every chromosome, typically in one location within heterochromatin. Drosophila melanogaster is unusual in that the heterochromatin contains many families of mostly short (5-12 bp) tandem repeats, none of which appears to be present at all centromeres, and none of which is found only at centromeres. Although centromere sequences from a minichromosome have been identified and candidate centromere sequences have been proposed, the DNA sequences at native Drosophila centromeres remain unknown. Here we use native chromatin immunoprecipitation to identify the centromeric sequences bound by the foundational kinetochore protein cenH3, known in vertebrates as CENP-A. In D. melanogaster, these sequences include a few families of 5-bp and 10-bp repeats, but in closely related D. simulans, a partially overlapping set of short repeats and more complex repeats comprise the centromeres. The results suggest that a recent expansion of short repeats is replacing more complex centromeric repeats in the melanogaster subgroup of Drosophila.
Project description:We here describe the first successful construction of a targeted tandem duplication of a large chromosomal segment in Aspergillus oryzae. The targeted tandem chromosomal duplication was achieved by using strains that had 5’ΔpyrG upstream of the region targeted for tandem chromosomal duplication and 3’ΔpyrG downstream of the target region. Consequently, strains bearing a 210-kb targeted tandem chromosomal duplication near the centromeric region of chromosome 8 and strains bearing a targeted tandem chromosomal duplication of a 700-kb region of chromosome 2 were successfully constructed. The strains bearing the tandem chromosomal duplication were efficiently obtained from the regenerated protoplasts of the parental strains. However, the generation of the chromosomal duplication did not depend on the introduction of double-stranded breaks (DSBs) by I-SceI. The chromosomal duplications of these strains were stably maintained after five generations of culture under non-selective conditions. The strains bearing the tandem chromosomal duplication in the 700-kb region of chromosome 2 showed highly increased protease activity in solid-state culture, indicating that the duplication of large chromosomal segments could be a useful new breeding technology and gene analysis method. An A. oryzae strain bearing a 210-kb targeted tandem chromosomal duplication, an A. oryzae strain bearing a 700-kb targeted tandem chromosomal duplication, and A. oryzae RIB40 (wild-type strain) were cultivated in Polypeptone-dextrin medium. After 3 days of cultivation, genomic DNA was extracted from the samples, and array CGH analysis was carried out to confirm the chromosomal duplications in the strains.