Project description:Significant progress has been made in elucidating single nucleotide polymorphism diversity in the human population. However, the majority of the variation space in the genome is structural and remains partially elusive. One form of structural variation is tandem repeats (TRs). Expansions of TRs are responsible for over 40 diseases, but we hypothesize that these represent only a fraction of the pathogenic repeat expansions that exist. Here we characterize long or expanded TR variation in 1,115 human genomes as well as a replication cohort of 2,504 genomes, identified using ExpansionHunter Denovo. We found that individual genomes typically harbor several rare, large TRs, generally in non-coding regions of the genome. These large TRs are enriched near Alu elements. The vast majority of these large TRs seem to be expansions of smaller TRs that are already present in the reference genome. We provide this TR profile as a resource for comparison with undiagnosed rare disease genomes in order to detect novel disease-causing repeat expansions.
Project description:We present a graph-based method for the analysis of repeat families in a repeat library. We build a repeat domain graph that decomposes a repeat library into repeat domains, short subsequences shared by multiple repeat families, and reveals the mosaic structure of repeat families. Our method recovers documented mosaic repeat structures and suggests additional putative ones. Our method is useful for elucidating the evolutionary history of repeats and annotating de novo generated repeat libraries.
Project description:We tested four gene enrichment and complexity reduction target preparation methods for scoring SFPs on the Affymetrix GeneChip 18k Maize Genome Array (Maize GeneChip). Methylation filtration (MF), Cot filtration (CF), mRNA-derived cRNA, and amplified fragment length polymorphism (AFLP) methods were applied to three diverse maize inbred lines (B73, Mo17, and CML69) with three replications per line (36 Maize GeneChips). Because it contains large amounts of repetitive, mobile DNA, the maize genome requires a target preparation method that offers both a high level of gene enrichment and accurate scoring of SFPs. The objectives of this research are (i) to determine which target preparation method (CF, MF, mRNA, or AFLP) optimally enriches for gene sequences complementary to probe sequences on the Affymetrix GeneChip Maize Genome Array and (ii) to estimate SFP detection power for each target method. The AFLP technology is covered by patents and patent applications owned by Keygene N.V. AFLP is a registered trademark of Keygene N.V. GeneChip. This work was supported in part by U.S. National Science Foundation grant DBI-0321467 and USDA-ARS. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the USDA. [PLEXdb (http://www.plexdb.org) has submitted this series at GEO on behalf of the original contributor, Michael Gore. The equivalent experiment is ZM10 at PLEXdb.]
Project description:The identification of repeat structure in eukaryotic genomes can be time-consuming and difficult because of the large amount of information (approximately 3 × 10^9 bp) that needs to be processed and compared. We introduce a new approach based on exact word counts to evaluate, de novo, the repeat structure present within large eukaryotic genomes. This approach avoids sequence alignment and similarity search, two of the most time-consuming components of traditional methods for repeat identification. Algorithms were implemented to efficiently calculate exact counts for any length oligonucleotide in large genomes. Based on these oligonucleotide counts, oligonucleotide excess probability clouds, or "P-clouds," were constructed. P-clouds are composed of clusters of related oligonucleotides that occur, as a group, more often than expected by chance. After construction, P-clouds were mapped back onto the genome, and regions of high P-cloud density were identified as repetitive regions based on a sliding window approach. This efficient method is capable of analyzing the repeat content of the entire human genome on a single desktop computer in less than half a day, at least 10-fold faster than current approaches. The predicted repetitive regions strongly overlap with known repeat elements as well as other repetitive regions such as gene families, pseudogenes, and segmental duplicons. This method should be extremely useful as a tool for use in de novo identification of repeat structure in large newly sequenced genomes.
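The word-count idea in the description above can be illustrated with a small Python sketch. It is not the authors' implementation: the function names, the one-substitution definition of "related" oligonucleotides, and the thresholds `core_min`, `outer_min`, `window`, and `min_fraction` are all assumptions chosen for illustration, and fixed count thresholds stand in for the statistical "more often than expected by chance" test.

```python
from collections import Counter


def count_kmers(seq, k):
    """Exact counts of every overlapping k-mer (oligonucleotide) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def neighbors(kmer, alphabet="ACGT"):
    """Yield all k-mers that differ from kmer by exactly one substitution."""
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def build_cloud_members(counts, core_min, outer_min):
    """Assumed, simplified cloud construction: start from high-count
    'core' k-mers and add one-substitution neighbors whose own count
    reaches the lower outer_min threshold."""
    members = {km for km, c in counts.items() if c >= core_min}
    for core in list(members):
        for nb in neighbors(core):
            if counts.get(nb, 0) >= outer_min:
                members.add(nb)
    return members


def repetitive_windows(seq, k, members, window, min_fraction):
    """Return start positions of sliding windows in which the fraction of
    k-mer start sites belonging to a cloud is at least min_fraction."""
    hits = [1 if seq[i:i + k] in members else 0
            for i in range(len(seq) - k + 1)]
    starts = []
    for s in range(len(hits) - window + 1):
        if sum(hits[s:s + window]) / window >= min_fraction:
            starts.append(s)
    return starts


if __name__ == "__main__":
    # Toy sequence: a short repeated motif followed by non-repetitive bases.
    toy = "ACGACGACGACGACGTTGCATGCCTA"
    counts = count_kmers(toy, 3)
    members = build_cloud_members(counts, core_min=4, outer_min=2)
    print(sorted(members))
    print(repetitive_windows(toy, 3, members, window=5, min_fraction=0.8))
```

No alignment or similarity search is involved: the sketch only counts words, groups them, and scans for dense windows, which is where the description locates the method's speed advantage. A genome-scale version would need a far more memory-efficient counting structure than an in-memory `Counter`.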
Project description:Current clinical therapy of non-small cell lung cancer depends on histopathological classification. This approach poorly predicts clinical outcome for individual patients. Proteogenomic characterization holds promise to improve clinical stratification, thus paving the way for individualized therapy. We performed proteogenomic characterization and comprehensive integrative genomic analysis of human large cell lung cancer. Here we analyzed the proteomes of 29 paired normal lung and large cell lung cancer tissues and identified significantly deregulated proteins associated with large cell lung cancer.
Project description:To provide full characterization of genome changes in six commonly used head and neck cancer cell lines. These data will serve as an excellent resource when designing future experiments that attempt to model HNSCC behaviour.
Project description:To provide full characterization of genome changes in six commonly used head and neck cancer cell lines. These data will serve as an excellent resource when designing future experiments that attempt to model HNSCC behaviour. Six commonly used ATCC head and neck cancer cell lines are analyzed.
Project description:Human genomes are now being rapidly sequenced, but not all forms of genetic variation are routinely characterized. In this study, we focus on Alu retrotransposition events and seek to characterize differences in the pattern of mobile insertion between individuals based on the analysis of eight human genomes sequenced using next-generation sequencing. Applying a rapid read-pair analysis algorithm, we discover 4342 Alu insertions not found in the human reference genome and show that 98% of a selected subset (63/64) experimentally validate. Of these new insertions, 89% correspond to AluY elements, suggesting that they arose by retrotransposition. Eighty percent of the Alu insertions have not been previously reported and more novel events were detected in Africans when compared with non-African samples (76% vs. 69%). Using these data, we develop an experimental and computational screen to identify ancestry informative Alu retrotransposition events among different human populations.
Project description:This study provides the first large-scale cloning and characterization of Sclerotinia sclerotiorum milRNAs and milRNA candidates. Two microRNA-like RNAs (milRNAs) and 42 milRNA candidates were identified by sequence analysis. These milRNAs and candidates provide new insights into the functional roles of small RNAs and add new resources for the study of plant pathogenic fungi.