Project description:The majority of embryos created through in vitro fertilization (IVF) do not implant. It seems plausible that rates of implantation would improve if we had a better understanding of the molecular factors affecting embryo competence. Currently, the process of selecting an embryo for uterine transfer uses an ad hoc combination of morphological criteria, the kinetics of development, and genetic testing for aneuploidy. However, no single criterion can ensure selection of a viable embryo. In contrast, RNA-sequencing (RNA-seq) of embryos could yield high-dimensional data, which may provide additional insight and illuminate the discrepancies among current selection criteria. Recent advances enabling the production of RNA-seq libraries from single cells have facilitated the application of this technique to the study of transcriptional events in early human development. However, these studies have not assessed the quality of their constituent embryos relative to commonly used embryological criteria. Here, we present a proof-of-principle advance in embryo selection procedures by generating RNA-seq libraries from a trophectoderm biopsy as well as the remaining whole embryo. We combine state-of-the-art embryological methods with low-input RNA-seq to develop the first transcriptome-wide approach for assessing embryo competence. Specifically, we demonstrate the promise of RNA-seq as a tool in preimplantation screening by showing that biopsies of an embryo can capture valuable information available in the whole embryo from which they are derived. Furthermore, we show that this technique can be used to generate an RNA-based digital karyotype and to identify candidate competence-associated genes. Together, these data establish the foundation for a future RNA-based diagnostic in IVF.
Project description:The Genetic Association Information Network (GAIN) Data Access Committee was established in June 2007 to provide prompt and fair access to data from six genome-wide association studies through the database of Genotypes and Phenotypes (dbGaP). Of 945 project requests received through 2011, 749 (79%) have been approved; median receipt-to-approval time decreased from 14 days in 2007 to 8 days in 2011. Over half (54%) of the proposed research uses were for GAIN-specific phenotypes; other uses were for method development (26%) and adding controls to other studies (17%). Eight data-management incidents, defined as compromises of any of the data-use conditions, occurred among nine approved users; most were procedural violations, and none violated participant confidentiality. Over 5 years of experience with GAIN data access has demonstrated substantial use of GAIN data by investigators from academic, nonprofit, and for-profit institutions with relatively few and contained policy violations. The availability of GAIN data has allowed for advances in both the understanding of the genetic underpinnings of mental-health disorders, diabetes, and psoriasis and the development and refinement of statistical methods for identifying genetic and environmental factors related to complex common diseases.
Project description:CircRNAs are a group of endogenous noncoding RNAs. Rapidly developing high-throughput RNA sequencing technologies, along with novel bioinformatics approaches, have enabled researchers to systematically identify circRNAs and their biological functions in cells. Deep sequencing of rRNA-depleted RNAs treated with RNase R, which digests linear RNAs and leaves circRNAs enriched, is an efficient way to identify circRNAs. However, very few RNase R-treated datasets are at hand, whereas a large amount of total RNA-Seq data is available for circRNA prediction at no additional sequencing cost. In this study, we systematically investigated the prediction bias from total RNA-Seq data as well as the influence of sequencing depth, sequencing quality, and single-end or paired-end sequencing strategy on the predictions. We also identified circRNA properties that may contribute to improved prediction performance. Our analysis shows that ∼50% of circRNA predictions from total RNA-Seq data are true positives. Sequencing error, rather than a single-end sequencing strategy or low sequencing depth, dramatically worsens the predictions. However, false positives can be carefully controlled by using good-quality data and narrowing down circRNAs guided by their properties.
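As a rough illustration of the true-positive figure quoted above, the following minimal sketch computes the fraction of predicted circRNAs that are confirmed by a validated set. It assumes, hypothetically, that each circRNA is identified by a `(chrom, start, end)` back-splice junction tuple and that the validated set comes from an RNase R-treated library; the abstract does not state how true positives were defined, and the function name `true_positive_fraction` is invented for this example.

```python
def true_positive_fraction(predicted, validated):
    """Fraction of predicted junctions that also appear in the validated set.

    Both arguments are iterables of hashable junction identifiers,
    e.g. (chrom, start, end) tuples (an assumed representation).
    """
    predicted = set(predicted)
    if not predicted:
        # No predictions: define the fraction as 0.0 to avoid dividing by zero.
        return 0.0
    confirmed = predicted & set(validated)
    return len(confirmed) / len(predicted)


if __name__ == "__main__":
    # Toy junctions, not real data.
    total_rna_predictions = {
        ("chr1", 100, 500),
        ("chr1", 800, 1200),
        ("chr2", 50, 300),
        ("chr3", 10, 90),
    }
    rnase_r_validated = {
        ("chr1", 100, 500),
        ("chr2", 50, 300),
        ("chr5", 1, 2),
    }
    print(true_positive_fraction(total_rna_predictions, rnase_r_validated))
```

Exact tuple matching is the simplest choice; a real comparison might tolerate small coordinate offsets between callers, which this sketch does not attempt.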
Project description:The majority of embryos that are created through IVF do not implant. One possible reason for this inefficiency is an incomplete understanding of the molecular factors affecting embryo competence. Currently, the process of selecting an embryo for uterine transfer utilizes an ad hoc combination of morphological criteria, the kinetics of development, and genetic testing for aneuploidy. However, no single criterion can ensure selection of a viable embryo. In contrast, RNA-sequencing (RNA-seq) of embryos could yield high-dimensional data, which may provide additional insight and illuminate the discrepancies among current selection criteria. Indeed, recent advances enabling the production of RNA-seq libraries from single cells have facilitated the application of this technique to the study of some transcriptional events in early human development. However, these studies have not assessed the quality of their constituent embryos relative to commonly used embryological criteria. Here, we present a proof-of-principle advance in clinical selection procedures by generating high-quality RNA-seq libraries from a trophectoderm biopsy as well as the remaining whole embryo. We combine state-of-the-art embryological methods with low-input RNA-seq to develop the first transcriptome-wide approach for use in future predictive embryology studies. Specifically, we demonstrate the promise of RNA-seq as a tool in preimplantation screening by showing that biopsies of an embryo can capture valuable information content available in the whole embryo from which they are derived. Furthermore, we show that this technique can be used to generate an RNA-based digital karyotype and to develop a foundational dataset for identifying candidate competence-associated genes. Together, these data establish the foundation for a future RNA-based diagnostic in IVF.
Project description:RNA-seq data can be mined for sequence differences relative to the reference genome to identify both genomic SNPs and RNA editing events. We analyzed the long, polyA-selected, unstranded, deeply sequenced RNA-seq data from the ENCODE Project across 14 human cell lines for candidate RNA editing events. On average, 43% of the RNA sequencing variants that are not in dbSNP and are within gene boundaries are A-to-G(I) RNA editing candidates. The vast majority of A-to-G(I) edits are located in introns and 3' UTRs, with only 123 located in protein-coding sequence. In contrast, the majority of non-A-to-G variants (60%-80%) map near exon boundaries and have the characteristics of splice-mapping artifacts. After filtering out all candidates with evidence of private genomic variation using genome resequencing or ChIP-seq data, we find that up to 85% of the high-confidence RNA variants are A-to-G(I) editing candidates. Genes with A-to-G(I) edits are enriched in Gene Ontology terms involving cell division, viral defense, and translation. The distribution and character of the remaining non-A-to-G variants closely resemble known SNPs. We find no reproducible A-to-G(I) edits that result in nonsynonymous substitutions in all three lymphoblastoid cell lines in our study, unlike RNA editing in the brain. That only a fraction of sites are reproducibly edited in multiple cell lines, and that we find a stronger association between editing and specific genes, suggests that the editing of the transcript is more important than the editing of any individual site.
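The first filtering step described above (variants not in dbSNP, within gene boundaries, classified as A-to-G) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the `Variant` record, its field names, and the use of the annotated gene strand to interpret unstranded reads (so that a forward-strand T-to-C change in a minus-strand gene counts as A-to-G on the transcript) are all hypothetical choices made for this example.

```python
from dataclasses import dataclass

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are assumptions."""
    chrom: str
    pos: int
    ref: str          # reference base on the genome's forward strand
    alt: str          # observed base on the forward strand
    gene_strand: str  # "+" or "-" inside a gene, "" outside any gene
    in_dbsnp: bool


def transcript_change(v):
    """Return (ref, alt) as read on the transcribed strand of the gene."""
    if v.gene_strand == "-":
        return COMPLEMENT[v.ref], COMPLEMENT[v.alt]
    return v.ref, v.alt


def is_candidate_pool_member(v):
    """Keep variants that are absent from dbSNP and lie within a gene."""
    return not v.in_dbsnp and v.gene_strand in ("+", "-")


def a_to_g_fraction(variants):
    """Fraction of non-dbSNP, in-gene variants that are A-to-G on the transcript."""
    pool = [v for v in variants if is_candidate_pool_member(v)]
    if not pool:
        return 0.0
    hits = sum(transcript_change(v) == ("A", "G") for v in pool)
    return hits / len(pool)


if __name__ == "__main__":
    # Toy records, not real data.
    toy = [
        Variant("chr1", 100, "A", "G", "+", False),  # A-to-G on plus strand
        Variant("chr1", 200, "T", "C", "-", False),  # A-to-G on minus strand
        Variant("chr1", 300, "C", "T", "+", False),  # non-A-to-G
        Variant("chr1", 400, "A", "G", "+", True),   # known SNP, excluded
        Variant("chr2", 500, "A", "G", "", False),   # intergenic, excluded
    ]
    print(a_to_g_fraction(toy))
```

The later filtering against private genomic variation (genome resequencing or ChIP-seq evidence) would be an additional exclusion step on the same pool and is not modeled here.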