Project description: Low-pass sequencing (sequencing a genome to an average depth of less than 1× coverage) combined with genotype imputation has been proposed as an alternative to genotyping arrays for trait mapping and for calculating polygenic scores. To empirically assess the relative performance of these technologies for different applications, we performed low-pass sequencing (targeting coverage levels of 0.5× and 1×) and array genotyping on the Illumina Global Screening Array (GSA) for 120 DNA samples derived from African- and European-ancestry individuals who are part of the 1000 Genomes Project. We then imputed both the sequencing data and the array data to the 1000 Genomes Phase 3 haplotype reference panel using a leave-one-out design. We evaluated the overall imputation accuracy of each assay and the power of genome-wide association studies (GWAS) using the imputed data, and we computed polygenic risk scores for coronary artery disease and breast cancer using previously derived weights. We conclude that low-pass sequencing plus imputation, in addition to substantially increasing statistical power for GWAS, provides more accurate polygenic risk prediction than the Illumina GSA at effective coverages of ∼0.5× and higher.
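The description says the polygenic risk scores were computed from previously derived weights but does not give the scoring formula or the accuracy metric. The following is a minimal sketch, assuming the conventional additive score (the sum over variants of effect weight × alternate-allele dosage, with dosage in [0, 2]) and assuming imputation accuracy is summarized as the squared Pearson correlation between true genotypes and imputed dosages, a common metric that the project does not name. The function names and the dictionary-based input layout are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Additive PRS: sum of weight * alt-allele dosage.

    Assumption: both inputs are dicts keyed by variant ID; variants present
    in only one of them are skipped (one of several possible choices).
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared)


def pearson_r2(x, y):
    """Squared Pearson correlation, e.g. true genotypes vs imputed dosages.

    Assumption: x and y are equal-length sequences with non-zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov * cov / (vx * vy)
```

Because imputed dosages are continuous, the same scoring function applies to both array-based and sequencing-based imputations. For example, with hypothetical weights `{"rs1": 0.2, "rs2": -0.1}` and imputed dosages `{"rs1": 1.9, "rs2": 1.1}`, the score is 0.2 × 1.9 − 0.1 × 1.1 = 0.27.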
Project description: Missing values in proteomic data sets have real consequences for downstream data analysis and reproducibility. Although several imputation methods exist to handle missing values, no single method is best suited to a diverse range of data sets, and no clear strategy exists for evaluating imputation methods on large-scale DIA-MS data sets, especially at different levels of protein quantification. To navigate the different imputation strategies available in the literature, we established a workflow to assess imputation methods on large-scale label-free DIA-MS data sets. We used three DIA-MS data sets with real missing values to evaluate eight imputation methods with multiple parameters at different levels of protein quantification: a dilution series data set, a small pilot data set, and a larger proteomic data set.
Project description: Missing values in proteomic data sets have real consequences for downstream data analysis and reproducibility. Although several imputation methods exist to handle missing values, no single method is best suited to a diverse range of data sets, and no clear strategy exists for evaluating imputation methods on large-scale DIA-MS data sets, especially at different levels of protein quantification. To navigate the different imputation strategies available in the literature, we established a workflow to assess imputation methods on large-scale label-free DIA-MS data sets. We used three DIA-MS data sets with real missing values to evaluate eight imputation methods with multiple parameters at different levels of protein quantification: a dilution series data set, a small pilot data set, and a larger proteomic data set of clinical ovarian cancer patient samples.
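The description does not detail how the workflow scores each imputation method. One common approach, shown here as a hedged sketch rather than the authors' procedure, hides a random subset of observed intensities, imputes them, and measures the error against the hidden values. It assumes a proteins × samples matrix of log-intensities with `None` for missing entries; the two baseline imputers (per-protein mean and global minimum), the RMSE score, and all function names are hypothetical stand-ins, not the eight methods evaluated in the project.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

# Rows are proteins, columns are samples; None marks a missing intensity.
Matrix = List[List[Optional[float]]]


def impute_row_mean(m: Matrix) -> List[List[float]]:
    """Fill gaps with each protein's observed mean (global mean if the row is empty)."""
    all_obs = [v for row in m for v in row if v is not None]
    global_mean = sum(all_obs) / len(all_obs)
    out = []
    for row in m:
        obs = [v for v in row if v is not None]
        fill = sum(obs) / len(obs) if obs else global_mean
        out.append([fill if v is None else v for v in row])
    return out


def impute_global_min(m: Matrix) -> List[List[float]]:
    """Fill every gap with the smallest observed intensity (a left-censoring baseline)."""
    fill = min(v for row in m for v in row if v is not None)
    return [[fill if v is None else v for v in row] for row in m]


def mask_observed(m: Matrix, fraction: float, seed: int = 0) -> Tuple[Matrix, Dict[Tuple[int, int], float]]:
    """Hide a random fraction of observed cells; return the masked copy and the hidden truth."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(m) for j, v in enumerate(row) if v is not None]
    hidden = rng.sample(cells, int(len(cells) * fraction))
    masked = [row[:] for row in m]  # copy so the input matrix is left untouched
    truth = {}
    for i, j in hidden:
        truth[(i, j)] = masked[i][j]
        masked[i][j] = None
    return masked, truth


def evaluate(m: Matrix, imputer: Callable[[Matrix], List[List[float]]],
             fraction: float = 0.1, seed: int = 0) -> float:
    """RMSE between imputed and hidden values for one imputation method."""
    masked, truth = mask_observed(m, fraction, seed)
    if not truth:
        raise ValueError("fraction too small: no observed values were masked")
    imputed = imputer(masked)
    sq_err = [(imputed[i][j] - v) ** 2 for (i, j), v in truth.items()]
    return math.sqrt(sum(sq_err) / len(sq_err))
```

Comparing methods then amounts to calling `evaluate` with the same matrix, fraction, and seed for each imputer. Random masking creates values missing completely at random, whereas missing values in DIA-MS data are often partly abundance-dependent, so a fuller benchmark might also mask preferentially among low-intensity values.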
Project description: Popular rice mega varieties lack sufficient key micronutrients (e.g., Fe, Zn), vitamins, and a balanced amino acid composition, all of which are essential for a healthy diet. The major bottleneck for improving the nutritional quality of popular rice varieties through conventional breeding or gene technology is our lack of an integrated understanding of the biochemical and molecular processes that occur during rice grain filling (and of the genes or loci that determine them). In this project, we will perform molecular expression profiling on specific tissue layers of the rice grain. The experimental material will be developing rice seeds from plants grown hydroponically under controlled greenhouse conditions. Laser microdissection will then be used to dissect different parts of the grain (e.g., vascular trace, aleurone, nucellar epidermis). Total RNA will be extracted from these dissected parts and subjected to RNA sequencing. Through this project, we will learn how the synthesis and deposition of grain nutrients are regulated, particularly during grain filling.
Project description: In this project, we are investigating the mechanism of drought tolerance in rice at the early vegetative stage by examining the expression profiles of differentially expressed genes (DEGs) and uniquely expressed genes.