Project description:We have built a new resource for the imputation of SNPs in existing and future genome-wide association studies (GWAS), known as the Division of Cancer Epidemiology and Genetics (DCEG) Reference Set. The first build of the data set includes 728 cancer-free individuals of European descent from three large prospectively sampled studies, 98 African-American individuals from the Prostate, Lung, Colon, and Ovary Cancer Screening Trial (PLCO), 74 Chinese individuals from a clinical trial in Shanxi, China (SHNX), and 349 unrelated individuals from the HapMap Project (see Molecular Data Section for details on arrays used). After rigorous quality control, the final harmonized data set includes 2.8 million autosomal polymorphic SNPs on 1,249 subjects.
Project description:Missing values in proteomic data sets have real consequences for downstream data analysis and reproducibility. Although several imputation methods exist to handle missing values, no single method is best suited to a diverse range of data sets, and no clear strategy exists for evaluating imputation methods on large-scale DIA-MS data sets, especially at different levels of protein quantification. To navigate the imputation strategies available in the literature, we have established a workflow to assess imputation methods on large-scale label-free DIA-MS data sets. We used three DIA-MS data sets with real missing values to evaluate eight imputation methods, each with multiple parameters, at different levels of protein quantification: a dilution series data set, a small pilot data set, and a larger proteomic data set.
Project description:Reference dataset used to develop the EnCOUNTEr tool, which (i) scores all characterized peptides using discriminant parameters to identify mature protein N-termini and (ii) determines the N-terminal acetylation yield for the most reliable ones.
Project description:Missing values in proteomic data sets have real consequences for downstream data analysis and reproducibility. Although several imputation methods exist to handle missing values, no single method is best suited to a diverse range of data sets, and no clear strategy exists for evaluating imputation methods on large-scale DIA-MS data sets, especially at different levels of protein quantification. To navigate the imputation strategies available in the literature, we have established a workflow to assess imputation methods on large-scale label-free DIA-MS data sets. We used three DIA-MS data sets with real missing values to evaluate eight imputation methods, each with multiple parameters, at different levels of protein quantification: a dilution series data set, a small pilot data set, and a larger proteomic data set of clinical ovarian cancer patient samples.
Project description:The MMRC reference collection is a dataset of gene expression profiling, array comparative genomic hybridization, and re-sequencing data, created as a resource for the multiple myeloma research community.