Unknown,Transcriptomics,Genomics,Proteomics

Dataset Information

Preprocessed reduced representation bisulfite sequencing (RRBS) data from 173 human umbilical cord blood samples, collected for a study on associations between perinatal DNA methylation marks and progression to type 1 diabetes by age 15


ABSTRACT: The samples were collected from participants of the Finnish Diabetes Prediction and Prevention (DIPP) Study who were born between 1995 and 2006. DIPP is a prospective follow-up cohort of children with a moderate or high risk of type 1 diabetes, based on the HLA-DR-DQ genotype. Islet cell autoantibodies (ICA, GADA, IAA, IA2A and ZnT8A) were measured 1 to 4 times per year until age 15 or the year 2018. The aim was to study associations between perinatal DNA methylation marks and later progression to type 1 diabetes. Case individuals, who became persistently positive for at least two biochemical autoantibodies (GADA, IAA, IA2A or ZnT8A) and/or were diagnosed with type 1 diabetes during the follow-up, were compared to control individuals, who remained autoantibody-negative throughout the follow-up. These data were also used to develop data analysis methodology for bisulfite sequencing studies.
To protect the privacy of the study participants, the sequence read data are not publicly available. However, the processed data are available for download. They consist of two count matrices: "methylated_reads" and "total_reads". The matrix "methylated_reads" contains the methylated read counts at each high-coverage CpG site (approximately 2.5 million rows in total) for each of the 173 samples (173 columns), and the matrix "total_reads" contains the corresponding total read counts (coverage). Note that the values in "methylated_reads" are read counts, not percentages. Methylation proportions can be calculated as methylated_reads/total_reads. The row names are the genomic locations of the CpG sites in hg19 (GRCh37) coordinates (1,2).
For privacy reasons, all potential SNPs were excluded from these publicly available count matrices. Specifically, we removed all common human SNPs (minor allele frequency > 1 %) listed in dbSNP (3).
We also removed all SNPs that BS-SNPer, a software tool for detecting SNPs in bisulfite sequencing data (4), detected in one or more samples, even with "low" evidence. Altogether, 204443 of the 2752981 rows were removed from the original coverage-filtered count matrices that were analyzed in the present study.
Description of the sample attributes:
Individual: The individual-specific identifiers, such as "Subject1". Since each sample is from a different individual, these correspond to the sample identifiers (Subject1 corresponds to Sample 1, and so on).
Experimental Group: The variable of interest (called "class" in the associated publications), with three possible values:
1) Case: became persistently positive for at least two biochemical autoantibodies (GADA, IAA, IA2A or ZnT8A) and/or was diagnosed with type 1 diabetes during the follow-up.
2) Control: remained autoantibody-negative throughout the follow-up.
3) NA (neither case nor control): the remaining 51 individuals, who have a missing value ("NA"), did not qualify as cases or controls, since they were either persistently positive for only one biochemical autoantibody or transiently positive for one or more autoantibodies. We excluded these 51 individuals from the case-control comparison but included them in the comparison between the sexes.
Library preparation batch: The sequencing libraries were prepared in 7 batches. The batch names do not have any special meaning. That is, "1A" is not necessarily more similar to "1B" than it is to "3B". We treated this as a categorical technical variable with 7 categories.
PC1 and PC2: Projections of the sample-specific methylation proportion vectors on the first two orthonormal principal components.
The principal component analysis (PCA) was performed on the original coverage-filtered methylation proportion matrix (methylated/total reads), where missing values at each CpG site were imputed by the median over the samples with non-missing values. The original methylation proportion matrix included 2752981 rows, whereas the publicly available matrices include 2548538 rows (all potential SNPs excluded). Hence, PCA on the publicly available data would result in slightly different values for PC1 and PC2. We included PC1 and PC2 as covariates in the differential methylation analysis to represent technical variation (in addition to the library preparation batches).
References
1. Church DM, Schneider VA, Graves T, Auger K, Cunningham F, Bouk N, et al. Modernizing reference genome assemblies. PLoS Biol. 2011 Jul;9(7):e1001091.
2. Genome Reference Consortium. NCBI downloads: https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz, accessed Feb 10th, 2019.
3. NCBI. dbSNP: https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/common_all_20180423.vcf.gz, accessed April 29th, 2021.
4. Gao S, Zou D, Mao L, Liu H, Song P, Chen Y, et al. BS-SNPer: SNP calling in bisulfite-seq data. Bioinformatics. 2015 Dec 15;31(24):4006–8.
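
The proportion calculation and the PCA preprocessing described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the code used in the study. It assumes that the two count matrices have already been loaded as arrays with CpG sites as rows and samples as columns, that sites with zero coverage in a sample count as missing, and that each CpG site is mean-centered without scaling before PCA (the description does not specify this). All function names are hypothetical.

```python
import numpy as np


def methylation_proportions(methylated_reads, total_reads):
    """Return methylated_reads / total_reads, with NaN where coverage is zero.

    Assumption: zero coverage is treated as a missing value.
    """
    methylated = np.asarray(methylated_reads, dtype=float)
    total = np.asarray(total_reads, dtype=float)
    proportions = np.full(total.shape, np.nan)
    # Divide only where coverage is positive; other entries stay NaN.
    np.divide(methylated, total, out=proportions, where=total > 0)
    return proportions


def impute_row_median(proportions):
    """Replace missing values at each CpG site (row) by the median over
    the samples (columns) with non-missing values at that site.

    A site that is missing in every sample stays NaN (and NumPy warns).
    """
    medians = np.nanmedian(proportions, axis=1, keepdims=True)
    return np.where(np.isnan(proportions), medians, proportions)


def first_two_pcs(proportions):
    """Project each sample on the first two orthonormal principal components.

    Assumption: each CpG site is mean-centered across samples, no scaling.
    Returns an array of shape (n_samples, 2) holding PC1 and PC2 scores.
    """
    centered = proportions - proportions.mean(axis=1, keepdims=True)
    # Thin SVD of the sites-by-samples matrix: the columns of u are the
    # orthonormal components in CpG space, and s * vt gives the projections
    # of the sample vectors on those components.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return (s[:2, None] * vt[:2]).T


if __name__ == "__main__":
    # Toy counts standing in for the real matrices (3 CpG sites, 3 samples).
    methylated = np.array([[2, 0, 5], [1, 3, 0], [4, 4, 4]])
    total = np.array([[4, 0, 10], [2, 6, 0], [8, 4, 16]])
    props = impute_row_median(methylation_proportions(methylated, total))
    print(first_two_pcs(props))
```

The sign of each principal component is arbitrary, so scores computed this way may be mirrored relative to the deposited PC1 and PC2 values, in addition to the small differences expected from the excluded SNP rows.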

INSTRUMENT(S): Illumina HiSeq 2500

ORGANISM(S): Homo sapiens

SUBMITTER: Essi Laajala 

PROVIDER: E-MTAB-10530 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Similar Datasets

2014-02-20 | E-GEOD-30208 | biostudies-arrayexpress
2014-02-20 | E-GEOD-30209 | biostudies-arrayexpress
2014-02-20 | E-GEOD-30210 | biostudies-arrayexpress
2014-02-20 | E-GEOD-43488 | biostudies-arrayexpress
2014-07-30 | GSE50866 | GEO
2015-09-01 | E-GEOD-60657 | biostudies-arrayexpress
2014-05-09 | E-GEOD-57471 | biostudies-arrayexpress
2010-03-31 | E-TABM-666 | biostudies-arrayexpress
2015-07-28 | E-GEOD-70317 | biostudies-arrayexpress
2016-03-30 | E-GEOD-66732 | biostudies-arrayexpress