Project description:The non-tumourigenic human breast epithelial cell line MCF10A is the cell line most commonly used as a model for normal human breast cells. This dataset provides a reference genome for MCF10A. Whole-genome, high-throughput sequencing was performed using the Illumina NovaSeq 6000 PE150 system. Both NGS and bioinformatic analysis were performed by Novogene (UK).
Project description:Ehrlichia chaffeensis is an obligate intracellular bacterium that causes human monocytic ehrlichiosis (HME), a tick-borne illness. Clinical signs of HME vary from asymptomatic infection to severe morbidity that requires hospitalization or results in death. Whole genome sequencing of this bacterium showed that 425 (38%) of the 1115 open reading frames in the genome are hypothetical genes of unknown function. Little is known to date about the pathogenicity gene(s) and strain variation of this bacterium. Several E. chaffeensis strains were previously shown to differ, based on comparison of a limited set of gene sequences (p120, OMP-1[p28]s, and VLPT). To elucidate genetic variation among strains and the associated virulence, we carried out comparative genome hybridization (CGH) of the E. chaffeensis Arkansas, Wakulla and Liberty strains, since these strains were the most distinct in the previous studies. We used a tiling array containing approximately 300,000 probes of 29 nt, spaced as densely as every 8 bp on both strands of the E. chaffeensis Arkansas genome. Keywords: Comparative genome hybridization
Project description:We used the Candida albicans laboratory strain SC5314 to obtain tunicamycin adaptors. We performed whole genome sequencing of the adaptors and of the parent strain.
Project description:Although a standard genome-wide significance level has been accepted for testing association between common genetic variants and disease, the era of whole-genome sequencing (WGS) requires a new threshold. The allele frequency spectrum of sequence-identified variants is very different from that of common variants, and the identified rare genetic variation is usually analyzed jointly in a series of genomic windows or regions. In nearby or overlapping windows, the test statistics will be correlated, and the degree of correlation is likely to depend on the choice of window size, overlap, and test statistic. Furthermore, multiple analyses may be performed using different windows or test statistics. Here we propose an empirical approach for estimating genome-wide significance thresholds for data arising from WGS studies, and we demonstrate that the empirical threshold can be efficiently estimated by extrapolating from calculations performed on a small genomic region. Because analysis of WGS data may need to be repeated with different choices of test statistics or windows, this prediction approach makes it computationally feasible to estimate genome-wide significance thresholds for different analysis choices. Based on UK10K whole-genome sequence data, we derive genome-wide significance thresholds ranging between 2.5 × 10^-8 and 8 × 10^-8 for our analytic choices in window-based testing, and thresholds of 0.6 × 10^-8 to 1.5 × 10^-8 for a combined analytic strategy of testing common variants using single-SNP tests together with rare variants analyzed with our sliding-window test strategy.
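The idea of an empirical threshold for correlated window tests can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes simulated null data (independent standard-normal scores per variant), a simple scaled-sum window statistic, and a threshold defined as the alpha-quantile of the smallest p-value per null replicate. All function names, parameters, and defaults are hypothetical. The last function applies a plain linear scaling of the effective number of tests from a region to a larger genome; this is an assumption made for illustration only and may differ from the extrapolation used in the study.

```python
import math
import random


def min_p_one_replicate(rng, n_variants, window, step):
    """Smallest two-sided p-value over sliding windows for one null dataset."""
    # Independent standard-normal scores per variant: a stand-in for null data.
    z = [rng.gauss(0.0, 1.0) for _ in range(n_variants)]
    best = 1.0
    for start in range(0, n_variants - window + 1, step):
        # Overlapping windows share variants, so their statistics are correlated.
        stat = sum(z[start:start + window]) / math.sqrt(window)
        p = math.erfc(abs(stat) / math.sqrt(2.0))  # two-sided normal p-value
        best = min(best, p)
    return best


def empirical_threshold(n_variants=1000, window=50, step=10,
                        n_reps=200, alpha=0.05, seed=1):
    """Empirical alpha-quantile of the per-replicate minimum p-value."""
    rng = random.Random(seed)
    minima = sorted(min_p_one_replicate(rng, n_variants, window, step)
                    for _ in range(n_reps))
    k = max(int(alpha * n_reps) - 1, 0)
    return minima[k]


def scale_to_genome(region_threshold, region_size, genome_size, alpha=0.05):
    """Hypothetical linear extrapolation of the effective number of tests."""
    n_eff_region = alpha / region_threshold
    return alpha / (n_eff_region * genome_size / region_size)


if __name__ == "__main__":
    region = empirical_threshold()
    print("region-level threshold:", region)
    print("scaled to a 3000x larger genome:", scale_to_genome(region, 1.0, 3000.0))
```

Changing `window` or `step` alters the correlation between neighbouring tests and therefore the estimated threshold, which is the dependence on analysis choices that the description above refers to.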