Project description:As whole genome sequence (WGS) data sets have become abundant and widely available, so has the need for variant detection and scoring. The aim of this study was to compare the accuracy of two commonly used variant calling programs, Freebayes and GATK HaplotypeCaller (GATK-HC), and to use U.S. sheep WGS data sets to identify novel breed-associated SNPs. Sequence data from 145 sheep representing 14 U.S. breeds were filtered, and biallelic single nucleotide polymorphisms (SNPs) were retained for genotyping analyses. Genotypes from both programs were compared with each other and with genotypes from bead arrays. To analyze genetic diversity, the WGS SNPs were compared with the bead array data using breed heterozygosity, principal component analysis (PCA), and identification of breed-associated SNPs. Compared with GATK-HC, Freebayes reported an average sequence read depth 2.78 reads greater and identified 6.11% more SNPs. The genotype concordance with bead array data was 96.0% for Freebayes and 95.5% for GATK-HC. Genotyping with WGS identified 10.5 million SNPs across all 145 sheep. This resulted in an 8% increase in measured heterozygosity and greater breed separation in the PCA compared with the bead array analysis. There were 1,849 SNPs identified only in the Romanov sheep, where all 10 rams were homozygous for one allele and the remaining 135 sheep from 13 breeds were homozygous for the opposite allele. Both variant calling programs had greater than 95% concordance of SNPs with bead array data, and either was sufficiently accurate for ovine WGS data sets. The use of WGS SNPs improved the resolution of the PCA and was critical for identifying Romanov breed-associated SNPs. Subsets of such SNPs could be used to estimate germplasm composition in animals without pedigree information.
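Two of the comparisons above can be sketched in a few lines: genotype concordance between two call sets, and the screen for breed-private SNPs where one breed is fixed for one allele and every other animal is fixed for the opposite allele. This is a minimal illustration, not the study's pipeline. It assumes genotypes are coded as alternate-allele dosage (0, 1, 2) with None for missing calls, and the function names and breed labels are hypothetical.

```python
from typing import Dict, Hashable, Optional

# Genotypes are coded as alternate-allele dosage: 0 = hom ref, 1 = het, 2 = hom alt.
# None marks a missing call. This coding is an illustrative assumption.
Genotype = Optional[int]


def genotype_concordance(calls_a: Dict[Hashable, Genotype],
                         calls_b: Dict[Hashable, Genotype]) -> float:
    """Fraction of genotype calls present (non-missing) in both sets that agree.

    Keys identify one call, e.g. a (snp_id, animal_id) tuple.
    """
    shared = [k for k, g in calls_a.items()
              if g is not None and calls_b.get(k) is not None]
    if not shared:
        raise ValueError("no genotype calls shared by both sets")
    return sum(calls_a[k] == calls_b[k] for k in shared) / len(shared)


def is_breed_private_fixed(genotypes: Dict[str, Genotype],
                           breed_of: Dict[str, str], target: str) -> bool:
    """True if all target-breed animals are homozygous for one allele and all
    other animals are homozygous for the opposite allele (no missing calls)."""
    in_target = {g for animal, g in genotypes.items() if breed_of[animal] == target}
    others = {g for animal, g in genotypes.items() if breed_of[animal] != target}
    return len(in_target) == 1 and len(others) == 1 and in_target | others == {0, 2}


# Hypothetical example: one SNP typed in three animals.
breeds = {"ram1": "Romanov", "ram2": "Romanov", "ewe1": "Other"}
print(is_breed_private_fixed({"ram1": 2, "ram2": 2, "ewe1": 0}, breeds, "Romanov"))
```

Applied per SNP across all 145 animals, the second function yields a candidate list comparable in spirit to the Romanov-specific set. A real screen would also need to handle missing calls and filter by call quality.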
Project description:Structural variations (SVs) play important roles in human evolution and disease, but data resources based on representative samples are lacking, especially for East Asians. Taking advantage of both next-generation and third-generation sequencing data at the whole-genome level, we developed the PGG.SV database to provide a practical platform for both regionally and globally representative structural variants. In its current version, PGG.SV archives 584 277 SVs obtained from whole-genome sequencing data of 6048 samples, including 1030 long-read sequencing genomes representing 177 global populations. PGG.SV provides (i) high-quality SVs with fine-scale and precise genomic locations in both GRCh37 and GRCh38, covering SVs underrepresented in existing sequencing and microarray data; (ii) hierarchical estimation of SV prevalence in geographical populations; (iii) informative annotations of SV-related genes, potential functions and clinical effects; (iv) an analysis platform to facilitate SV-based case-control association studies and (v) various visualization tools for understanding SV structures in the human genome. Taken together, PGG.SV provides a user-friendly online interface, easy-to-use analysis tools and a detailed presentation of results. PGG.SV is freely accessible via https://www.biosino.org/pggsv.
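The abstract does not state how PGG.SV estimates prevalence hierarchically. The sketch below therefore assumes a simple pooled estimator. Each population contributes its SV allele count and genotyped allele count, and these counts are summed upward to the region and global levels. The population and region names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def hierarchical_frequencies(
    alt_counts: Dict[str, Tuple[int, int]],
    region_of: Dict[str, str],
) -> Dict[str, Dict[str, float]]:
    """Pool (SV allele count, genotyped allele count) from populations upward.

    alt_counts maps population -> (SV alleles observed, alleles genotyped);
    region_of maps population -> region. Returns frequencies keyed by level.
    """
    if not alt_counts:
        raise ValueError("no populations supplied")
    region_totals = defaultdict(lambda: [0, 0])  # region -> [alt, total]
    populations = {}
    for pop, (alt, total) in alt_counts.items():
        if total <= 0:
            raise ValueError(f"population {pop} has no genotyped alleles")
        populations[pop] = alt / total
        region_totals[region_of[pop]][0] += alt
        region_totals[region_of[pop]][1] += total
    regions = {r: a / t for r, (a, t) in region_totals.items()}
    all_alt = sum(a for a, _ in alt_counts.values())
    all_total = sum(t for _, t in alt_counts.values())
    return {"population": populations, "region": regions,
            "global": {"all": all_alt / all_total}}


res = hierarchical_frequencies(
    {"P1": (2, 10), "P2": (6, 10), "P3": (1, 20)},
    {"P1": "EastAsia", "P2": "EastAsia", "P3": "Europe"},
)
print(res["region"])
```

Pooling counts gives larger populations more weight. Averaging per-population frequencies within a region is an equally simple alternative when sample sizes are very unbalanced.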
Project description:Listeria monocytogenes is an opportunistic foodborne pathogen responsible for listeriosis, the third most common foodborne disease. Many different Listeria strains and serotypes exist; however, a proteogenomic resource that would provide a basis for bridging the gap in our molecular understanding between Listeria genotypes and phenotypes via proteotypes is still missing. Here we devised a next-generation proteogenomics strategy that now enables the community to rapidly proteotype Listeria strains and relate this information back to the genotype. Based on sequencing and de novo assembly of the two most commonly used Listeria model strains, EGD-e and ScottA, we established a comprehensive Listeria proteogenomic database. A genome comparison established core and strain-specific genes with potential relevance for virulence differences. Next, we established a DIA/SWATH-based proteotyping strategy, including a new and robust sample preparation workflow, enabling the reproducible, sensitive, and relative quantitative measurement of Listeria proteotypes. This re-usable DIA/SWATH library, a new public resource, covers 70% of the potentially expressed open reading frames (ORFs) of Listeria and represents the most extensive spectral library for Listeria proteotype analysis to date. We used these two new resources to investigate the Listeria proteotype in three states mimicking the upper gastrointestinal passage. Exposure of Listeria to bile salts at 37 °C, mimicking conditions encountered in the duodenum, showed significant proteotype perturbations, including an increase in FlaA, the structural protein of flagella. Given that Listeria is known to lose its flagella above 30 °C, this was an unexpected finding. The formation of flagella, which might have implications for the infectivity cycle, was validated by parallel reaction monitoring and by light and scanning electron microscopy. 
qPCR data for flaA transcripts showed no significant differences, suggesting regulation at the post-transcriptional level. Together, we provide a comprehensive proteogenomic resource and toolbox for the Listeria community, enabling the analysis of Listeria genotype-proteotype-phenotype relationships.
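Relative quantification of a protein such as FlaA between two conditions is often summarized as a log2 fold change plus a test statistic. The sketch below is not the study's DIA/SWATH pipeline. It assumes per-replicate protein intensities have already been extracted (the values shown are hypothetical) and uses Welch's t statistic as a generic, swapped-in test. It omits p-values and multiple-testing correction.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def log2_fc_and_welch_t(treated: Sequence[float],
                        control: Sequence[float]) -> Tuple[float, float]:
    """Return (log2 fold change, Welch's t) computed on log2 intensities.

    The fold change is the difference of mean log2 intensities, i.e. the
    log2 ratio of geometric means.
    """
    if min(len(treated), len(control)) < 2:
        raise ValueError("need at least two replicates per condition")
    log_t = [math.log2(x) for x in treated]
    log_c = [math.log2(x) for x in control]
    fc = mean(log_t) - mean(log_c)
    se = math.sqrt(variance(log_t) / len(log_t) + variance(log_c) / len(log_c))
    if se == 0:
        raise ValueError("zero variance in both conditions")
    return fc, fc / se


# Hypothetical intensities for one protein, e.g. bile salts at 37 °C vs. a reference state.
print(log2_fc_and_welch_t([8.0, 16.0], [2.0, 4.0]))
```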
Project description:Listeria monocytogenes is an opportunistic foodborne pathogen responsible for listeriosis, a potentially fatal foodborne disease. Many different Listeria strains and serotypes exist, but a proteogenomic resource that bridges the gap in our molecular understanding of the relationships between Listeria genotypes and phenotypes via proteotypes is still missing. Here we devised a next-generation proteogenomics strategy that enables the community to rapidly proteotype Listeria strains and relate this information back to the genotype. Based on sequencing and de novo assembly of the two most commonly used Listeria model strains, EGD-e and ScottA, we established two comprehensive Listeria proteogenomic databases. A genome comparison established core- and strain-specific genes potentially responsible for virulence differences. Next, we established a DIA/SWATH-based proteotyping strategy, including a new and robust sample preparation workflow, that enables the reproducible, sensitive, and relative quantitative measurement of Listeria proteotypes. This re-usable and publicly available DIA/SWATH library covers 70% of the potentially expressed open reading frames (ORFs) of Listeria and represents the most extensive spectral library for Listeria proteotype analysis to date. We used these two new resources to investigate the Listeria proteotype in three states mimicking the upper gastrointestinal passage. Exposure of Listeria to bile salts at 37 °C, which simulates conditions encountered in the duodenum, showed significant proteotype perturbations, including an increase in FlaA, the structural protein of flagella. Given that Listeria is known to lose its flagella above 30 °C, this was an unexpected finding. 
The formation of flagella, which might have implications for the infectivity cycle, was validated by parallel reaction monitoring and by light and scanning electron microscopy. Q-PCR showed that flaA transcript levels were not significantly different with and without exposure to conditions mimicking the duodenum, suggesting regulation at the post-transcriptional level. Together, these analyses provide a comprehensive proteogenomic resource and toolbox for the Listeria community, enabling the analysis of Listeria genotype-proteotype-phenotype relationships.
Project description:Sharing of research data in public repositories has become best practice in academia. With the accumulation of massive data, network bandwidth and storage requirements are rapidly increasing. The ProteomeXchange (PX) consortium implements a model of centralized metadata and distributed raw data management, which promotes effective data sharing. To facilitate open access to proteome data worldwide, we have developed the integrated proteome resource iProX (http://www.iprox.org) as a public platform for collecting and sharing raw data, analysis results and metadata obtained from proteomics experiments. The iProX repository offers a web-based proteome data submission process and open sharing of mass spectrometry-based proteomics datasets. It also uses extensive controlled vocabularies and ontologies to annotate proteomics datasets. Users can submit and access data through a GUI and a fast Aspera-based transfer tool. iProX is a full member of the PX consortium, and all released datasets are freely accessible to the public. iProX is built on a high-availability architecture and has been deployed as part of the proteomics infrastructure of China, ensuring long-term and stable resource support. iProX will facilitate worldwide data analysis and sharing of proteomics experiments.
Project description:Background: The introduction of high-throughput genome sequencing and post-genome analysis technologies, e.g. DNA microarray approaches, has created the potential to unravel and scrutinize complex gene-regulatory networks on a large scale. The discovery of transcriptional regulatory interactions has become a major topic in modern functional genomics. Results: To facilitate the analysis of gene-regulatory networks, we have developed CoryneCenter, a web-based resource for the systematic integration and analysis of genome, transcriptome, and gene regulatory information for prokaryotes, especially corynebacteria. For this purpose, we extended and combined the following systems into a common platform: (1) GenDB, an open source genome annotation system, (2) EMMA, a MAGE-compliant application for high-throughput transcriptome data storage and analysis, and (3) CoryneRegNet, an ontology-based data warehouse designed to facilitate the reconstruction and analysis of gene regulatory interactions. We demonstrate the potential of CoryneCenter by means of an application example. Using microarray hybridization data, we compare the gene expression of Corynebacterium glutamicum under acetate and glucose feeding conditions: known regulatory networks are confirmed, and CoryneCenter also points out additional regulatory interactions. Conclusion: CoryneCenter provides more than the sum of its parts. Its novel analysis and visualization features significantly simplify the process of obtaining new biological insights into complex regulatory systems. Although the platform currently focuses on corynebacteria, the integrated tools are by no means restricted to these species, and the presented approach offers a general strategy for the analysis and verification of gene regulatory networks. CoryneCenter provides freely accessible projects with the underlying genome annotation, gene expression, and gene regulation data. 
The system is publicly available at http://www.CoryneCenter.de.
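The acetate-versus-glucose comparison combines expression ratios with known regulator-target relations. The sketch below is not CoryneCenter, EMMA, or CoryneRegNet code. It assumes two-channel arrays summarized as M (log2 ratio) and A (mean log2 intensity) values, and it applies a hypothetical two-fold cutoff. It then reports what fraction of each regulator's known targets is affected. All gene and regulator names are illustrative.

```python
import math
from typing import Dict, List, Set, Tuple


def ma_values(red: float, green: float) -> Tuple[float, float]:
    """Return (M, A): M = log2(red/green), A = mean of log2(red) and log2(green)."""
    log_r, log_g = math.log2(red), math.log2(green)
    return log_r - log_g, 0.5 * (log_r + log_g)


def differentially_expressed(
    intensities: Dict[str, Tuple[float, float]], m_cutoff: float = 1.0
) -> Set[str]:
    """Genes with |M| >= m_cutoff; 1.0 corresponds to a two-fold change (hypothetical cutoff)."""
    return {
        gene
        for gene, (red, green) in intensities.items()
        if abs(ma_values(red, green)[0]) >= m_cutoff
    }


def regulon_support(regulon: Dict[str, List[str]], de_genes: Set[str]) -> Dict[str, float]:
    """Fraction of each regulator's known targets that are differentially expressed."""
    return {
        regulator: sum(target in de_genes for target in targets) / len(targets)
        for regulator, targets in regulon.items()
        if targets  # skip regulators without annotated targets
    }


# Hypothetical spot intensities: red = acetate channel, green = glucose channel.
spots = {"geneA": (8.0, 2.0), "geneB": (3.0, 2.5)}
de = differentially_expressed(spots)
print(regulon_support({"RegX": ["geneA", "geneB"]}, de))
```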
Project description:Polycystic ovary syndrome (PCOS) is a fertility disorder affecting 5-7% of reproductive-aged women. Women with PCOS manifest both reproductive and metabolic defects. Several animal models have been developed that implicate excess steroid exposure during fetal life in the development of the PCOS phenotype. This review addresses the fetal and adult reproductive and metabolic consequences of prenatal steroid excess in sheep and the translational relevance of these findings to PCOS. By comparing findings in various breeds of sheep, the review examines the role of genetic susceptibility to fetal insults. Disruptions induced by prenatal testosterone excess are evident at both the reproductive and metabolic levels, with each influencing the other, thus creating a self-perpetuating vicious cycle. The review highlights the need to identify a common mediator of the dysfunctions at the reproductive and metabolic levels and to develop prevention and treatment interventions that target all sites of disruption in unison to achieve optimal success.
Project description:Importance: Genetic disorders are historically defined through phenotype-first approaches. However, risk estimates derived from phenotype-linked ascertainment may overestimate severity and penetrance. Pathogenic variants in DICER1 are associated with increased risks of rare and common neoplasms and thyroid disease in adults and children. This study explored how effectively a genome-first approach could characterize the clinical traits associated with germline DICER1 putative loss-of-function (pLOF) variants in an unselected clinical cohort. Objective: To examine the prevalence, penetrance, and phenotypic characteristics of carriers of germline DICER1 pLOF variants via genome-first ascertainment. Design, setting, and participants: This cohort study classifies DICER1 variants in germline exome sequence data from 92 296 participants of the Geisinger MyCode Community Health Initiative. Data for each MyCode participant were used from the start of the Geisinger electronic health record to February 1, 2018. Main outcomes and measures: Prevalence of germline DICER1 variation; penetrance of malignant tumors and thyroid disease in carriers of germline DICER1 variation; structured, manual review of electronic health records; and DICER1 sequencing of available tumors from an associated cancer registry. Results: A total of 92 296 adults (mean [SD] age, 59 [18] years; 98% white; 60% female) participated in the study. Germline DICER1 pLOF variants were observed in 1 in 3700 to 1 in 4600 participants, more than double the expected prevalence. Malignant tumors (primarily thyroid carcinoma) were observed in 4 of 25 participants (16%) with DICER1 pLOF variants, which is comparable (by 50 years of age) to the frequency of neoplasms in the largest registry- and clinic-based (phenotype-first) DICER1 studies published to date. 
DICER1 pLOF variants were significantly associated with risks of thyroidectomy (odds ratio [OR], 6.0; 95% CI, 2.2-16.3; P = .007) and thyroid cancer (OR, 9.2; 95% CI, 2.1-34.7; P = .02) compared with controls, but there was no significant increase in the risk of goiter (OR, 1.8; 95% CI, 0.7-4.9). A female patient in her 80s who carried a germline DICER1 hotspot variant was apparently healthy on electronic health record review. The term DICER1 did not appear in any of the medical records of the 25 participants with a pLOF DICER1 variant, even in those affected with a known DICER1-associated tumor or thyroid phenotype. Conclusions and relevance: This cohort study was able to ascertain individuals with germline DICER1 variants based on a genome-first approach rather than through a previously established DICER1-related phenotype. Use of the genome-first approach may complement more traditional approaches to syndrome delineation and may be an efficient approach for risk estimation.
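The prevalence and odds-ratio figures above rest on simple arithmetic over carrier counts and 2 × 2 tables. The sketch below shows that arithmetic under stated assumptions. The Wald interval is a generic textbook choice, and the abstract does not say which method the study used, so this code will not necessarily reproduce the reported ORs. The 2 × 2 counts in the example are hypothetical.

```python
import math
from typing import Tuple


def one_in_n(carriers: int, cohort: int) -> float:
    """Express carrier prevalence as '1 in N'."""
    if carriers <= 0:
        raise ValueError("need at least one carrier")
    return cohort / carriers


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Wald confidence interval (z = 1.96 gives ~95%).

    a: carriers with the outcome      b: carriers without it
    c: non-carriers with the outcome  d: non-carriers without it
    No continuity correction is applied, so zero cells are rejected.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


# 25 pLOF carriers among 92 296 participants (figures from the abstract):
print(round(one_in_n(25, 92_296)))  # 3692, consistent with "1 in 3700"
# Hypothetical 2x2 counts, only to show the call:
print(odds_ratio_wald_ci(10, 10, 10, 40))
```

With very few carriers, as here (25), exact or penalized methods are often preferred over the Wald interval, which is one reason the study's intervals may differ.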
Project description:Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired-end data from two Illumina libraries, one with small (e.g., 180 bp) and one with large (e.g., 3-5 kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy compared with state-of-the-art tools and is unique in its ability to accurately identify large sequence variants, including duplications, and to resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.
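The two-library setup described above maps onto Pilon's command-line inputs. The short-insert fragment library goes in as `--frags` and the large-insert jumping library as `--jumps`. The sketch below only assembles the argument list. It assumes Pilon is run as a Java jar, and the jar path, file names, and heap size are placeholders. The options follow Pilon's documented command line, but check them against the usage text of your installed version.

```python
import shlex
from typing import List, Optional


def build_pilon_command(
    pilon_jar: str,
    genome_fasta: str,
    frags_bam: str,
    jumps_bam: Optional[str] = None,
    outdir: str = "pilon_out",
    diploid: bool = False,
    max_heap: str = "16G",
) -> List[str]:
    """Assemble a Pilon invocation; all paths and the heap size are placeholders."""
    cmd = ["java", f"-Xmx{max_heap}", "-jar", pilon_jar,
           "--genome", genome_fasta,      # draft assembly to be corrected
           "--frags", frags_bam,          # short-insert paired-end alignments (e.g. ~180 bp)
           "--outdir", outdir,
           "--changes", "--vcf"]          # also write a change list and a VCF of calls
    if jumps_bam:
        cmd += ["--jumps", jumps_bam]     # large-insert (jumping) library alignments (e.g. 3-5 kb)
    if diploid:
        cmd.append("--diploid")           # treat the sample as diploid
    return cmd


cmd = build_pilon_command("pilon.jar", "draft.fasta", "frags.bam", jumps_bam="jumps.bam")
print(shlex.join(cmd))
# To execute: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems with unusual file names when it is passed to `subprocess.run`.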
Project description:Some sheep breeds are naturally prolific, and they are very informative for studies of reproductive genetics and physiology. Major genes increasing litter size (LS) and ovulation rate (OR) were suspected in the French Grivette and the Polish Olkuska sheep populations, respectively. To identify genetic variants responsible for the highly prolific phenotype in these two breeds, genome-wide association studies (GWAS) followed by complementary genetic and functional analyses were performed. Highly prolific ewes (cases) and normally prolific ewes (controls) from each breed were genotyped using the Illumina OvineSNP50 Genotyping BeadChip. In both populations, an X chromosome region close to the BMP15 gene harbored clusters of markers with suggestive evidence of association at significance levels between 10⁻⁵ and 10⁻⁷. The BMP15 candidate gene was then sequenced, and two novel non-conservative mutations, called FecX(Gr) and FecX(O), were identified in the Grivette and Olkuska breeds, respectively. The two mutations were associated with the highly prolific phenotype (p = 5.98 × 10⁻⁶ for FecX(Gr) and p = 2.55 × 10⁻⁸ for FecX(O)). Ewes homozygous for the mutated allele showed significantly increased prolificacy (FecX(Gr)/FecX(Gr), LS = 2.50 ± 0.65 versus FecX(+)/FecX(Gr), LS = 1.93 ± 0.42, p < 10⁻³; FecX(O)/FecX(O), OR = 3.28 ± 0.85 versus FecX(+)/FecX(O), OR = 2.02 ± 0.47, p < 10⁻³). Both mutations are located in highly conserved motifs of the protein and altered BMP15 signaling activity in vitro in a BMP-responsive luciferase assay in COV434 granulosa cells. Thus, we have identified two novel mutations in the BMP15 gene associated with increased LS and OR. Notably, homozygous FecX(Gr)/FecX(Gr) Grivette and homozygous FecX(O)/FecX(O) Olkuska ewes are hyperprolific, in striking contrast with the sterility of ewes homozygous for all other known BMP15 mutations. 
Our results bring new insights into the key role played by the BMP15 protein in ovarian function and could contribute to a better understanding of the pathogenesis of women's fertility disorders.
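The case-control GWAS step compares allele frequencies at each marker between highly prolific and normally prolific ewes. A common single-marker test is Pearson's chi-square on a 2 × 2 table of allele counts. The sketch below is a generic illustration with hypothetical counts, not the association model used in the study, and it does not correct for population structure or relatedness.

```python
import math
from typing import Tuple


def allelic_chi2(case_alt: int, case_ref: int,
                 ctrl_alt: int, ctrl_ref: int) -> Tuple[float, float]:
    """Pearson chi-square (1 df) on a 2x2 allele-count table; returns (chi2, p)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    if 0 in row or 0 in col:
        raise ValueError("degenerate table: a margin is zero")
    chi2 = sum(
        (table[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
        for i in range(2) for j in range(2)
    )
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical allele counts at one X-linked marker (cases vs. controls).
print(allelic_chi2(30, 10, 10, 30))
```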