Dataset Information

Finding disease variants in Mendelian disorders by using sequence data: methods and applications.

ABSTRACT: Many sequencing studies are now underway to identify the genetic causes for both Mendelian and complex traits. Via exome sequencing, genes harboring variants implicated in several Mendelian traits have already been identified. The underlying methodology in these studies is a multistep algorithm that filters the variants identified in a small number of affected individuals according to whether they are novel (not yet seen in public resources such as dbSNP), whether they are shared among affected individuals, and other external functional information on the variants. Although intuitive, these filter-based methods are nonoptimal and do not provide any measure of statistical uncertainty. We describe here a formal statistical approach that has several distinct advantages: (1) it provides fast computation of approximate p values for individual genes, (2) it adjusts for the background variation in each gene, (3) it allows for incorporation of functional or linkage-based information, and (4) it accommodates designs based on both affected relative pairs and unrelated affected individuals. We show via simulations that the proposed approach can be used in conjunction with the existing filter-based methods to achieve a substantially better ranking of a gene relevant for disease when compared to currently used filter-based approaches; this is especially so in the presence of disease locus heterogeneity. We revisit recent studies on three Mendelian diseases and show that the proposed approach results in the implicated gene being ranked first in all studies, with approximate p values of 10^-6 for the Miller Syndrome gene, 1.0 × 10^-4 for the Freeman-Sheldon Syndrome gene, and 3.5 × 10^-5 for the Kabuki Syndrome gene.
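
The filter-based step that the abstract contrasts with the statistical approach can be illustrated with a minimal sketch. It is not the authors' method or code: the function name, the input layout, and the toy data are all hypothetical. It only shows the idea of discarding variants already seen in a public resource and then ranking genes by how many affected individuals carry a remaining (novel) variant.

```python
from collections import defaultdict


def rank_genes_by_sharing(variants_by_individual, known_variants):
    """Rank genes by the number of affected individuals carrying a novel variant.

    variants_by_individual: dict mapping an individual ID to a list of
        (gene, variant_id) pairs (hypothetical input layout).
    known_variants: set of variant IDs already present in a public
        resource such as dbSNP (stand-in for a real lookup).
    """
    carriers = defaultdict(set)
    for individual, variants in variants_by_individual.items():
        for gene, variant_id in variants:
            # Novelty filter: skip variants already catalogued.
            if variant_id not in known_variants:
                carriers[gene].add(individual)
    # Sharing filter expressed as a ranking: most carriers first,
    # ties broken alphabetically by gene name.
    return sorted(
        ((gene, len(individuals)) for gene, individuals in carriers.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Toy, made-up data for three affected individuals.
variants_by_individual = {
    "A1": [("GENE_X", "v1"), ("GENE_Y", "v2")],
    "A2": [("GENE_X", "v3"), ("GENE_Z", "v4")],
    "A3": [("GENE_X", "v5"), ("GENE_Y", "v2")],
}
known_variants = {"v2"}

ranking = rank_genes_by_sharing(variants_by_individual, known_variants)
print(ranking)
```

A ranking of this kind carries no measure of statistical uncertainty and does not adjust for background variation in each gene, which is the limitation the abstract says the proposed approach addresses.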

SUBMITTER: Ionita-Laza I

PROVIDER: S-EPMC3234377 | biostudies-literature | 2011 Dec

REPOSITORIES: biostudies-literature

Publications

Finding disease variants in Mendelian disorders by using sequence data: methods and applications.

Ionita-Laza I, Makarov V, Yoon S, Raby B, Buxbaum J, Nicolae DL, Lin X

American Journal of Human Genetics, 2011 Dec 1, issue 6

PMID: 22137099

Similar Datasets

Project description: A population of human immunodeficiency virus (HIV) within a host often descends from a single transmitted/founder virus. The high mutation rate of HIV, coupled with long delays between infection and diagnosis, makes isolating and characterizing this strain a challenge. In theory, ancestral reconstruction could be used to recover this strain from sequences sampled in chronic infection; however, the accuracy of phylogenetic techniques in this context is unknown. To evaluate the accuracy of these methods, we applied ancestral reconstruction to a large panel of published longitudinal clonal and/or single-genome-amplification HIV sequence data sets with at least one intrapatient sequence set sampled within 6 months of infection or seroconversion (n = 19,486 sequences, median [interquartile range] = 49 [20 to 86] sequences/set). The consensus of the earliest sequences was used as the best possible estimate of the transmitted/founder. These sequences were compared to ancestral reconstructions from sequences sampled at later time points using both phylogenetic and phylogeny-naive methods. Overall, phylogenetic methods conferred a 16% improvement in reproducing the consensus of early sequences, compared to phylogeny-naive methods. This relative advantage increased with intrapatient sequence diversity (P < 10^-5) and the time elapsed between the earliest and subsequent samples (P < 10^-5). However, neither approach performed well for reconstructing ancestral indel variation, especially within indel-rich regions of the HIV genome. Although further improvements are needed, our results indicate that phylogenetic methods for ancestral reconstruction significantly outperform phylogeny-naive alternatives, and we identify experimental conditions and study designs that can enhance accuracy of transmitted/founder virus reconstruction.
When HIV is transmitted into a new host, most of the viruses fail to infect host cells. Consequently, an HIV infection tends to be descended from a single "founder" virus. A priority target for vaccine research, these transmitted/founder viruses are difficult to isolate since newly infected individuals are often unaware of their status for months or years, by which time the virus population has evolved substantially. Here, we report on the potential use of evolutionary methods to reconstruct the genetic sequence of the transmitted/founder virus from its descendants at later stages of an infection. These methods can recover this ancestral sequence with an overall error rate of about 2.3%, which is about 15% more information than if we had ignored the evolutionary relationships among viruses. Although there is no substitute for sampling infections at earlier points in time, these methods can provide useful information about the genetic makeup of transmitted/founder HIV.
