Dataset Information

The impact of outgroup choice and missing data on major seed plant phylogenetics using genome-wide EST data.

ABSTRACT:

Background

Genome-level analyses have enhanced our view of phylogenetics in many areas of the tree of life. With the production of whole-genome DNA sequences for hundreds of organisms and large-scale EST databases, a large number of candidate genes for inclusion in phylogenetic analysis have become available. In this work, we exploit the burgeoning genomic data being generated for plant genomes to address one of the more important plant phylogenetic questions: the hierarchical relationships of the major seed plant lineages (angiosperms, Cycadales, Ginkgoales, Gnetales, and Coniferales), which remain a work in progress despite numerous studies using single, few or several genes and morphology datasets. Although most recent studies support the notion that gymnosperms and angiosperms are monophyletic and sister groups, they differ on the topological arrangements within each major group.

Methodology

We exploited the EST database to construct a supermatrix of DNA sequences (over 1,200 concatenated orthologous gene partitions for 17 taxa) to examine non-flowering seed plant relationships. This analysis employed programs that offer rapid and robust orthology determination of novel, short sequences from plant ESTs based on reference seed plant genomes. Our phylogenetic analysis retrieved an unbiased (with respect to gene choice), well-resolved and highly supported phylogenetic hypothesis that was robust to various outgroup combinations.
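As a rough illustration of what a supermatrix of concatenated gene partitions looks like, the following Python sketch joins per-gene alignments end to end and pads taxa that lack a gene with missing-data characters. It is a minimal sketch under stated assumptions, not the pipeline used in the study: the function names, the toy taxa and sequences, and the use of "?" as the missing-data symbol are all hypothetical, and orthology determination and alignment are assumed to have been done already.

```python
from typing import Dict, List, Tuple

# One gene alignment: taxon name -> aligned sequence (all the same length).
Alignment = Dict[str, str]


def build_supermatrix(
    genes: Dict[str, Alignment], missing: str = "?"
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate gene alignments; pad absent taxa with `missing`.

    Returns the concatenated matrix and a list of
    (gene name, first column, last column) partitions, 1-based inclusive.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for name, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {name!r} is empty or not aligned")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon without this gene contributes only missing data.
            rows[taxon].append(aln.get(taxon, missing * width))
        partitions.append((name, start, start + width - 1))
        start += width
    matrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return matrix, partitions


def missing_fraction(matrix: Dict[str, str], missing: str = "?") -> Dict[str, float]:
    """Fraction of missing characters per taxon in the concatenated matrix."""
    return {
        taxon: (seq.count(missing) / len(seq) if seq else 0.0)
        for taxon, seq in matrix.items()
    }


# Toy input (hypothetical sequences): Cycas has no sequence for geneB.
toy_genes = {
    "geneA": {"Ginkgo": "ACGT", "Cycas": "ACGA", "Pinus": "ACGG"},
    "geneB": {"Ginkgo": "TTG", "Pinus": "TTA"},
}
toy_matrix, toy_partitions = build_supermatrix(toy_genes)
```

In this toy case the Cycas row is padded with three "?" characters for geneB, and the partition list records which columns belong to which gene, which is the information a partitioned analysis would need.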

Conclusions

We evaluated character support and the relative contribution of numerous variables (e.g. gene number, missing data, partitioning schemes, taxon sampling and outgroup choice) to tree topology, stability and support metrics. Our results indicate that while missing characters and order of addition of genes to an analysis do not influence branch support, inadequate taxon sampling and limited choice of outgroup(s) can lead to spurious inference of phylogeny when dealing with phylogenomic-scale data sets. As expected, support and resolution increase significantly as more informative characters are added, until reaching a threshold beyond which support metrics stabilize and the effect of adding conflicting characters is minimized.

SUBMITTER: de la Torre-Barcena JE

PROVIDER: S-EPMC2685480 | biostudies-literature | 2009 Jun

REPOSITORIES: biostudies-literature

Publications

The impact of outgroup choice and missing data on major seed plant phylogenetics using genome-wide EST data.

Jose Eduardo de la Torre-Bárcena, Sergios-Orestis Kolokotronis, Ernest K. Lee, Dennis Wm. Stevenson, Eric D. Brenner, Manpreet S. Katari, Gloria M. Coruzzi, Rob DeSalle

PLoS ONE, 2009 Jun 2, issue 6

PMID: 19503618

Similar Datasets

Project description: Resolving deep divergences in the tree of life is challenging even for analyses of genome-scale phylogenetic data sets. Relationships between Basidiomycota subphyla, the rusts and allies (Pucciniomycotina), smuts and allies (Ustilaginomycotina), and mushroom-forming fungi and allies (Agaricomycotina) were found particularly recalcitrant both to traditional multigene and genome-scale phylogenetics. Here, we address basal Basidiomycota relationships using concatenated and gene tree-based analyses of various phylogenomic data sets to examine the contribution of several potential sources of bias. We evaluate the contribution of biological causes (hard polytomy, incomplete lineage sorting) versus unmodeled evolutionary processes and factors that exacerbate their effects (e.g., fast-evolving sites and long-branch taxa) to inferences of basal Basidiomycota relationships. Bayesian Markov Chain Monte Carlo and likelihood mapping analyses reject the hard polytomy with confidence. In concatenated analyses, fast-evolving sites and oversimplified models of amino acid substitution favored the grouping of smuts with mushroom-forming fungi, often leading to maximal bootstrap support in both concatenation and coalescent analyses. On the contrary, the most conserved data subsets grouped rusts and allies with mushroom-forming fungi, although this relationship proved labile, sensitive to model choice, to different data subsets and to missing data. Excluding putative long-branch taxa, genes with high proportions of missing data and/or with strong signal failed to reveal a consistent trend toward one or the other topology, suggesting that additional sources of conflict are at play. While concatenated analyses yielded strong but conflicting support, individual gene trees mostly provided poor support for any resolution of rusts, smuts, and mushroom-forming fungi, suggesting that the true Basidiomycota tree might be in a part of tree space that is difficult to access using both concatenation and gene tree-based approaches. Inference-based assessments of absolute model fit strongly reject best-fit models for the vast majority of genes, indicating a poor fit of even the most commonly used models. While this is consistent with previous assessments of site-homogeneous models of amino acid evolution, this does not appear to be the sole source of confounding signal. Our analyses suggest that topologies uniting smuts with mushroom-forming fungi can arise as a result of inappropriate modeling of amino acid sites that might be prone to systematic bias. We speculate that improved models of sequence evolution could shed more light on basal splits in the Basidiomycota, which, for now, remain unresolved despite the use of whole genome data.

Project description: Background: Breeding programs benefit from information about marker-trait associations for many traits, whether the goal is to place those traits under active selection or to maintain them through background selection. Association studies are also important for identifying accessions bearing potentially useful alleles by characterizing marker-trait associations and allelic states across germplasm collections. This study reports the results of a genome-wide association study and evaluation of epistatic interactions for four agronomic and seed-related traits in soybean. Results: Using 419 diverse soybean accessions, together with genotyping data from the SoySNP50K Illumina Infinium BeadChip, we identified marker-trait associations for internode number (IN), plant height (PH), seed weight (SW), and seed yield per plant (SYP). We conducted a genome-wide epistatic study (GWES), identifying candidate genes that show evidence of SNP-SNP interactions. Although these candidate genes will require further experimental validation, several appear to be involved in developmental processes related to the respective traits. For IN and PH, these include the Dt1 determinacy locus (a soybean meristematic transcription factor), as well as a pectinesterase gene and a squamosa promoter binding gene that in other plants are involved in cell elongation and the vegetative-to-reproductive transition, respectively. For SW, candidate genes include an ortholog of the AP2 gene, which in other species is involved in maintaining seed size, embryo size, seed weight and seed yield. Another SW candidate gene is a histidine phosphotransfer protein - orthologs of which are involved in cytokinin-mediated seed weight regulating pathways. The SYP association loci overlap with regions reported in previous QTL studies to be involved in seed yield. Conclusions: This study further confirms the utility of GWAS and GWES approaches for identifying marker-trait associations and interactions within a diverse germplasm collection.
