Dataset Information

Improving phylogenetic analyses by incorporating additional information from genetic sequence databases.

ABSTRACT: Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to the analysis of interest. We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion-deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BALI-PHY; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.
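
The idea of turning estimates from additional datasets into an informative prior can be illustrated with a much simpler stand-in than the semiparametric hierarchical model described in the abstract. The sketch below assumes a plain normal random-effects model fitted by the method of moments; the function name `informative_prior` and its inputs are hypothetical and are not taken from the paper or from BALI-PHY.

```python
import statistics


def informative_prior(aux_means, aux_vars):
    """Return (mean, variance) of a normal prior for a new dataset's parameter.

    aux_means: posterior means of the parameter from each additional dataset.
    aux_vars:  posterior variances of the parameter from each additional dataset.

    Assumption: a simple normal random-effects model, not the semiparametric
    model of the paper. Each dataset's parameter is drawn from N(mu, tau2).
    """
    k = len(aux_means)
    if k < 2 or k != len(aux_vars):
        raise ValueError("need at least two datasets with matching variances")

    # Population mean: unweighted average of the per-dataset estimates.
    mu = statistics.fmean(aux_means)

    # Average within-dataset (estimation) variance.
    within = statistics.fmean(aux_vars)

    # Method-of-moments between-dataset variance, floored at zero.
    tau2 = max(0.0, statistics.variance(aux_means) - within)

    # Prior variance for a new dataset's parameter: between-dataset spread
    # plus the uncertainty in the estimated population mean.
    prior_var = tau2 + (tau2 + within) / k
    return mu, prior_var


if __name__ == "__main__":
    # Made-up illustrative numbers, not results from the paper.
    mean, var = informative_prior([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
    print(f"prior mean = {mean:.3f}, prior variance = {var:.3f}")
```

The resulting prior tightens as the additional datasets agree with one another and widens as they disagree, which is the qualitative behaviour a hierarchical prior is meant to provide.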

SUBMITTER: Liang LJ

PROVIDER: S-EPMC2800350 | biostudies-literature | 2009 Oct

REPOSITORIES: biostudies-literature


Publications

Improving phylogenetic analyses by incorporating additional information from genetic sequence databases.

Liang LJ, Weiss RE, Redelings B, Suchard MA

Bioinformatics (Oxford, England), 2009-08-06, issue 19


PMID: 19661240

Similar Datasets

Project description: Common vetch is one of the most profitable forage legumes due to its versatility in end use, which includes grain, hay, green manure, and silage. Furthermore, common vetch is one of the best crops to rotate with cereals, as it can increase soil fertility, which results in higher yields in cereal crops. The National Vetch Breeding Program located in South Australia is focused on developing new vetch varieties with higher grain and dry matter yields, better resistance to major diseases, and wider adaptability to Australian cropping environments. As part of this program, a study was conducted with 35 field trials from 2015 to 2021 in South Australia, Western Australia, Victoria, and New South Wales with the objective of determining the best parents for future crosses and the vetch lines with the highest commercial value in terms of grain yield production. A total of 392 varieties were evaluated. The individual field trials were combined into a multi-environment trial dataset, where each trial is identified as an environment. Multiplicative mixed models were used to analyze the data, with a factor analytic approach used to model the genetic-by-environment interaction effects. The pedigree of the lines was then assembled and incorporated into the analysis. This approach allowed the total effects to be partitioned into additive and non-additive components. The total and additive genetic effects were inspected across and within environments for broad and specific selections of the lines with the best commercial value and the best parents. Summary measures of overall performance and stability were used to aid with selection of parents. To the best of our knowledge, this is the first study that used pedigree information to breed common vetch. 
In this paper, this statistical methodology has been successfully implemented, with the inclusion of the pedigree improving the fit of the models to the data and most of the total genetic variation explained by the additive heritable component. The results of this study have shown the importance of including pedigree information in common vetch breeding programs and have improved the ability of breeders to select superior commercial lines and parents.

Project description: The discharge summary (DS) is a document that contains the diagnosis, comorbidities, procedures, complications, and future treatment plan for a particular patient after an inpatient hospital stay. The DS is completed by junior medical staff and is delivered to the general practitioner (GP). DS completion is time consuming and tedious, and DSs are usually not completed within the recommended time frame after a patient is discharged. Time spent completing DSs correlates with junior doctor overtime, which costs the hospital money in overtime pay. Information that is required in the DS is generally already entered into numerous electronic information systems in the hospital, including the "electronic patient journey board", which lists all the patients in a given ward with their clinical information. This information is constantly updated by all staff in the hospital. A program was developed that transferred this information directly into the patient DS. Ten junior doctors in two departments kept daily records for one week of the time spent compiling DSs, the time at work, and the actual overtime claimed, before and after the introduction of the intervention. The mean (± SD) time for DS compilation per week reduced by 2.8 (± 2.4) hours from 10.0 (± 3.5) hours (p<0.01), and the mean overtime worked per week reduced by 2.8 (± 3.1) hours from 8.5 (± 4.4) hours (p<0.05). The mean overtime claimed reduced by 1.8 (± 2.8) hours from 5.3 (± 5.4) hours per week (p<0.05), resulting in a reduction in mean overtime payment of $114.95 from $290.57 per doctor, per week. Extrapolating to the 60 ward-based junior doctors, the potential annual savings for the hospital budget are over $350,000. Additionally, the number of DSs completed within 48 hours increased from 45% to 58%. 
In summary, the transfer of electronic data from the electronic patient journey board to the discharge summary program has yielded improvements in DS completion rates and overtime worked by medical staff, resulting in a significant reduction in overtime costs.

Project description: Genetic researchers often collect disease-related quantitative traits in addition to disease status because they are interested in understanding the pathophysiology of disease processes. In genome-wide association (GWA) studies, these quantitative phenotypes may be relevant to disease development and serve as intermediate phenotypes, or they could be behavioral or other risk factors that predict disease risk. Statistical tests combining both disease status and quantitative risk factors should be more powerful than case-control studies, as the former incorporate more information about the disease. In this paper, we proposed a modified inverse-variance weighted meta-analysis method to combine disease status and quantitative intermediate phenotype information. The simulation results showed that when an intermediate phenotype was available, the inverse-variance weighted method had more power than did a case-control study of complex diseases, especially in identifying susceptibility loci having minor effects. We further applied this modified meta-analysis to a study of imputed lung cancer genotypes with smoking data in 1154 cases and 1137 matched controls. The most significant SNPs came from the CHRNA3-CHRNA5-CHRNB4 region on chromosome 15q24-25.1, which has been replicated in many other studies. Our results confirm that this CHRNA region is associated with both lung cancer development and smoking behavior. We also detected three significant SNPs (rs1800469, rs1982072, and rs2241714) in the promoter region of the TGFB1 gene on chromosome 19 (p = 1.46×10^-5, 1.18×10^-5, and 6.57×10^-6, respectively). The SNP rs1800469 is reported to be associated with chronic obstructive pulmonary disease and lung cancer in cigarette smokers. The present study is the first GWA study to replicate this result. Signals in the 3q26 region were also identified in the meta-analysis. 
We demonstrate that an intermediate phenotype can potentially enhance the power of complex disease association analysis and that the modified meta-analysis method robustly incorporates intermediate phenotypes or other quantitative risk factors in the analysis.
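
For orientation, the standard fixed-effect inverse-variance weighted combination that such methods build on can be sketched as follows. This is the textbook estimator, not the modified method of the study described above, whose details are not given here; the function name and the example numbers are hypothetical.

```python
import math


def inverse_variance_pool(estimates, std_errors):
    """Combine effect estimates by standard inverse-variance weighting.

    Returns (pooled_estimate, pooled_standard_error, z_score).
    Assumption: independent estimates of one common effect (fixed-effect model).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need matching, non-empty estimates and standard errors")

    # Each estimate is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se, pooled / pooled_se


if __name__ == "__main__":
    # Made-up numbers: e.g. one effect estimate from disease status and one
    # from a quantitative trait, expressed on a common scale.
    est, se, z = inverse_variance_pool([0.2, 0.4], [0.1, 0.2])
    print(f"pooled = {est:.3f}, se = {se:.4f}, z = {z:.3f}")
```

More precise estimates (smaller standard errors) dominate the pooled value, so the combined standard error is never larger than the smallest input standard error.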
