Phylogenetics Workshop - Abstracts

From Santa Fe Institute Events Wiki

 
Latest revision as of 22:05, 17 April 2008


Andrew Rambaut, University of Edinburgh

The Genomic and Epidemiological Dynamics of Human Influenza A Virus

"Influenza is one of the most common, and serious, respiratory infections of humans, with 3 to 5 million cases occurring annually worldwide, resulting in 250,000 to 500,000 deaths. Of the three types of influenza virus, those assigned to type A are the most virulent, and are associated with seasonal (winter) epidemics in temperate regions, year-round transmission in the tropics, and occasional large-scale global pandemics that are characterized by substantial increases in morbidity and mortality. Whilst the adaptive interaction between the haemagglutinin (HA) protein of influenza A virus and the host immune system, ‘antigenic drift’, is one of the best described patterns in molecular evolution, studies have largely focused on the HA in isolation. Little attention has been paid to evolutionary dynamics at the genomic level, particularly the relationship between natural selection, reassortment, and the functional interactions among segments. We discuss the development and application of new phylogenetic techniques to analyse genomic sequences of influenza A virus sampled globally over time. The aim is to investigate the rate of molecular evolution, the degree of selection acting on the genomic segments, the interactions between the segments, and the rate of genetic exchange through reassortment, and to characterize the relative importance of these forces to the evolutionary dynamics of this important human virus."

Hirohisa Kishino, University of Tokyo

Using a Database and a Prior for the Estimation of Structural Evolution and Recombination (PDF)

Mike Steel, University of Canterbury

Phylogenetics: Interactions Between Mathematics and Evolution

This talk provides an overview of phylogenetics, and the mathematics behind it. I will focus on how combinatorics and probability theory can help address fundamental questions in evolutionary systematics -- from how much data is required to resolve a divergence in the distant past, to quantifying future biodiversity given current high rates of extinction.

Joe Felsenstein, University of Washington

"Variation of rates among sites: a problem for distance matrix methods?"

Distance matrix methods can adapt to rate variation among sites by allowing for it in the calculation of the distances. This corrects each pairwise distance for the variation. But it does not carry over the information from one pair to another about which sites are the ones with the highest rates of evolution. Likelihood (and Bayesian) methods do carry over this information properly. We would therefore expect distance methods to seriously lose power when compared to likelihood methods, as the amount of variation of rates among sites becomes greater. A small simulation study is under way, and by the time of this workshop we should know whether this expectation is met.
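As a rough illustration of what "allowing for rate variation in the calculation of the distances" can look like, the sketch below compares the standard Jukes-Cantor distance with its gamma-corrected form. The choice of the Jukes-Cantor model, the function names, and the shape parameter alpha are assumptions made for this example; the abstract does not say which model the simulation study uses. Note that the correction uses only the overall proportion of differing sites for one pair, so it cannot tell which sites are the fast ones.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor distance for a proportion p of differing sites,
    assuming every site evolves at the same rate."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the distance to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def jc69_gamma_distance(p, alpha):
    """Jukes-Cantor distance when site rates follow a gamma distribution
    with mean 1 and shape alpha (smaller alpha = more rate variation)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the distance to be defined")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


if __name__ == "__main__":
    p = 0.3  # hypothetical observed proportion of differing sites
    print("equal rates:", jc69_distance(p))
    for alpha in (0.5, 1.0, 5.0):
        print("alpha =", alpha, "->", jc69_gamma_distance(p, alpha))
```

For the same observed p, stronger rate variation (smaller alpha) yields a larger corrected distance, and the gamma form approaches the equal-rates distance as alpha grows.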

Benjamin Redelings, North Carolina State University

Robust Phylogeny Reconstruction from Ambiguous Alignments via Joint Bayesian Estimation

Current methods of phylogeny reconstruction may be undermined by the unreliability of the molecular sequence alignments on which they depend. In order to solve this problem and deliver robust phylogeny estimates when the alignment is uncertain, we describe a method for simultaneously estimating multiple alignments of biological sequences and the phylogenetic trees that relate the sequences. Unlike current techniques that base phylogeny estimates on a single best estimate of the alignment, we take into consideration the myriad near-optimal alignments. We also avoid the trap of conditioning on an inaccurate external guide tree in constructing the alignment by estimating the alignment and phylogeny simultaneously. This eliminates the bias towards the guide tree and other tunable parameters that is inherent in phylogenies based on alignments constructed with progressive alignment. Furthermore, the availability of the phylogeny during alignment construction allows us to use shared indels as evidence in clustering taxa on the tree. Incorporating indel information in this way may be especially helpful in improving phylogenetic resolution for phylogenies of rapidly emerging diseases such as HIV, where the number of substitutions that have accumulated in the short times since divergence is small. Finally, we note that accurate indel models and improved substitution models, such as those allowing invariant sites and rate variation between sites, may substantially improve alignment estimates in the joint estimation framework.
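One generic way to write down joint estimation of this kind is shown below. The notation is assumed for this sketch and is not taken from the abstract: Y denotes the unaligned sequences, A an alignment, \tau the tree, and \theta the substitution and indel parameters, with independent priors on \tau and \theta assumed for simplicity. The second line expresses the idea of averaging over near-optimal alignments rather than conditioning on a single one.

```latex
\begin{align}
P(A, \tau, \theta \mid Y) &\propto P(Y, A \mid \tau, \theta)\, P(\tau)\, P(\theta) \\
P(\tau \mid Y) &= \sum_{A} \int P(A, \tau, \theta \mid Y)\, d\theta
\end{align}
```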

Vladimir Minin, University of Washington

"Robust estimation of genetic distances with applications to the convergent evolution problem"

Distance-based phylogenetic reconstruction methods first condense raw sequence data into a matrix of pairwise genetic distances and then use this matrix to arrive at a phylogenetic relationship among sequences. Flexibility in defining genetic distances allows distance-based methods to tackle problems that currently cannot be solved by likelihood-based phylogenetic techniques. For example, defining genetic distance as the expected number of synonymous mutations between two sequences, we can eliminate the convergent evolution bias from phylogenetic reconstruction. However, synonymous distances represent only one possible labeling of mutations between sequences. Here, we present a new framework for calculating distances defined via an arbitrary labeling of mutations. Importantly, the proposed estimation algorithm is robust to model misspecification. In a simulation study, we demonstrate that our method allows one to estimate synonymous distances without resorting to codon models, but instead using easy-to-fit models of nucleotide evolution. We proceed with the phylogenetic analysis of an HIV transmission network, where we show advantages of using robust synonymous distances. We then turn to the evolution of the Rubisco protein family in plants. The paralogs of this protein family undergo homogenization through either recombination or point mutation-driven convergent evolution. We use robust synonymous distances to disentangle effects of recombination and concerted evolution on the homogenization of Rubisco.
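The two-step structure of a distance-based method (sequences to distance matrix, distance matrix to tree) can be sketched as follows. This is only an illustration under stated assumptions: it uses the plain proportion of differing sites as the distance and UPGMA as the clustering step, neither of which is the robust labeled-distance estimator described in the abstract, and the sequences and function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Cluster aligned sequences with UPGMA.

    seqs maps a taxon name to its aligned sequence. The result is a nested
    tuple giving the tree topology (branch lengths are omitted for brevity).
    """
    names = list(seqs)
    # Step 1: condense the sequences into pairwise distances, keyed (i, j), i < j.
    dist = {(i, j): p_distance(seqs[names[i]], seqs[names[j]])
            for i, j in combinations(range(len(names)), 2)}
    # Each cluster id maps to (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    next_id = len(names)
    # Step 2: repeatedly merge the closest pair of clusters.
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        merged = {}
        for k in clusters:
            d_ik = dist[tuple(sorted((i, k)))]
            d_jk = dist[tuple(sorted((j, k)))]
            # Size-weighted average keeps this the mean leaf-to-leaf distance.
            merged[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        dist = {pair: d for pair, d in dist.items()
                if i not in pair and j not in pair}
        dist.update(merged)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


if __name__ == "__main__":
    example = {  # hypothetical aligned sequences
        "A": "ACGTACGT",
        "B": "ACGTACGA",
        "C": "ACGTTCAA",
        "D": "TCGTTCAA",
    }
    print(upgma(example))
```

Because the tree-building step sees only the distance matrix, swapping `p_distance` for a different definition of distance (for example one that counts only a chosen class of mutations) changes the analysis without touching the clustering code, which is the flexibility the abstract refers to.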