ML Analysis

Goal

To compare several methods of obtaining an initial tree on which to perform a maximum likelihood hill-climbing search.

Before we begin a full experiment, we have several preliminary tasks. 
  1. First, we need to estimate how long it takes to calculate all the parameters of a model of evolution on a single tree. That is, for a fixed topology, how long does it take to estimate the optimal branch lengths, substitution matrix (M), proportion of invariant sites (I), and shape parameter (GAMMA)?
  2. Next, we need to estimate how long it takes to estimate only the optimal branch lengths with the tree topology, M, I, and GAMMA all fixed.
  3. Lastly, we need to run a preliminary experiment on a subset of the methods explored in the full experiment.

Preliminary Experiment (3 phases)

Phase 1
  1. Estimate M+I+GAMMA parameters on a tree obtained by Fast MP (Greedy MP plus one round of TBR swapping).
  2. Fix M+I+GAMMA parameters. (M+I+GAMMA parameters will not be re-estimated in any of the subsequent phases.)
  3. Record the time, T1, it takes to calculate parameters. 
Here is the pseudocode and Perl script for obtaining the initial tree.

Phase 2

  1. Get a starting tree using each of the following methods
  2. Record the time, T2, it takes to obtain the starting tree.
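
The timing steps in Phases 1 and 2 can be sketched as a small harness. This is only an illustration, assuming wall-clock time is what T1 and T2 measure; the names timed, placeholder_starting_tree, and time_starting_trees are hypothetical, and the placeholder builder stands in for whatever program actually produces each starting tree.

```python
import time


def timed(step, *args, **kwargs):
    """Run step(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = step(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def placeholder_starting_tree(method_name):
    # Hypothetical stand-in: a real run would call the tree-building
    # program for this method and return its tree.
    return "(" + method_name + "_tree);"


def time_starting_trees(method_names, build_tree=placeholder_starting_tree):
    """Build one starting tree per method and record T2 for each method."""
    trees = {}
    t2_by_method = {}
    for name in method_names:
        trees[name], t2_by_method[name] = timed(build_tree, name)
    return trees, t2_by_method
```

The same timed helper could wrap the Phase 1 parameter estimation to obtain T1.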
Phase 3

For each starting tree obtained in Phase 2 (except for the parsimony ratchet), perform an ML hill-climbing search for time T, where T = 1 week - T1 - T2.
    1. Record the log likelihood score at each hour of the search.
    2. Record the topology of the best ML tree after 1 week.
    3. Record the topology of the tree with the best log likelihood score after each hour for each analysis.
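
The time budget T = 1 week - T1 - T2 and the hourly recording can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual search: times are in seconds, improve is a hypothetical callback that performs one hill-climbing step and returns a candidate (tree, log likelihood), and search_budget and run_search are illustrative names.

```python
import time

WEEK_SECONDS = 7 * 24 * 3600


def search_budget(t1, t2, total=WEEK_SECONDS):
    """Return T = total - T1 - T2 in seconds."""
    budget = total - t1 - t2
    if budget <= 0:
        raise ValueError("T1 + T2 already uses the whole time allowance")
    return budget


def run_search(start_tree, start_score, improve, budget,
               interval=3600.0, clock=time.monotonic):
    """Hill-climb for `budget` seconds, logging the best tree every `interval`.

    improve(tree, score) is a hypothetical callback returning a candidate
    (tree, log_likelihood); a candidate is kept only if its score is higher.
    """
    best_tree, best_score = start_tree, start_score
    log = []  # entries are (elapsed mark in seconds, best score, best tree)
    start = clock()
    next_mark = interval
    while True:
        elapsed = clock() - start
        # Snapshot the best tree at the first check after each mark is passed.
        while next_mark <= min(elapsed, budget):
            log.append((next_mark, best_score, best_tree))
            next_mark += interval
        if elapsed >= budget:
            break
        tree, score = improve(best_tree, best_score)
        if score > best_score:
            best_tree, best_score = tree, score
    return best_tree, best_score, log
```

The clock argument is only there so that the loop can be exercised with a simulated clock instead of waiting for real hours to pass.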
The experiment will be run on a fixed data set: the 228-taxon data set used in the Hillis 2002 parallelizing GAML paper.