Confusing Similarity with Evolutionary History

By Ann Gauger

If common descent, or “descent with modification” as Darwin called it, is true, there should be traces of this process in the genomes of organisms. By comparing similar sequences we should be able to reconstruct phylogenetic trees that reveal the history of evolution. So the theory went, and so it seemed to be for quite a while. When we were limited to sequencing a few hard-won genes from a few species, we found the expected pattern of nested hierarchies of similarity. But with the rapid advance of DNA sequencing technology, the number of genes and species sequenced has exploded. And with that explosion have come surprises.

In a 1998 article titled “The Coming of Age of Molecular Systematics,” Maley and Marshall described the difficulties that phylogenetic analysis had encountered, particularly in the deep branches of evolutionary history. I list them here because they are still of fundamental importance:

  1. Alignment of distantly related sequences is difficult because of substantial change, including insertions and deletions. Unambiguous alignment may not be possible, or statistical support for the alignments may be weak. Yet the choice of alignment can significantly affect the inferred relationships.
  2. The genes or species chosen to represent each group can profoundly affect the results.
  3. Statistical results can be inconsistent, or, as the authors put it, there exists “the disconcerting situation where, as the amount of data analyzed increases, so does the apparent statistical support for an incorrect phylogenetic tree.” This can be caused by two things, they say: a) homoplasy, similarities in sequence not due to common descent, or b) long branch attraction, a weakness of the algorithmic methods used to recover trees, which tend to infer false relationships, grouping long branches together, when branch lengths are long.
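Point 3 can be made concrete with a small simulation. The sketch below is an illustration, not something taken from Maley and Marshall: it assumes a four-taxon true tree ((A,B),(C,D)), a Jukes-Cantor substitution model, and hypothetical branch lengths (1.0 substitutions per site for the long branches leading to A and C, 0.05 for every other branch, including the internal one). It infers the tree by maximum parsimony, which for four taxa reduces to counting which of the three possible splits is supported by the most informative sites. Under these assumptions the two long branches tend to attract each other, so parsimony tends to favor the wrong split AC|BD, and more sites should make that error more reliable rather than less. The function names are hypothetical.

```python
import math
import random

STATES = "ACGT"
# For each nucleotide, the three states it can change into.
OTHERS = {s: [t for t in STATES if t != s] for s in STATES}


def jc_change_prob(t: float) -> float:
    """Jukes-Cantor probability that a site differs after a branch of length t."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


def evolve(state: str, p_change: float, rng: random.Random) -> str:
    """Evolve one nucleotide along a branch; any change picks one of the other three states."""
    if rng.random() < p_change:
        return rng.choice(OTHERS[state])
    return state


def simulate_site(p_long, p_short, p_internal, rng):
    """One site on the assumed true tree ((A,B),(C,D)); A and C sit on the long branches."""
    x = rng.choice(STATES)               # ancestor joining A and B
    y = evolve(x, p_internal, rng)       # ancestor joining C and D
    return (evolve(x, p_long, rng), evolve(x, p_short, rng),
            evolve(y, p_long, rng), evolve(y, p_short, rng))


def parsimony_split(sites):
    """Maximum parsimony for four taxa: the split backed by the most informative sites.

    Returns None on a tie.
    """
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for a, b, c, d in sites:
        if a == b and c == d and a != c:
            support["AB|CD"] += 1        # the true split
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1        # groups the two long branches together
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    best = max(support.values())
    winners = [split for split, n in support.items() if n == best]
    return winners[0] if len(winners) == 1 else None


def wrong_tree_rate(n_sites, replicates=100, long_t=1.0, short_t=0.05, seed=1):
    """Fraction of simulated data sets in which parsimony strictly prefers AC|BD."""
    rng = random.Random(seed)
    p_long, p_short = jc_change_prob(long_t), jc_change_prob(short_t)
    wrong = 0
    for _ in range(replicates):
        sites = [simulate_site(p_long, p_short, p_short, rng) for _ in range(n_sites)]
        if parsimony_split(sites) == "AC|BD":
            wrong += 1
    return wrong / replicates


if __name__ == "__main__":
    for n in (20, 100, 1000):
        print(f"{n:5d} sites: wrong tree chosen in {wrong_tree_rate(n):.0%} of replicates")
```

Under these assumed branch lengths, raising the number of sites should push the wrong-tree rate toward 1 rather than toward 0, which is the “disconcerting situation” the authors describe. The sketch covers parsimony only and says nothing about how model-based methods would behave on the same data.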

Maley and Marshall’s solution:

To be confident in our hypotheses of relations among the animal phyla we need to gather more DNA sequences, especially from undersampled phyla; develop better methods of DNA analysis on the basis of more realistic models of DNA evolution; and develop independent data sets using morphological, developmental, and other molecular data (to corroborate or falsify specific hypotheses or to combine in total-evidence analyses). Work is currently under way on all these fronts, which promise more secure hypotheses of the relationships among the animal phyla and, through them, a better understanding of the causes of major morphological innovation. [References removed for clarity.]

Did their solution deliver the promised “more secure hypotheses of the relationships among the animal phyla”? Apparently not, judging from the abstract of this 2010 essay by Rokas and Carroll, titled “Bushes in the Tree of Life”:

Genome analyses are delivering unprecedented amounts of data from an abundance of organisms, raising expectations that in the near future, resolving the tree of life (TOL) will simply be a matter of data collection. However, recent analyses of some key clades in life’s history have produced bushes and not resolved trees. The patterns observed in these clades are both important signals of biological history and symptoms of fundamental challenges that must be confronted. Here we examine how the combination of the spacing of cladogenetic events and the high frequency of independently evolved characters (homoplasy) limit the resolution of ancient divergences. Because some histories may not be resolvable by even vast increases in amounts of conventional data, the identification of new molecular characters will be crucial to future progress.

Might this problem persist because the basic premise of descent with modification, as the explanation for the evolution of the phyla, is false? New molecular characters may provide no relief, particularly if homoplasy, defined as similarity of independently evolved characters, is widespread. Similarity loses its usefulness as an evolutionary trace when homoplasy enters the picture.

We already have good evidence that linear descent with modification is false for prokaryotes, because their tree looks like a thicket. At what point will molecular systematists be willing to say that at the root of the animal phyla we have not one tree but a grove?