Correcting Four Misconceptions about my 2004 Article in JMB

By Doug Axe

In August of 2004 I received an email inquiry from plant biologist Art Hunt. He had written a draft for a blog piece aimed at reviewing a research article of mine that had just appeared in the Journal of Molecular Biology [1], and he wanted to know whether he had understood my work correctly. He clearly aimed to refute claims that were beginning to surface that my paper supported intelligent design, but he also wanted to make sure he wasn’t misconstruing my work in the process. He didn’t expect me to oblige—“I will understand if you decline; in fact, I would probably do the same…”—but I did.

His summary of the experiments I reported in the JMB paper was largely correct, though his understanding of my analysis was off in several respects. As a result, many of the conclusions he drew were, in my judgment, not well grounded. I gave him feedback to this effect, but since it was a blog entry rather than a peer-reviewed scientific publication, I wasn’t particularly concerned to see the final version, or to respond to it.

Two things have happened since then. One is that my 2004 JMB paper has, as Hunt predicted, been cited in support of intelligent design in several prominent places, and the other is that Hunt’s finished blog piece [2] has become the favored argument against using my paper in support of ID.

Here I summarize what appear to be the four most common objections to using my 2004 paper in support of ID, three of which trace back to Hunt’s blog entry, the fourth perhaps originating with Steve Matheson, who joined Art Hunt last summer for critical dialog with Steve Meyer in front of an audience at Biola University. [3]

First, let’s be clear what the question is. We know that living cells depend on the functions of thousands of proteins, and that these proteins have a great variety of distinct structural forms. These distinct forms are referred to as folds, and there are well over a thousand of them known today, with more being discovered all the time. The big question is: Does the Darwinian mechanism explain the origin of these folds?

One way to approach this question is to reframe it slightly by viewing the Darwinian mechanism as a simple search algorithm. If Darwin’s theory is correct, then natural populations find solutions to difficult problems having to do with survival and reproduction by conducting successful searches. Of course we use the words search and find here figuratively, because nothing is intentionally looking for solutions. Rather, the solutions are thought to be the inevitable consequence of the Darwinian mechanism in operation.

If we view Darwinism in this way—as a natural search mechanism—we can restate the big question as follows: Are new protein folds discoverable by Darwinian searches? Recasting the question in this way turns out to be helpful in that it moves us from a subjective form of explanation to an objective one, where an affirmative answer needs to be backed up by a convincing probabilistic analysis.

With that background, let’s look at the four objections.

Objection 1: Axe’s paper doesn’t claim to support intelligent design or challenge Darwinism, so it’s a mistake to use it for those purposes.

Hunt seems to be reasoning this way when he repeatedly criticizes ID proponents for using my work in a misguided way, as in: “it is not appropriate to use the numbers Axe obtains to make inferences about the evolution of proteins and enzymes”, and “the claims of ID proponents vis-à-vis Axe 2004 are exaggerated and wrong.” [2] The idea seems to be that since I didn’t make an explicit argument for ID in the paper, any such use of my work is misguided.

To Hunt’s credit, he also reasons from the science, offering his opinion as to why my work doesn’t support ID or undermine Darwinism (see below). Barbara Forrest and Paul Gross, on the other hand, applied Objection 1 to my earlier work without making any attempt to understand the science. Rather, their book Creationism’s Trojan Horse [4] tries to make its anti-ID case with statements like, “Axe’s reference list does not include any known ID proponents”, and “The key words accompanying the abstract, which Axe himself supplied, do not include ‘intelligent design’ or anything remotely suggestive of it.” I guess there’s no need to use a scientific approach if you’re committed to the belief that there’s no connection between ID and science.

Two points need to be made in response to Objection 1. First, my paper does actually claim to be directly relevant to the origin of protein folds. The first three sentences of the abstract read [1]:

Proteins employ a wide variety of folds to perform their biological functions. How are these folds first acquired? An important step toward answering this is to obtain an estimate of the overall prevalence of sequences adopting functional folds.

As becomes clear from the rest of the abstract, the paper takes this first step by using experimental data to calculate a value for the prevalence of functional sequences. Then, under the heading Implications, the final two paragraphs of the discussion focus on what this value means for the evolution of new protein folds, concluding with this critique of the evolutionary model: “generating new folds from parts of old ones may be much less feasible than has been supposed.” [1] Keep in mind that three experts in enzyme function and evolution approved of the paper in order for it to be published, so my opinion that the work is relevant to protein evolution is shared by others who know the subject very well.

Not only does the paper claim to have evolutionary implications, but it should be quite obvious that it has evolutionary implications. This, my second point in response to Objection 1, becomes clear when we frame the main question as suggested above: Are new protein folds discoverable by Darwinian searches? The first thing we need to know in order to answer this question is the size of what needs to be found relative to the whole search space, which is precisely what my JMB paper of 2004 refers to as the prevalence of functional sequences. As I’ve argued in detail elsewhere [5], this isn’t the only thing we need to know in order to decide whether Darwin’s theory is up to the task, but it certainly is a key thing to know.

And of course, if Darwin’s theory turns out not to be up to the task, it would be reasonable to make use of this in making a case for ID.

Objection 2: Although Axe makes a case that functional sequences are rare in sequence space, this has no bearing on whether new protein folds can evolve. The evolution of new protein folds simply requires that functional sequences not be isolated in sequence space, which has nothing to do with how rare functional sequences are.

Hunt stated that my results “reveal little… about the ‘isolation’ or evolution of TEM-1 penicillinase (or any other enzyme, for that matter).” [2] Steve Matheson put the same objection this way [6]:

in order to challenge evolutionary explanation, one must demonstrate that proteins are isolated in sequence space, such that there is no stepwise trajectory that can lead from one protein to another. Isolation and rarity are not the same thing. But all Meyer claims (and all Doug Axe has tried to show) is that functional proteins are rare in sequence space.

Dennis Venema has also picked this objection up, stating it this way [7]:

The most obvious issue is that the rarity or commonality of function in protein sequence space is irrelevant to the discussion [of protein origins]. What counts is whether functional sequences in protein space are isolated from each other in a way that random mutation and natural selection cannot bridge.

Objection 2 is kind of like insisting that the height from which a person has an accidental fall has nothing to do with their chance of surviving because it’s the speed of impact that really matters. One could equally insist that the speed of impact is irrelevant, since it’s the force of rapid deceleration that really matters. In truth they all matter, and they do so for closely related reasons. The confusion comes from overlooking the causal links between them.

Yes, the Darwinian mechanism requires that the different protein folds and functions not be isolated, and yes the rarity of functional sequences has a great deal to do with whether they are isolated. I’ve already rebutted Objection 2 in response to Steve Matheson [3], but let’s look at it again.

The above quote from Venema appears in a section titled: “No biological information by natural means?” Now, I claim that this section title is linguistically isolated. By that I do not mean that it cannot be altered and remain readable, or even that it cannot be altered and remain meaningful. What I mean is that there is virtually no freedom to generate new meaningful phrases from this one if we restrict ourselves to single-character changes and we require a meaningful phrase at every step. In other words, that title cannot undergo what is referred to as open-ended evolution.

We can get to “To biological information by natural means?”, and that’s about it. “To biological information by natural meats?” retains word-level meaning, but it loses all coherence as a phrase. So we appear to be stuck on an isolated island with a population of two—and not much difference between the two at that.

Why do we get stuck so quickly? If we take something much smaller, like a single three-letter word, we seem to be able to keep the transitions going: red, rod, mod, mad, mat, cat, car, tar, bar, ban, bun, sun, sin, tin, ton, won, win, gin, gun, guy, buy, boy, toy, try, pry, pay, say, shy, why … Good question. Why do three-letter words behave so differently?
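
As a quick check on the chain above, here is a minimal Python sketch (not part of the original analysis) that confirms each step changes exactly one letter. The helper name differs_by_one is made up for this illustration, and the check says nothing about whether each entry is a dictionary word.

```python
def differs_by_one(a: str, b: str) -> bool:
    """True when a and b have equal length and differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


# The chain of three-letter words quoted in the text, in order.
ladder = (
    "red rod mod mad mat cat car tar bar ban bun sun sin tin ton won "
    "win gin gun guy buy boy toy try pry pay say shy why"
).split()

# Compare every word with its successor.
steps_ok = all(differs_by_one(a, b) for a, b in zip(ladder, ladder[1:]))
print(len(ladder), "words; every step is a single-letter change:", steps_ok)
```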

Perhaps the ease of transitioning among three-letter words explains the rationale behind Objection 2. By one way of estimating, I find that about 3–4% of the possible three-letter combinations are English words. [8] This makes three-letter words rare in the sense that they are well outnumbered by nonsense three-letter combinations, and yet they do not seem to be isolated. Objection 2 seems to be motivated by the thought that proteins may behave the same way. John Maynard Smith suggested as much. [9]

On the other hand, maybe we need a more precise definition of ‘rare’. Since there are twenty-five ways to change a given letter into some other letter, there are seventy-five ways to ‘mutate’ any three-letter word with a single letter change. Consequently, even if most of these mutants are nonsense, a success rate of 3–4% means that two or three of them are expected (on average) to be English words, and this is boosted by the fact that certain patterns of vowels and consonants are more word-rich than others.
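
The arithmetic in this paragraph can be sketched in a few lines of Python. The function name expected_word_neighbors is hypothetical, and the calculation assumes, for simplicity, that words are spread evenly among the single-letter mutants (the text notes that in reality some letter patterns are more word-rich than others).

```python
ALPHABET_SIZE = 26   # letters a-z
WORD_LENGTH = 3


def expected_word_neighbors(length: int, alphabet_size: int, word_fraction: float) -> float:
    """Expected number of single-letter mutants that are words,
    assuming words are spread evenly among all letter combinations."""
    neighbors = length * (alphabet_size - 1)   # one changed position, any other letter
    return neighbors * word_fraction


neighbors = WORD_LENGTH * (ALPHABET_SIZE - 1)
print("single-letter mutants per word:", neighbors)
for fraction in (0.03, 0.04):
    hits = expected_word_neighbors(WORD_LENGTH, ALPHABET_SIZE, fraction)
    print(f"word fraction {fraction:.0%}: about {hits:.2f} mutants expected to be words")
```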

Returning to Venema’s section title, there are about a thousand ways to make a single-character change to it: 42 positions × 26 alternatives at each position = 1092 possible mutants (including spaces but not punctuation). So, we might expect to avoid isolation if meaningful phrases among all 42-character combinations are more prevalent than about 1 in 1000. Conversely, if meaningful phrases are much less common than this, isolation is expected. Perhaps this is the relevant understanding of ‘rare’.
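
To make the count concrete, the following Python sketch enumerates every single-character mutant of the title. It assumes a 27-character alphabet (lowercase letters plus the space), with the title lowercased and the question mark dropped; the generator name single_char_mutants is invented for this example.

```python
import string

TITLE = "no biological information by natural means"
ALPHABET = string.ascii_lowercase + " "   # 26 letters plus the space


def single_char_mutants(text: str, alphabet: str = ALPHABET):
    """Yield every string that differs from text at exactly one position."""
    for i, original in enumerate(text):
        for ch in alphabet:
            if ch != original:
                yield text[:i] + ch + text[i + 1:]


mutants = list(single_char_mutants(TITLE))
print(len(TITLE), "positions,", len(mutants), "single-character mutants")
```

A reader could then scan the list for meaningful phrases; the "to biological information by natural means" variant mentioned earlier is one of the entries.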

In fact it is. And as you’ve probably guessed, meaningful 42-character combinations are far rarer than 1 in 1000, which explains why the islands of meaning are always isolated: mere dots in an enormously vast ocean of nonsense. Billions of sensible things can be said in 42 characters, and almost all of them can be said in many different ways, but none of that amounts to anything compared to the quadrillion quadrillion quadrillion quadrillion possible character combinations of that length (27^42 ≈ 10^60).
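
The size of this space is easy to check with exact integer arithmetic. The short Python sketch below is only an illustration; it shows that 27^42 is a little over 10^60 and that four quadrillions multiplied together give exactly 10^60.

```python
import math

ALPHABET_SIZE = 27   # 26 letters plus the space
LENGTH = 42

total = ALPHABET_SIZE ** LENGTH      # exact count of 42-character strings
digits = len(str(total))             # number of decimal digits
quadrillion = 10 ** 15

print("log10(27^42) =", round(math.log10(total), 2))
print("decimal digits:", digits)
print("quadrillion^4 == 10^60:", quadrillion ** 4 == 10 ** 60)
```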

That is the sense in which functional protein sequences appear to be rare, and it has everything to do with their isolation.

Objection 3: Because Axe measured mutational sensitivity from a weakly functional starting sequence rather than the fully functional natural enzyme, the mutants he generated were inappropriately disadvantaged, and this is why he arrived at such a low value for the prevalence of functional sequences.

According to Hunt, I “molded a variant that would be exquisitely sensitive to mutation.” [2] Venema expressed the same concern, that the starting sequence I used was “intentionally ‘hamstrung’ with multiple mutations to render it far less functional than its natural counterpart.” [7]

Both Hunt and Venema seem to think the outcome would have been more favorable (i.e., functional sequences would have been more prevalent) had I used the highly proficient natural enzyme as a starting point rather than the handicapped version. Actually, as a demonstration will show, the opposite is true.

Returning to Venema’s section title, suppose we want to estimate the proportion of 42-character strings that can replace it. One way to approach this is to generate a large collection of strings that are randomized at the first seven positions, another collection randomized at the second run of seven positions, and so on, for a total of six collections.

Here are a few examples of ‘mutant’ strings from the first collection:

npfzbifogical information by natural means
tnagyllogical information by natural means
zkjubdbogical information by natural means
sodlwdjogical information by natural means

and here are a few from the second collection:

no biolfaryewrinformation by natural means
no biolupbmjmginformation by natural means
no biolurxacryinformation by natural means
no biolfjrqgatinformation by natural means
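
Strings like the ones in these two collections can be produced with a short Python sketch. This is an illustration under stated assumptions rather than the procedure actually used to make the examples: the alphabet is taken to be 26 lowercase letters plus the space, and the function name randomize_window is hypothetical.

```python
import random
import string

TITLE = "no biological information by natural means"
ALPHABET = string.ascii_lowercase + " "   # 27 characters
WINDOW = 7                                # six windows of seven cover all 42 positions


def randomize_window(text: str, window_index: int, rng: random.Random) -> str:
    """Replace one seven-character window with random characters,
    leaving the other 35 positions untouched."""
    start = window_index * WINDOW
    scrambled = "".join(rng.choice(ALPHABET) for _ in range(WINDOW))
    return text[:start] + scrambled + text[start + WINDOW:]


rng = random.Random(2004)   # fixed seed so the run is repeatable
for window_index in (0, 1):
    print(f"collection {window_index + 1}:")
    for _ in range(4):
        print("  " + randomize_window(TITLE, window_index, rng))
```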

Assuming we have an unlimited pool of readers who can examine the mutant strings, how should we proceed? If we instruct the readers to accept only those sequences that match the original, rejecting all others, then we know what the result will be. Of ten billion possible variants in each collection (27^7 ≈ 10^10), only one will match the original. If we raise this proportion to the power six (because there are six collections) we get the expected answer: Of all possible 42-character strings, only one in 10^60 matches the original.

But suppose we use a less strict approach. Suppose we simply ask the readers whether they can discern a meaningful reading, and we count all cases where the discerned reading is correct. This will increase the number of accepted mutants dramatically. If, with a bit of squinting, most mutants with two typos can be interpreted correctly—mutants like these:

ng bhological information by natural means
no biolojijal information by natural means
no biological infosmntion by natural means

then each collection will have about fifteen thousand accepted mutants instead of only one. This makes the prevalence of functional sequences in each collection about one in a million, which is much higher than the previous value of one in ten billion.

But we want to know how prevalent acceptable sequences are among all possible sequences. Let’s call this fraction P. The question is, how do we calculate P now that we are using this less strict approach to accepting mutants? In answering this question we will discover two major problems with Objection 3.

It’s certainly true that we will arrive at a much higher value for P if we simply raise the prevalence found for the individual collections to the power six. That calculation gives us one in (10^6)^6 , which is one in 10^36. This is a tiny fraction, but it’s a whole lot larger than the one we got using the strict approach (one in 10^60).
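
The per-collection figure and the sixth-power extrapolation can be reproduced with a few lines of Python. This sketch assumes that every string with at most two typos in the seven-character window is accepted (the text says "most"), and the helper name strings_within is made up for the illustration.

```python
import math
from math import comb

ALPHABET_SIZE = 27   # 26 letters plus the space
WINDOW = 7
N_WINDOWS = 6


def strings_within(length: int, max_typos: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Count strings that differ from a reference at no more than max_typos positions."""
    return sum(
        comb(length, k) * (alphabet_size - 1) ** k   # choose positions, then wrong characters
        for k in range(max_typos + 1)
    )


window_space = ALPHABET_SIZE ** WINDOW        # all possible seven-character windows
accepted = strings_within(WINDOW, 2)          # windows with zero, one, or two typos
per_window = accepted / window_space
naive_log10_P = N_WINDOWS * math.log10(per_window)

print("accepted per collection:", accepted)
print("prevalence per collection: about 1 in", round(1 / per_window))
print("naive log10(P):", round(naive_log10_P, 1))
```

The count comes out a little under fifteen thousand and the naive exponent lands close to -36, in line with the rounded figures in the text.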

Here’s the problem, though. In calculating this higher P value, we have effectively assumed that the typo rate we found to be tolerable in the individual collections (two typos per seven positions) remains tolerable when we apply it to the full 42-character string. However, when we try this by generating full-length strings with twelve typos (maintaining the 2-in-7 ratio), we get completely unreadable gibberish:

nohbimlogicaemicrxrmation synnaludalnmeans
no giolotida binuprmazion by nktumai me ns
go hiozfgicac infeemadion zyknatural weans
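
Readers who want to generate their own twelve-typo strings can use a sketch like the following. It assumes typos are placed at distinct, uniformly chosen positions and that each typo substitutes a different character from the 27-character alphabet; the function name add_typos is hypothetical.

```python
import random
import string

TITLE = "no biological information by natural means"
ALPHABET = string.ascii_lowercase + " "   # 26 letters plus the space


def add_typos(text: str, n_typos: int, rng: random.Random) -> str:
    """Return a copy of text with exactly n_typos positions changed."""
    chars = list(text)
    for i in rng.sample(range(len(text)), n_typos):      # distinct positions
        alternatives = [c for c in ALPHABET if c != text[i]]
        chars[i] = rng.choice(alternatives)               # always a real change
    return "".join(chars)


rng = random.Random(7)   # fixed seed so the run is repeatable
for _ in range(3):
    print(add_typos(TITLE, 12, rng))
```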

Evidently we have made a mistake, because mutants that ought to be readable according to our calculation clearly aren’t readable. As you may have guessed, the mistake is that the 2-in-7 typo rate was tolerable only because it was restricted to a narrow section of seven positions. The fact that the remaining 35 positions were without error compensated in large measure for the errors in the mutated section.

The remedy is to use a different starting sequence. Specifically, we need the starting sequence to be of the same quality that we intend to require of mutant sequences in order for them to be accepted. Otherwise the mismatch in quality will skew our results.

The first problem with Objection 3, then, is that it fails to recognize the importance of applying the same quality standard to the starting sequence that will be applied to the mutants derived from it. Objection 3 focuses on the fact that I used a weakly functional starting sequence without recognizing that this was called for by the fact that weakly functional variants of that sequence were accepted as ‘functional’.

But there’s another problem with Objection 3, having to do with the choice of a quality standard. We now know that the same standard has to be applied consistently if our results are to be meaningful, but we are still free to set that standard at any level. So, what difference does the level make?

As we’ve seen, if we take perfection to be the standard (i.e., no typos are tolerated) then P has a value of one in 10^60. If we lower the standard by allowing, say, four mutations per string, then mutants like these are considered acceptable:

no biologycaa ioformation by natutal means
no biologicaljinfommation by natcrll means
no biolojjcal information by natiral myans

and if we further lower the standard to accept five mutations, we allow strings like these to pass:

no ziolrgicgl informationpby natural muans
no biilogicab infjrmation by naturalnmaans
no biologilah informazion by n turalimeans

The readability deteriorates quickly, and while we might disagree by one or two mutations as to where we think the line should be drawn, we can all see that it needs to be drawn well below twelve mutations. If we draw the line at four mutations, we find P to have a value of about one in 10^50, whereas if we draw it at five mutations, the P value increases about a thousand-fold, becoming one in 10^47.
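
For readers who want to see how P responds to the standard, here is a rough Python sketch. It assumes that a string is acceptable exactly when it has no more than a given number of substitutions relative to the original, which is a simplification; the counting behind the figures quoted above is not spelled out in the text, so small differences from those rounded values are to be expected. The function name log10_P is invented for the example.

```python
import math
from math import comb

ALPHABET_SIZE = 27   # 26 letters plus the space
LENGTH = 42


def log10_P(max_typos: int) -> float:
    """log10 of the fraction of all strings within max_typos substitutions
    of the original (ASSUMPTION: every such string counts as acceptable)."""
    acceptable = sum(
        comb(LENGTH, k) * (ALPHABET_SIZE - 1) ** k
        for k in range(max_typos + 1)
    )
    return math.log10(acceptable) - LENGTH * math.log10(ALPHABET_SIZE)


for typos in (0, 4, 5, 12):
    print(f"up to {typos:2d} typos: log10(P) = {log10_P(typos):.1f}")
```

Under this assumption the exponent rises steadily as more typos are tolerated, coming out near -49.4 for four typos and -47.1 for five, compared with about -60 when none are allowed.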

Notice two things. First, both of these P values are far more favorable than one in 10^60, and second, lowering the standard always increases the P value. This makes perfect sense—it has to become easier to meet the standard as the standard is lowered.

Having understood this, we now see that Objection 3 has things inverted. In the work described in the 2004 JMB paper, I chose to apply the lowest reasonable standard of function knowing this would produce the highest reasonable value for P, which in turn provides the most optimistic assessment of the feasibility of evolving new protein folds. Had I used the wild-type level of function as the standard, the result would have been a much lower P value which would present an even greater challenge for Darwinism. In other words, contrary to Objection 3, the method I used was deliberately generous toward Darwinism.

Objection 4: Axe’s experiment doesn’t reflect how evolution really works. He mutated amino acids in groups of ten, whereas evolution sifts mutations one at a time. Consequently, Axe’s results tell us nothing about whether protein folds can or cannot evolve.

Combining this with the previous objection, Steve Matheson coined the phrase “whopping mutations on crippled proteins.” [6] We’ve now seen that ‘crippled’ proteins are precisely what we should be looking at if we want our results to be as favorable to Darwinism as they reasonably can be. So Matheson got that one wrong. How about the whopping mutations?

Venema expressed the same objection, but seems to equivocate on its significance. In his words [7]:

…Axe did not mutate his test protein with single point mutations, but rather by adding partially randomized groups of ten amino acids at a time, something that does not resemble natural processes. While these features of Axe’s work are useful standardizations for estimating the relative rarity of function[al] protein folds in his specific experimental setup, they render his work irrelevant to the issue of evolutionary isolation of functional sequences.

But if the methods I used, including mutations in groups, were useful for estimating the rarity of functional sequences, and their rarity is relevant to their isolation (see Objection 2), then it’s hard to see how my methods render the results irrelevant to the evolution of new protein folds.

The first word of my paper’s title is estimating, which is quite different from simulating. I had absolutely no intention of simulating the evolution of a new protein fold. Instead I set out to estimate how rare the amino acid sequences are that produce a working version of a particular fold.

Ideally the mutations would have been even more whopping. Had it been technically feasible to shuffle the mutants across their entire lengths, to generate a massive library containing all possible variants, and to test the gargantuan number of variants needed to find several functional variants, this definitely would have been the way to go. So the ‘whoppingness’ of the mutations really has nothing to do with how relevant the results are to protein evolution. It’s merely a feasibility issue.

The estimate I got from experiments performed on a more modest scale explains why that more modest scale had to be used. I estimated that one in 10^64 of the variants that would have been produced in the whoppingly ideal experiment (where all positions are mutated simultaneously) would have had low-level function. But the library of protein variants in this ideal experiment would have had to be about a hundred billion times more massive than the sun in order for one functional variant to exist out there—somewhere. Whopping indeed. But not very practical.
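
A back-of-envelope version of this comparison can be sketched in Python. The protein length (150 residues) and the average residue mass (110 daltons, a common rule of thumb) are assumptions introduced for this illustration and are not taken from the text; only the 10^64 figure comes from the paragraph above.

```python
DALTON_KG = 1.66054e-27       # mass of one dalton in kilograms
SOLAR_MASS_KG = 1.989e30      # approximate mass of the sun in kilograms

RESIDUES_PER_PROTEIN = 150    # ASSUMPTION: illustrative protein length
DALTONS_PER_RESIDUE = 110     # ASSUMPTION: rule-of-thumb average residue mass
VARIANTS_NEEDED = 10 ** 64    # from the text: library size for one functional variant

# Mass of a single protein molecule, then of one copy of each variant.
protein_mass_kg = RESIDUES_PER_PROTEIN * DALTONS_PER_RESIDUE * DALTON_KG
library_mass_kg = VARIANTS_NEEDED * protein_mass_kg
solar_masses = library_mass_kg / SOLAR_MASS_KG

print(f"library mass: about {solar_masses:.1e} solar masses")
```

With these assumed values the library comes out on the order of 10^11 solar masses, consistent with the "hundred billion" figure.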

So in the end Matheson’s complaint about “whopping mutations on crippled proteins” is doubly wrong.

And more could be said in critique of Art Hunt’s blog piece, but enough is enough. Suffice it to say that after his first section, which as I said does a reasonable job of summarizing my experimental approach, things get very confused.

[1] doi:10.1016/j.jmb.2004.06.058

[2] Axe (2004) and the evolution of enzyme function

[3] Darwinian tree-huggers: You gotta love their devotion

[4] Creationism’s trojan horse

[5] doi:10.5048/BIO-C.2010.1

[6] Bread and circus: Signature in the cell at Biola (Part II)

[7] Seeking a signature

[8] I used the DictionaryLookup command in Mathematica, selecting all three-letter entries and then removing ones that begin with an upper-case letter.

[9] Natural selection and the concept of a protein space