Biologic
Perspectives

Why Proteins Aren't Easily Recombined, Part 2

By Ann Gauger

In a previous post, I described how proteins are composed of units of secondary structure called alpha helices and beta sheets. These units of secondary structure assemble into the wide range of protein structures that exist in nature, a few of which are shown below in cartoon form. Beta strands are shown as the flattened arrows and helices as coils. Remember, proteins don’t actually look like this, but it helps us to make comparisons between protein structures when they are drawn this way.

I also described why alpha helices and beta sheets (multiple beta strands grouped into sheets) are sequence-specific in their interactions. They do not function as interchangeable lego bricks, because the exterior of each helix or strand has a unique set of side chains, with unique chemical properties. 

This context-dependent behavior of structural modules, like helices and sheets, has been demonstrated experimentally. The general rule is that modules that function as independent units, with little interaction with other structural elements, can sometimes be recombined. Modules with multiple external interactions cannot. These principles are illustrated in a series of experiments using the protein beta lactamase, an enzyme that breaks down penicillin in penicillin-resistant bacteria.

First I will describe a series of experiments demonstrating that modular recombination is possible, but only in certain circumstances. Meyer and coworkers identified blocks of sequence that should function as self-contained modules in three natural beta lactamase enzymes, using a computer algorithm that analyzed the exterior surfaces of the proposed modules for side chain interactions. These enzymes shared between 34% and 42% sequence identity to start, allowing their sequences to be aligned with some confidence. The researchers then mixed and matched the identified self-contained modules from these enzymes into chimeric enzymes and tested them for function. Even with carefully designed splice sites, and optimal sequence independence, four out of five chimeras had no detectable function, and only one out of ten had function approaching that of the natural enzymes. Thus, even under carefully engineered, ideal circumstances, with a great deal of sequence independence, most recombinations failed.

Another test of modularity reported by Doug Axe examined whether the main structural domains of beta lactamase enzymes could be swapped. In the picture below, part (A) shows the domain structure of the two beta lactamase enzymes he used, with the two domains in each protein colored differently. In (B) the backbones of the similar domains from each protein are aligned with each other to show their 3-D similarity. Both enzymes carry out exactly the same chemical reaction, but share only 26% amino acid sequence identity.

In spite of the structural and functional similarity, the domains of the two proteins are not interchangeable. Swapping domains between enzymes kills enzyme activity. Why? It’s because the active site of the enzyme lies at the interface between the two domains, and the substantially different side chain interactions at the interface disrupt the chimeric enzyme’s function. So this is a case where the extensive, specific side-chain interactions prevent any functional recombination between domains.

This same sequence specificity can be shown even at the level of individual amino acids. Using two other variants of beta lactamase whose structures are nearly identical (shown below), and whose sequences are 50% identical, another study by Axe swapped non-matching but positionally equivalent amino acids between the two proteins to see if they could substitute for one another.

As Doug Axe described in his 2010 paper,

If aligned but non-matching residues are part for part equivalents, then we should be able to substitute freely among these equivalent pairs without impairment. Yet when protein sequences were even partially scrambled in this way, such that the hybrids were about 90% identical to one of the parents, none of them had detectable function.

In other words, even when only about 10% of a protein’s residues were swapped for the other parent’s non-matching amino acids, the resulting hybrid enzyme no longer functioned. Why? Because the substitution of different amino acids into the existing protein structure destabilized the fold, even though those same amino acids worked well in another context. Thus, each protein’s amino acid sequence works as a whole to help generate a proper stable fold, in a context-dependent fashion.
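To make the percentages concrete, here is a minimal Python sketch of the bookkeeping behind such a swap experiment. The two 20-residue sequences are invented for illustration (they are not beta lactamase sequences), and the helper names `percent_identity` and `make_hybrid` are hypothetical. The sketch only counts matching positions; it says nothing about whether a hybrid would fold or function.

```python
import random


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def make_hybrid(parent: str, donor: str, n_swaps: int, rng: random.Random) -> str:
    """Copy `parent`, then give n_swaps non-matching positions the donor's residue."""
    mismatches = [i for i, (x, y) in enumerate(zip(parent, donor)) if x != y]
    hybrid = list(parent)
    for i in rng.sample(mismatches, n_swaps):
        hybrid[i] = donor[i]
    return "".join(hybrid)


# Toy aligned sequences (invented): they match at 10 of 20 positions.
parent = "MKTAYIAKQRQISFVKSHFS"
donor = "MRTGYLAEQKQLSYVRSNFT"

# Swap 2 of the 20 positions, i.e. 10% of all positions.
hybrid = make_hybrid(parent, donor, n_swaps=2, rng=random.Random(0))

print(percent_identity(parent, donor))   # identity between the two parents
print(percent_identity(hybrid, parent))  # identity of the hybrid to its parent
```

With parents that are 50% identical, changing a tenth of all positions yields a hybrid that is 90% identical to the parent it started from, which is the situation described in the quoted passage.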

So we have context-dependent effects on protein function at the level of primary sequence, secondary structure, and tertiary (domain-level) structure. This does not bode well for successful, random recombination of bits of sequence into functional, stable protein folds, or even for domain-level recombinations where significant interaction is required. More to follow.

Nature as a Guide for Efficient Design

By Ann Gauger

A few blog posts back we provided a link to a beautiful video describing the Fibonacci series. The image above is from that video.

Now researchers have found that one of the patterns derived from the Fibonacci series that is present in sunflowers provides the most efficient arrangement for mirrors in solar power generation. 

The title of the article from The Economist? “In matters of clever design, nature has often got there first.” 

Two researchers from the Massachusetts Institute of Technology have now devised a better and more compact way of laying out arrays of mirrors. Slightly to their chagrin, however, and somehow appropriately, they found when they had done the calculations that sunflowers had got there first.

Pretty cool, huh?
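For the curious, a sunflower-style layout can be sketched in a few lines of Python. This uses a common textbook model of seed arrangement (often attributed to Vogel), in which each new point is rotated by the golden angle, a quantity tied to the limiting ratio of consecutive Fibonacci numbers. Using this model is my assumption; the article quoted above does not say which formula the MIT researchers used, and the function name and `spacing` parameter are hypothetical.

```python
import math

# Golden angle in radians (about 137.5 degrees).
GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))


def sunflower_points(n: int, spacing: float = 1.0) -> list[tuple[float, float]]:
    """Return n (x, y) points laid out in a sunflower-like spiral."""
    points = []
    for k in range(1, n + 1):
        r = spacing * math.sqrt(k)   # radius grows with the square root of k
        theta = k * GOLDEN_ANGLE     # each point turns by one golden angle
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


for x, y in sunflower_points(5):
    print(f"{x:8.3f} {y:8.3f}")
```

Plotting a few hundred of these points gives the familiar interlocking spirals of a sunflower head.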

Why Proteins Aren't Easily Recombined

By Ann Gauger

There seems to be an idea floating about among some biologists that it is easy to recombine protein domains or swap bits of protein structure to generate new function. I suppose it comes from looking at simplified drawings of protein structure, and forgetting about the detailed atomic interactions required. 

For non-biologists, let me explain why proteins aren’t easily recombined. A protein fold is typically composed of smaller structural elements called alpha helices or beta sheets, with unstructured loops of protein connecting them. These elements adopt a stereotyped pattern of folding because of hydrogen bonding patterns between amino acids. The illustration below from Axe (2010) shows these hydrogen bonding patterns as red dashed lines between the linked amino acids. For clarity, the side chains of each amino acid are faded out, while the backbone trace is in full color.

Below each helix (a) or sheet (b) is a simplified geometric shape that illustrates how the element assembles and what edges are available for extension (magenta faces).  We see each kind of structure from the side (on the left) and face on (on the right).

It is important to know that different amino acid combinations can form each of these elements—many different sequence combinations can form alpha helices or beta sheets. As a result, each particular helix or sheet has a distinct set of side chains sticking out from it, requiring a distinct set of chemical interactions with any nearby protein sequence. Thus, helices and sheets are sequence-dependent structural elements within protein folds. You can’t swap them around like lego bricks. 

This necessarily means that when you bring new secondary structure elements into contact by some sort of rearrangement, they will be unlikely to form a stable three dimensional fold without significant modification.

But you don’t have to take my word for it—it is possible to test these things. Our next post will introduce one such experiment.

Intricate Coordination

By Ann Gauger

Meet carbamoyl phosphate synthetase (CPS), a remarkably complex enzyme. This enzyme uses bicarbonate, glutamine, ATP, and water to make carbamoyl phosphate via a multi-step reaction at three separate active sites, involving several unstable intermediates. Go here for details.

CPS is made of two protein chains with a combined length of over 1,400 amino acid residues. We now know from extensive biochemical data that a fully coupled CPS requires the hydrolysis of one glutamine and  two molecules of MgATP for every molecule of carbamoyl phosphate formed. The three active sites of the enzyme maintain the overall stoichiometry of the reaction, without wasteful hydrolysis of glutamine and/or MgATP. In order to couple the reactions efficiently, the enzyme uses internal molecular tunnels for sequestering and rapid transfer of reactants between active sites, and allosteric conformational changes to synchronize their activity. This is remarkable, given that the first and last active sites are separated by nearly 100 Å. But even more remarkable is the fact that this enzyme carries out a series of reactions involving unstable intermediates with a half-life of seconds to milliseconds. How does a neo-Darwinian process evolve an enzyme like this? Even if enzymes that carried out the various partial reactions could have evolved separately, the coordination and combining of those domains into one huge enzyme is a feat of engineering beyond anything we can do. 

For more information about the challenge to Darwinism presented by protein folds, go here.

A beautiful video about Fibonacci series in nature. Mesmerizing!

(Source: vimeo.com)

Exquisite Design

By Ann Gauger

Proteins are the building blocks of life. They are the structural parts that give cells shape, the enzymes that build or break down the molecules of life, the motors that transport things, the agents that send signals and regulate the activity of other proteins and genes, and the morphogens that help determine the development of the organism.

What determines a protein’s activity and properties? Its shape. And what determines its shape? The way its one dimensional string of amino acids folds together. This is a complex process involving many interactions, so complex that we cannot reliably predict a protein’s structure based on its sequence.

To get an idea of the problem, take a look at the picture above on the left. This is an illustration of a single protein called porin, whose structure has been determined experimentally. This protein has around 300 amino acids. What you see here is the arrangement of all its covalent bonds between atoms, shown as sticks. 

Making sense of that tangle of bonds is difficult. So scientists often depict proteins in a simplified cartoon form that shows the secondary structure of the protein fold. These secondary structures are motifs within the protein that form either alpha helical coils or flat beta sheets, and are therefore drawn as coils or flat arrows in cartoon illustrations. The above middle picture shows porin, in the same orientation and size as the first picture, but now drawn in cartoon form. Porin is composed mostly of antiparallel beta sheets, arrayed in a barrel-like shape, with an opening in the center.

But neither of these pictures illustrates porin as it would look if we could take a snapshot. The figure above on the right is a surface view of porin, showing its many knobs and hollows and a hole in the middle. Those knobs and hollows allow porin to assemble in its final functional form in the membrane, a trimer composed of three porin chains tightly coupled together. The image below is of trimeric sucrose-specific porin found in E. coli. The artist has rendered it so that we can see the secondary structure within the assembled trimer, and it is colored as a rainbow according to the threading of the amino acid chains, from first (blue) to last (red).

What does porin do? It serves as a pore to let specific molecules enter the periplasmic space of bacteria. Therefore the pore has to have a specific shape and polarity to allow the right chemicals admittance, but exclude others, and the outside of the protein has to be able to interact stably either with other porins or with the membrane. That all depends on the information contained in porin’s primary sequence. 

Research indicates that sequences that fold into a particular functional shape are rare. Only about 1 in 10^77 possible sequences will adopt a functional fold 150 amino acids in length. How rare is porin as a functional sequence? Based on its functional constraints, it is probably as rare as the enzymatic fold already tested.
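To give a feel for these magnitudes, here is a short Python sketch of the arithmetic. It takes the 1 in 10^77 figure above at face value and assumes the 20 standard amino acids. The trial count is a purely hypothetical number chosen for illustration; it is not an estimate of anything in nature.

```python
import math

ALPHABET_SIZE = 20           # standard amino acids
CHAIN_LENGTH = 150           # residues, as in the estimate above
FUNCTIONAL_FRACTION = 1e-77  # the figure quoted above: about 1 in 10^77

# Number of distinct sequences of this length (exact integer arithmetic).
sequence_space = ALPHABET_SIZE ** CHAIN_LENGTH
print(f"sequence space is about 10^{len(str(sequence_space)) - 1}")


def chance_of_any_hit(trials: float, p: float = FUNCTIONAL_FRACTION) -> float:
    """Probability that at least one of `trials` independent random draws
    lands on a functional sequence: 1 - (1 - p)**trials, computed stably."""
    return -math.expm1(trials * math.log1p(-p))


HYPOTHETICAL_TRIALS = 1e40   # illustrative only
print(chance_of_any_hit(HYPOTHETICAL_TRIALS))
```

When the fraction is this small, the chance of at least one hit is essentially the number of trials multiplied by the fraction.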

My point? Proteins exhibit exquisite design, with extraordinary specified complexity embedded in their sequences. Too much to be the result of random processes.

More examples to follow.

Image credit: Wikimedia Creative Commons for sucrose-specific porin

The Real Barrier to Unguided Human Evolution

By Ann Gauger

Comparing DNA sequences and estimating by how many nucleotides we differ from chimps doesn’t tell us much about what makes us human. Many of those nucleotide differences have no effect, because they are the product of neutral mutation and genetic drift. While these neutral mutations may affect the overall mutation count, they don’t tell us how many mutations are required for the transition from chimp-like to human.

This problem is analogous to one we examined concerning protein evolution last year in the BIO-Complexity journal (Gauger and Axe 2011). Converting one protein to another’s function can be viewed as a mini version of converting one species to another. But it is much easier to convert proteins than species.

We began by identifying two proteins that are close together in structure, but that have distinct functions. We then asked what minimal number of mutations would be needed to convert one protein to the other’s function. If all the places where they differed had to be changed, that would mean we would have to switch 70% of one protein to achieve conversion to the other’s function. It’s unlikely that all those mutations are required, however, since many if not most of those changes are due to neutral mutation and drift, just like in the chimp-like to human case.

So to estimate the minimal number of mutations required for conversion to a new function, we identified and tested the most likely amino acid candidates using structural and sequence comparisons, one by one and in combination. We ended up changing nearly the entire active site to look like the target protein, but failed to achieve conversion.

Based on the number of groups we changed, we made a minimum estimate that seven specific mutations would be required for a functional shift to be observed. To get seven coordinated mutations takes way too long, even for bacteria, with their high mutation rate and large population sizes. 10^27 years is our estimate, based on Doug Axe’s population genetics model, also published in BIO-Complexity.

Personally, I think the chimp-like to human conversion would have to have taken many more years than any protein conversion, if it happened at all. A few years back Durrett and Schmidt published two papers where they estimated how long it would take to get a single mutation, and then a second mutation to produce an eight base DNA binding site somewhere within a thousand base region near a gene. They stipulated that within the thousand base region there already was a sequence with six out of eight bases matching the target. The reason they chose to examine DNA binding sites? Many evolutionists think that evolution happens by changing gene expression, and changing gene expression most often requires changes to the regulatory regions around genes.

Their results? They calculated it would take six million years for a single base change to match the target and spread throughout the population, and 216 million years to get both base changes necessary to complete the eight base binding site. Note that the entire time span for our evolution from the last common ancestor with chimps is estimated to be about six million years. Time enough for one mutation to occur and be fixed, by their account.

To be sure, they did say that since there are some 20,000 genes that could be evolving simultaneously, the problem is not impossible. But they overlooked this point. Mutations occur at random and most of the time independently, but their effects are not independent. Mutations that benefit one trait may inhibit another. In addition, many if not all these traits are complex adaptations. Each trait requires multiple mutations to achieve a beneficial change. And many of the traits must occur together to be of any benefit. Take for example the changes required for upright bipedalism. Hips, legs, feet, spine, ribcage, and skull all need to work together to allow free and efficient motion. All must be changed. But changing the hips before changing the angle of the legs would not be helpful. Changing to upright posture without lengthening the neck and setting the skull atop the spine would not work.

The point is this. There are hundreds of traits that distinguish us from chimps, probably requiring tens of thousands of mutations in total. But even if it takes only 30 or 40 specific trait changes to move from primate to human, and hundreds of mutations, the time required would be astronomical. Longer than the age of the universe, actually.

Like Sternberg’s argument about whales, the argument from what is required to what is possible shows there just isn’t time enough for it to have happened by unguided means.

A Puzzle about Human Uniqueness

By Ann Gauger

Here’s a puzzle from Varki et al (2008). All simians and primates *except* humans are infected with a retrovirus called simian foamy virus (SFV). SFV is normally harmless, and reproduces by infecting cells, inserting itself into the host genome, and then at some point expressing the genes necessary to make new virus. The virus is ubiquitous, and newborns apparently catch it from their kin. The same is probably true for another group of lineage-specific viruses called simian infectious retroviruses (SIVs).

Photo credit: ucumari, Creative Commons license

Humans do not have these viruses, but we can catch them from chimps. So how did we lose these viruses if we are descended from primate ancestors who had them? 

From the paper:

Indeed, given the remarkable corroboration between the phylogenetic trees of primates and their lineage-specific simian foamy viruses (SFVs), our common ancestors with other hominids almost certainly had SFVs. The same is probably true of the lineage-specific simian infectious retroviruses (SIVs) found in most non-human primates (NHPs). Assuming that the common ancestors of hominids carried multiple endemic infectious retroviruses, how did the human lineage eliminate them? Given that humans remain susceptible to re-infection with both SFVs and SIVs from other hominids, this seems unlikely to be explained solely on the basis of more efficient host restriction systems. Rather, there seems to have been an episode in which the ancestral human lineage was somehow ‘purged’ of these endemic viruses.

Now wait, before you get all excited, there is more to the story. Varki et al. suggest that humans lost the viruses because we lost the enzyme for making a particular glycoprotein that SIVs and SFVs use to make their envelope (outer coating of the infectious particle). These viruses are harmless, though, so there would be no selective benefit to eliminating that glycoprotein just to eliminate the viruses.

Rather, there might be a significant cost to inactivating the enzyme, since the glycoprotein we lost was important for cell recognition and immune responses (based on its distribution in other mammals). Its loss would have to be compensated for by any lectins (proteins that bind to glycoproteins) that interacted with it, requiring multiple downstream changes, and likely affecting multiple biological processes.

Varki proposes that some other unknown lethal virus used the same glycoprotein to infect cells, so eliminating the glycoprotein to defend against infection was advantageous enough to overcome any other costs. (Alternatively, the loss could have been a random mutation that became fixed in the population due to a sudden population bottleneck.) The glycoprotein that we lost would then have to be replaced by another, similar glycoprotein that had some affinity for the same lectins. But that compensation would take time, whereas the loss of the glycoprotein would have an immediate effect on many cellular processes.

According to published work, 10 out of 60 genes involved in recognizing the replacement glycoprotein show evidence of rapid evolution (multiple mutations), dating back to around 1 to 1.2 million years ago. Even more interesting, the expression pattern of the replacement glycoprotein has changed—now it is expressed in the brain as well as other places. Furthermore, the inactivation of the enzyme to make the original glycoprotein apparently happened after our lineage split from chimps, and before the most rapid enlargement of our brains. Is there a causal relationship? Who knows?

Unfortunately, the loss of the enzyme and subsequent change to our glycoprotein repertoire may have left us with some immune-related vulnerabilities.

We have a complex story here, one that is mostly speculation with some indirect data. Would the loss of the enzyme, and thus the glycoprotein it made, have been enough to eliminate the endemic SIVs and SFVs? Was the loss of the glycoprotein due to drift, or some selective advantage against an unknown virus? Does the changed expression pattern of the substitute glycoprotein have anything to do with our larger brains and increased connectivity compared to chimps? Is the proposed scenario even feasible? 

Photo credit: Sara Ross Photography, Creative Commons

To survive such a sudden change is unlikely, but not impossible. It would depend on what systems were affected by the loss of the glycoprotein. These probably included fertility, the immune system, and as evidence suggests, the nervous system. It may have also affected protein trafficking through the Golgi apparatus. The compensatory adaptive mutations to the interacting lectins would have to have occurred rapidly, but given our population size and mutation rate this is unlikely, unless the process was guided somehow. 

It should be possible to examine the feasibility of some of these transitions. By doing so, we may arrive at a better understanding of what such a series of adaptations would really require. And whether we were wiped clean of SFVs and SIVs by design.