Halbert's Cubicle: On Origins and the Molecular Basis of Life

I've said on a number of posts, mostly about the possibility of life on other planets, that I don't particularly buy into the idea of a chemical origin of life. This often leads to some awkwardness in my professional life. I have advanced degrees in life sciences; how can I disregard actual science in favor of a purely religious point of view?

I don't reject a naturalistic explanation of the origin of life on purely religious grounds. Even in the absence of a motivating faith, the ideas regarding the chemical origin of life don't inspire confidence. Frankly, I find it requires more faith to believe that life arose out of a primordial soup than not. It is a conclusion in search of evidence to support it, and the evidence is wanting.

In all of the posts where I've mentioned this, I've said that I ought to explain why at some point. This is an attempt to do so, and like the theory itself, this explanation is complicated.

The Molecular Basis of Life

I can't assume that my audience has a textbook understanding of the chemical basis of life, so I will try to make a very brief explanation of life, the universe, and everything.

Everything we know about the function of life starts with the chemicals that make up the cell. Although there are many that are critical for life to function, the most important of these are DNA, RNA, and proteins.

The three molecule paradigm

It starts with DNA, which contains all of the genetic information for life, much like computer code. However, DNA is largely a storage medium. To become functional, it has to be converted into other forms. First, it gets transcribed: cellular machinery uses the DNA as a template to create an RNA copy of its nucleic acid sequence.

RNA has a variety of different functionalities, but the primary interest here is messenger RNA (mRNA), which then gets translated into proteins. The mRNA is fed through different cellular machinery, which keys the three-letter codes of the nucleic acid sequence into different amino acids. The amino acid chain, also known as a polypeptide, is then folded, often with assistance, into a functional shape. If necessary, it is also transported to the location where it functions, such as the nucleus or within the cell membrane.

A brief overview of the molecular basis of life
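For readers who think better in code, the DNA to RNA to protein flow described above can be sketched roughly as follows. This is a toy illustration and not a model of the real cellular machinery: the function names are made up for this example, the input is assumed to be the coding strand (so transcription reduces to swapping T for U), and the codon table is a small, hand-picked subset of the 64 codons in the standard genetic code.

```python
# Partial codon table, for illustration only (the standard code has 64 codons).
CODON_TABLE = {
    "AUG": "Met",  # methionine, also the usual start codon
    "UUU": "Phe",  # phenylalanine
    "GGC": "Gly",  # glycine
    "GCU": "Ala",  # alanine
    "UAA": None,   # stop codon: no amino acid is added
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three letters at a time until a stop codon is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon ends the chain
            break
        peptide.append(amino_acid)
    return peptide


mrna = transcribe("ATGTTTGGCGCTTAA")
peptide = translate(mrna)
print(mrna)     # AUGUUUGGCGCUUAA
print(peptide)  # ['Met', 'Phe', 'Gly', 'Ala']
```

Folding and transport, the last two steps in the description, are left out entirely; they are far beyond what a lookup table can capture.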

Primordial Soup, Primitive Cells

The hypothesis for the chemical origin of life, or at least my understanding of it, holds that cellular life began from simple organic molecules collecting and aggregating in the "primordial soup" of ancient Earth. What this looked like is highly speculative. Ideas include undersea thermal vents, pockets of water within ice crystals, and even soupy mud formations.

As the molecules assembled, they gained functionality and spawned new molecules. Eventually, many of these would be isolated within simple lipid membranes which facilitated compartmentalization of function, and the close proximity allowed for the development of collaborative functions and the transmission of information from one generation of protocell to the next.

RNA World Hypothesis

One weakness in the chemical origin of life is the complexity of current life. It's a chicken-and-egg problem: Given the interconnectedness of the molecules necessary for cellular life, how could all three simultaneously, spontaneously arise? This problem resulted in the RNA World hypothesis. The idea is that RNA was the very first molecule, acting both as storage medium of genetic information and functional molecule. Given the existence of RNA with functional properties (i.e. ribozymes), it is a plausible idea, although it is not without critics or competing hypotheses.

Now, I've taken an entire semester of molecular biology and summarized it in a few paragraphs. Despite this gross simplification, it should be evident that life at its basic level is intricate, and tremendous effort has gone into exploring the origin of life. What makes me think these explanations are insufficient?

Doing the Math

Consider a simple scenario. Imagine you have one of each letter of the alphabet, and you toss them in a bag. Then, without looking, you pull out one letter at a time. What are the odds that you pull them out in alphabetical order?

No peeking

The answer, as it turns out, is about a 1 in 4 x 10²⁶ chance. That is a four with 26 zeroes behind it. As a reference point, it's roughly estimated there are about 7.5 x 10¹⁸ grains of sand on Earth. If I were to mark a single grain of sand, hide it somewhere on planet Earth, then blindfold you and ask you to pick it up, you'd be roughly 50 million times more likely to find that grain of sand than to get the alphabet out of the bag.
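If you'd like to check the arithmetic yourself, here is a minimal sketch. It assumes the only thing that matters is the number of possible draw orders for 26 distinct letters (26 factorial), and it takes the grains-of-sand figure from the paragraph above as given; the variable names are just for this example.

```python
import math

# Number of distinct orders in which 26 different letters can be drawn.
orderings = math.factorial(26)

# Rough estimate of grains of sand on Earth, as quoted in the post.
grains_of_sand = 7.5e18

# How many times larger the space of orderings is than the pile of sand.
ratio = orderings / grains_of_sand

print(f"{orderings:.3e}")  # 4.033e+26
print(f"{ratio:.1e}")
```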

These outcomes are not impossible, but are so unlikely as to be essentially impossible. If you had 4 x 10²⁶ people doing this at the same time, would it be more likely to happen? Not exactly. There are certainly more opportunities for it to occur, but every one of those people pulling letters still has that same low (very, very low) probability of getting alphabetical order. Any single attempt at a rare event doesn't become more likely to succeed just because there are more attempts.

What's the point of this example?

As I explained above, the macromolecules involved in the basis of life work from templates; the DNA double-helix replicates against itself, RNA is a mirror of DNA, and then the RNA acts as a template for proteins.

How did the first macromolecules come about if they didn't have anything to read for a template?

The answer would probably be very similar to the alphabet-in-a-bag scenario: nucleotides or amino acids spontaneously forming chains until functional molecules appear.

If this is the method by which functional proteins first appeared, then what are the odds of a specific molecule forming?

The smallest proteins, or protein sub-units, are roughly 50 amino acids in length. We'll use the 50aa sequence for this consideration. If the amino acids are randomly assembling, like pulling letters out of the bag, then with the 20 standard amino acids available at each position, the odds of a specific protein sequence appearing are about 1 in 1.1 x 10⁶⁵.

Since the RNA World hypothesis is the most popular understanding of chemical origin, if you build a polynucleotide in this way, with the same length, then the odds of any given specific sequence appearing become 1 in 1.3 x 10³⁰. Definitely more likely than peptides, but still extraordinarily unlikely.
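The counting behind these figures can be sketched in a few lines. The assumptions here are mine: every position in the chain is filled independently and with equal probability, proteins draw from the 20 standard amino acids, and RNA draws from 4 nucleotides. The helper name `sequence_space` is made up for this example.

```python
def sequence_space(alphabet_size, length):
    """Number of distinct chains of a given length over a given alphabet."""
    return alphabet_size ** length


protein_space = sequence_space(20, 50)  # 20 standard amino acids, 50 positions
rna_space = sequence_space(4, 50)       # 4 nucleotides, 50 positions

print(f"{protein_space:.2e}")  # 1.13e+65
print(f"{rna_space:.2e}")      # 1.27e+30
```

Under those assumptions, the chance of one specific sequence turning up on a single random assembly is 1 divided by the corresponding number.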

You're more than a hundred billion times more likely to find that grain of sand.

Those numbers are for the assembly of a single molecule; one protein, one specific RNA molecule. Although the first protocell would presumably have been very simple, multiple copies of each protein or RNA would still be needed, so those long odds become all the more daunting. Unless the particular machinery of replication was the first on the scene, these molecules would have to be self-replicating. That's certainly a tall order, since self-replication is not generally a feature of either peptides or nucleic acids. (Although there's been some fuss about the success in this field, the gap between our current understanding and what would have been required under these circumstances is substantial.)

A single self-replicating molecule doesn't account for cellular life, though. Many different molecules would be needed for a functional organism to arise.

Minimal Essential Organism

I wrote about the work of Craig Venter on the Minimal Essential Organism for a class back in 2007. In short, they took the simplest bacterium known to man, with a genome of roughly 480 protein-coding genes, and introduced mutations in order to figure out which genes were necessary for the life of the organism. In other words, how much could they simplify the organism before it couldn't survive? In the end, they found that nearly 380 of those genes were essential to life for the bacterium. This is the "minimal essential organism," as they understood it.

On top of that, 29% of those essential genes were of "unknown function." This indicates that nearly a third of the functions essential to life at the molecular level remain unknown to us. (This may have changed in the intervening decade; it's unclear what progress has been made on that front.) Perhaps the first protocell was simpler, but it seems nearly impossible that we could determine how much so when we don't even understand the functions that are essential to life now.

Given the various types of processes involved in molecular life, how do you determine which ones are superfluous?

That's quite a departure from the usual story about the very first cell, in which a few initial molecules self-replicate, errors in replication result in molecules of differing function, and then those functional molecules become trapped in a liposome, resulting in a functional entity which benefits from the segregated space for those molecular functions.

The numbers above for the appearance of the molecules are staggeringly low, but it becomes absolutely mind-boggling when you consider that you have to have 380 specific sequences show up for life to be feasible. That ratio for the likelihood of an RNA-cell appearing becomes less than 1 in 10^11,000, a number so ridiculously low I can't even come up with a proper analogy for expressing just how insane that number is.
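Here is roughly how a figure of that size falls out, as a back-of-the-envelope sketch. It assumes 380 independent RNA sequences of 50 nucleotides each, every one of which has to match a specific target, and it works with base-10 logarithms because the number itself is too large to be worth printing.

```python
import math

genes = 380        # essential genes from the minimal organism work
length = 50        # nucleotides per sequence, as in the earlier estimate
nucleotides = 4    # size of the RNA alphabet

# log10 of 4 ** (380 * 50), i.e. the exponent in "1 in 10^x".
log10_odds = genes * length * math.log10(nucleotides)

print(round(log10_odds))  # 11439
```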

Conclusions

There are certainly objections to the way I've laid this out. Although the experiments I referenced above for the "simplest" cell describe ideal conditions, the first cell could have been a much simpler molecule; in an environment in which time is not a factor and there are no competing organisms to contend with, the first cell could possibly make do with very few functions. I find this unlikely for the reasons listed above, but it's not a baseless objection.

I've also done a lot of "back of the envelope" calculations, so the math is also suspect; after all, these are numbers for the appearance of a single molecule, but when you have chemicals in solution, you get a very large number of molecular interactions every second. Multiply that by the very large volume of the ocean and the millions of years for life to appear and it seems almost inevitable that life should appear, right?

Not as far as I'm concerned. Although accounting for reaction rate would give more opportunities for functional molecules to appear, that doesn't make improbable events any more probable. While a consideration of the various factors involved in this can quickly spiral into obscuring complexity, consider just the volume feature. The ocean is indeed a very large "reaction vessel," but its size prevents consideration of it as a singular entity. After all, two molecules formed on opposite sides of the planet are unlikely to ever interact. Further, the various hypotheses about the location of the origin of life, such as around thermal vents on the ocean floor or within the pockets of ice crystals, require the practical volume of where life originated to be considerably smaller, limited to the areas conducive to the chemistry at work here.

The time feature isn't really helpful, either. If the Earth is 4.5 billion years old, life is estimated to have shown up anywhere from 400 to 750 million years after the Earth was formed; perhaps even earlier. Given that water isn't thought to have formed for the first 100 million years, that leaves a window as small as 300 million years for life to form, or 9.5 x 10¹⁵ seconds. That seems like a lot, but given the very small probabilities calculated above, the reaction rate per volume for the formation of these molecules would have to be . . . fast. Very fast.
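To put a rough number on "fast," here is a small sketch. The seconds figure uses a 365.25-day year. The second half is my own illustrative assumption, not an established model: it asks how many random 50-nucleotide assemblies per second would be needed for the total number of attempts in the window to equal the number of possible sequences.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds

# The 300-million-year window from the paragraph above, in seconds.
window_seconds = 300e6 * SECONDS_PER_YEAR
print(f"{window_seconds:.2e}")  # 9.47e+15

# Illustrative assumption: one attempt per possible 50-nucleotide sequence.
rna_space = 4 ** 50
trials_per_second = rna_space / window_seconds
print(f"{trials_per_second:.1e}")  # 1.3e+14
```

That is for a single specific 50-nucleotide sequence, before multiplying out the hundreds of sequences discussed earlier.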

It could also be said that my understanding of the biochemistry of the RNA world would probably be considered simplistic by experts in the field. It's a broad field with a great deal of published work to support the hypotheses laid out by researchers, and I've barely skimmed the surface. On the other hand, many of the complications I bring up here are acknowledged in the literature. The proposed solutions to these problems often involve hypothetical, unknown nucleic acids that only existed in the primordial world, intermediate molecules that we haven't seen, or chemical conditions and reactions that we haven't discovered and don't exist in nature any longer. When a hypothesis has to be papered over with imaginary molecules or black box chemistry, I think a measure of doubt is not unreasonable.

In short, you have to have hundreds of very rare molecules forming in a relatively short period of time in close proximity, under very exacting (and possibly unknowable) conditions. Now, I've taken a very brief view of some incredibly complicated topics. I can't say that those events happening are absolutely impossible. I can't pretend that I have a comprehensive understanding of the fields. However, just based on the numbers alone, I think it takes at least as much faith to say that life spawned in the primordial soup as to hold to a religious view of human origins. I would say that these explanations only work when you've started from the conclusion that the answer to these questions must be naturalistic. As Frank Turek says in the very title of his book, I don't have enough faith to believe that this was the origin of life.

Monday, May 08, 2017
