Hey folks, grad school is tough business. So, in lieu of a real post (not yet, anyhow), I thought I'd share my project for the last few weeks or so. Enjoy!
The minimal genome is a concept that has garnered significant scientific interest in recent years, for a variety of reasons. It is defined as the core set of essential genes that an organism requires to sustain life. This is typically a greatly reduced gene set compared with the organism’s natural genome, although it will vary with conditions. A sulfur-reducing bacterium will have a different minimal genome than one which reduces oxygen, and both will differ greatly from that of any multicellular organism. Growth conditions, such as the nutrients provided by the medium, may also play a role. The addition of amino acids to a bacterial medium would render the corresponding biosynthetic genes unnecessary, while providing one type of sugar would render transporters or metabolic genes for other sugars unnecessary. Thus, while the concept of a minimal genome can be generalized for total life processes, it is still dependent on the defined experimental conditions under which the list of essential genes is determined.
Various species of bacteria have been examined for their minimal genome, and a few other organisms, such as S. cerevisiae and C. elegans, have been the subject of such studies as well. Bacteria remain the simplest organisms to study, as their genomes are usually in a state of reduction already. Many have fewer than 8,000 protein-coding genes, and those with fewer than 1,000 genes already have a metabolically simple lifestyle, making them ideal candidates for minimal genome studies. Examination of such organisms has led to tentative estimates of minimal gene sets, ranging from 200 to 500 genes.1
Why the minimal genome?
There is much information that can be derived from an understanding of the minimal genome. With an understanding of what essential enzymatic functions are necessary for life to take place, progress towards understanding pre-biotic existence and the first bacterial organisms can be made. Additionally, species specific experiments in deriving the minimal genome can grant further clues as to the developmental path a given species has taken.1,2
A more specific use for the minimal genome, however, is the Mycoplasma laboratorium envisioned by Craig Venter. Venter hopes to use the determination of a minimal genome to synthesize a minimal bacterial organism. By utilizing a “blank slate” bacterium, researchers could introduce genes of their choosing with, purportedly, minimal systems interactions. This would allow for the efficient production of materials through a biological system, according to Venter. Biofuels such as ethanol are the most popular possibility at the moment, but other potential M. laboratorium inserts include genes for plastics or pharmaceutical drugs. It is quite possible that the next generation of biologically synthesized materials will come from plugging genes into a minimal organism.3
Venter recently reported an interesting step towards the creation of a synthetic organism. Researchers extracted the genome of M. mycoides, with little accompanying protein, and utilized PEG-mediated insertion to transplant it within M. capricolum cells. Antibiotic resistance coded in the donor genome selected for successful transformants. The result was recipient microbes with both the phenotype and genotype of the donor organism. This is a promising result, showing that a complete genome can be inserted into an organism and can still functionally express its own genes.4
Researching the minimal genome
There are many potential ways to study and understand the minimal genome. Some would start by defining the categories of either essential functions or essential genes. At the most basic level, cellular processes can be broken down into metabolism, compartmentalization, and information transfer. Whether the groupings are defined by specific functional categories (Figure 1) or more generalized categories of function (Figure 2), proper division of the genes is essential for understanding what properly contributes to life, as different categories may shape our understanding of what constitutes an “essential” function or gene.2,5
In general, two different approaches may be taken towards establishing a minimal genome. The first of these is a bioinformatics approach. Researchers have compared the genomes of various organisms, particularly bacteria, and examined which genes, or at least gene functions, are conserved across a wide variety of species. Those functions which are nearly universal could be considered essential, as nearly every form of life retains them in some form. Some researchers distinguish between “persistent” genes, those present in a select number of genomes, and “conserved” genes, those seen everywhere. This is usually an attempt to minimize the possibility of missing an essential function because it is not structurally conserved.1,2
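As a rough illustration of the comparative approach, the distinction between conserved and persistent genes can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the genome names, the gene family labels, and the 80% cutoff for "persistent" are hypothetical, and a real study would first need to group genes into families by sequence or functional similarity.

```python
from collections import Counter


def classify_gene_families(genomes, persistence_threshold=0.8):
    """Split gene families into 'conserved' (in every genome) and
    'persistent' (in at least `persistence_threshold` of genomes, but not all).

    `genomes` maps a genome name to the set of gene families it contains.
    The threshold is an illustrative assumption, not a published value.
    """
    n = len(genomes)
    # Count in how many genomes each family appears.
    counts = Counter(family for families in genomes.values() for family in set(families))
    conserved = {f for f, c in counts.items() if c == n}
    persistent = {f for f, c in counts.items() if c < n and c / n >= persistence_threshold}
    return conserved, persistent


# Hypothetical toy data: five genomes, three gene families.
toy_genomes = {
    "genome_A": {"ribosomal_protein", "tRNA_synthetase", "sugar_transporter"},
    "genome_B": {"ribosomal_protein", "tRNA_synthetase"},
    "genome_C": {"ribosomal_protein", "tRNA_synthetase", "sugar_transporter"},
    "genome_D": {"ribosomal_protein", "tRNA_synthetase"},
    "genome_E": {"ribosomal_protein"},
}

conserved, persistent = classify_gene_families(toy_genomes)
print(sorted(conserved))   # ['ribosomal_protein']
print(sorted(persistent))  # ['tRNA_synthetase']
```

Keeping a separate "persistent" set is what lets a function that is missing from a single genome still be flagged as a candidate essential function, rather than being dropped for lack of perfect conservation.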
A second computational route is to use models of cellular pathways to determine a minimal genome. Though a system-wide approach is thought to be unfeasible, combining models of individual systems, especially metabolic networks, has seen some success. One attempt at a system-wide calculation starts from a minimal model and then derives the genetic pathways backwards, looking at biochemical “modules” as they interact with the components of the model. As with any computational approach, however, experimental verification of the results is necessary.1
The other approach to determining the minimal genome is experimental, by reducing the genome of existing organisms. Knock out a gene, and the survival of the clone will verify a non-essential gene, while knockouts of essential genes will be fatal. The method of distinguishing transformants is important, as extra gene copies may allow a normally fatal knockout to be survivable. For example, if a protein is secreted from the bacteria, a knockout of that gene may not be fatal as long as there is another bacterium in the population that has a wild-type copy of the gene.1
There are three general approaches to gene knockout, all of which are somewhat similar.
In the first method, suicide plasmid-mediated deletion, a plasmid is introduced to the cell with two homologous “arms” whose sequences flank the sequence to be deleted in the genome. After the plasmid is integrated into the host genome, intramolecular recombination can occur between either pair of homologous arms, resulting in either the expulsion of the plasmid DNA or the “scarless” deletion of the sequence of interest (Figure 3). Suitable screening for the appropriate event is dependent upon the deleted gene and the original marker used for clonal selection.1
The second method, linear DNA-mediated deletion, utilizes linear DNA fragments rather than plasmids. The principle remains the same, with flanking homology arms allowing recombination events to excise genomic DNA in favor of the insert. A variety of follow-up methods can then remove the sequences between the homology arms in the inserted fragment. Regardless, the end result is a clean knockout with little to no “junk DNA” left over from the insertion (Figure 4). One attraction of this method is that large fragments of DNA can be excised, eliminating multiple genes at the same time if desired.1
The final deletion method utilizes transposons, DNA sequences which can excise themselves from a plasmid or genome and reinsert into another location by using a specialized set of enzymes. This method is random, lacking the possibility of a targeted knockout. In addition, it adds new DNA to the genome, precluding multiple knockouts in one transformant due to the undesirability of introducing large amounts of new material into the genome. However, this method is likely the simplest of the three, not requiring specially designed plasmids or PCR products for each gene to be tested.1
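Of the three methods above, the two recombination-based ones share the same core idea: two homology arms define what gets removed. The toy sketch below treats the genome as a plain string and shows only the end result of a scarless deletion. It is a string-level illustration under heavy assumptions (exact, unique arm matches, a single strand, invented sequences), not a model of the recombination machinery itself.

```python
def delete_between_arms(genome, left_arm, right_arm):
    """Return `genome` with the sequence between the two homology arms removed.

    The arms themselves are kept, mimicking a scarless deletion in which
    only the flanked target sequence is lost. Sequences are hypothetical.
    """
    left_start = genome.find(left_arm)
    if left_start == -1:
        raise ValueError("left homology arm not found in genome")
    left_end = left_start + len(left_arm)

    # Look for the right arm only downstream of the left arm.
    right_start = genome.find(right_arm, left_end)
    if right_start == -1:
        raise ValueError("right homology arm not found downstream of left arm")

    return genome[:left_end] + genome[right_start:]


# Hypothetical example: flank + target gene + flank.
upstream, target, downstream = "TTGACCGA", "ATGAAATAG", "CGTTAGCA"
toy_genome = upstream + target + downstream

edited = delete_between_arms(toy_genome, left_arm="GACCGA", right_arm="CGTTAG")
print(edited)  # TTGACCGACGTTAGCA
```

Because the deleted span is set entirely by where the arms sit, widening the gap between them is all it takes to remove several adjacent genes in one step, which matches the advantage described for the linear DNA method.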
Essential genes of a minimal bacterium
In their 2006 paper, Venter et al examined Mycoplasma genitalium for a minimal genome experiment. M. genitalium is a pathogen of the human urogenital tract, a living arrangement mild enough to be conducive to a reduced lifestyle. It has a genome 580kb in size, coding for 482 proteins, giving it one of the smallest genomes discovered. This already reduced set-up, along with the low genetic redundancy characterized by few of the genes being found in paralogous families (6%), makes M. genitalium a suitable candidate for minimal genome studies.5,6
The study in question is a follow-up of a 1999 study attempting to examine the same question. Having determined that their previous results were not reliable due to the labeling as non-essential of genes known to perform essential functions, the group sought to correct the mistakes of the 1999 methods and analyses.5
The researchers introduced a plasmid containing a transposon (Tn4001) into M. genitalium liquid cultures via electroporation. The transformants were then spread on plates and monitored for colony growth, with up to four weeks allowed for colonies to develop so as to catch slow-growing clones. Antibiotic resistance encoded by the plasmid allowed for selection of successful transformants.5
To verify insertions, a single-primer PCR strategy was utilized. After isolating the genomic DNA of a colony, a primer of the sequence in the transposon was elongated and then sequenced. By BLASTing the sequence found, the researchers were able to verify successful transposon insertions and determine which gene, and the location within the gene, the transposon disrupted.5
If there were any genetic redundancy, such as multiple copies of a gene within the colony, an essential gene knockout could be non-fatal. To verify that no extra copies of the knocked-out genes were present, the researchers performed PCR with primers for the gene identified as disrupted in the BLAST searches. If the disrupted copy were the only copy of the gene, there would be no PCR product. Unfortunately, the group found that every colony tested appeared to carry the wild-type genotype.5
To analyze this occurrence, the group performed quantitative PCR on the colony DNA using one primer for the knockout gene and one for the transposon. This led to the discovery that most of the colonies contained at least two different knockouts. The researchers concluded that the M. genitalium, coming from the liquid culture, had both a high transformation rate and a tendency to aggregate when spread to solid media.5
To correct for this, the group performed “filter cloning,” running the colonies through a 0.22μm filter and then replating the clones. This successfully led to the isolation of clones from the original colony into subcolonies. Transposon insertion was then verified as with the primary colonies. Colonies that were considered useful for analysis contained <1%>5
The researchers examined 3,321 colonies and subcolonies, with 62% of the primary and 82% of the secondary colonies providing sequence data allowing them to map a transposon insertion. From these colonies, 2,462 insertion sites were mapped to the genome. In order for an insertion to be considered a knockout, it had to fall after the first three codons at the 5’ end of the gene and before the last 20% of the coding sequence at the 3’ end. This was to avoid labeling a gene as non-essential when it merely tolerated the insertion.5,6
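The positional criterion for calling an insertion a knockout is easy to express as a small check. The sketch below is one possible reading of it, not the authors' code: it assumes 1-based inclusive gene coordinates, treats "the first three codons" as the first nine nucleotides, and measures position along the coding direction so that genes on the minus strand are handled. The function names and example coordinates are hypothetical.

```python
def cds_offset(insert_pos, gene_start, gene_end, strand):
    """0-based distance of the insertion from the 5' end of the coding sequence.

    Coordinates are assumed to be 1-based and inclusive, with
    gene_start < gene_end regardless of strand.
    """
    if not gene_start <= insert_pos <= gene_end:
        raise ValueError("insertion lies outside the gene")
    if strand == "+":
        return insert_pos - gene_start
    return gene_end - insert_pos


def is_disruptive(insert_pos, gene_start, gene_end, strand="+",
                  skip_codons=3, tail_fraction=0.20):
    """True if the insertion falls after the first `skip_codons` codons
    and before the final `tail_fraction` of the coding sequence."""
    length = gene_end - gene_start + 1
    offset = cds_offset(insert_pos, gene_start, gene_end, strand)
    after_start_codons = offset >= skip_codons * 3
    before_tail = offset < length * (1 - tail_fraction)
    return after_start_codons and before_tail


# Hypothetical 1,000-nt gene spanning positions 1001..2000.
print(is_disruptive(1005, 1001, 2000, "+"))  # False: within the first three codons
print(is_disruptive(1500, 1001, 2000, "+"))  # True: mid-gene
print(is_disruptive(1900, 1001, 2000, "+"))  # False: within the last 20%
print(is_disruptive(1100, 1001, 2000, "-"))  # False: last 20% on the minus strand
```

Filtering this way discards insertions that could leave a functional product, for example one that sits so close to the 3’ end that most of the protein is still made.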
As the researchers continued to examine colonies and subcolonies, the number of new insertion sites and successful knockouts reached a plateau, indicating that they were reaching a “saturation” point of non-lethal inserts (Figure 5). Of the 2,462 insertion sites identified, 84% were mapped to protein coding genes. No RNA coding sequence (rRNA, tRNA, etc.) insertions were found. From this, an initial estimate of 100 non-essential genes, out of 482 protein-coding genes, was determined.5
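The saturation argument can be made concrete with a simple discovery curve: the running count of distinct disrupted genes as more colonies are sequenced. The sketch below uses an invented, very short list of hits purely for illustration. The gene labels are borrowed from the post, but the order and counts are hypothetical and are not data from the study.

```python
def discovery_curve(hits):
    """Return the cumulative number of distinct genes seen after each hit.

    `hits` is the sequence of disrupted genes in the order the colonies
    were sequenced. A curve that flattens out suggests saturation.
    """
    seen = set()
    curve = []
    for gene in hits:
        seen.add(gene)
        curve.append(len(seen))
    return curve


# Hypothetical sequencing order; repeats stand in for "hot spot" genes.
toy_hits = ["MG414", "MG339", "MG414", "MG415", "MG339", "MG414"]
print(discovery_curve(toy_hits))  # [1, 2, 2, 3, 3, 3]
```

When later sequences mostly repeat genes that were already seen, the curve stops rising, which is the plateau behavior described for Figure 5.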
Several sites of transposon insertion were encountered much more frequently than others (Figure 6). Though the original authors examine this among all colonies as well as with only primary or secondary colonies, the data still show a very high rate of insertion within very specific genes. Four of the genes listed in Figure 6 (MG339, 414, 415, 428) represent nearly 31% of the total clonal pool. Comparison of these genetic “hot spots” with Figure 1 shows that the size of the genes is not an issue.5
The authors do not offer much in terms of an explanation for this phenomenon. They suggest that the clones for those insertions grow fast enough that they are discovered often. They also suggest that the transposons preferentially jump to the locations in question, though without any explanation as to why that might occur. Is there some sequential or structural feature of the loci in question that makes them attractive to the transposons? It is an area for possible further work.5
While the authors did find 100 non-fatal knockouts, they bring adjustments to this list because of uncertainties in their findings, preferring to offer a conservative estimate of the minimal genome. Several factors went into ruling out genes from the list. Since they did not verify that a knockout created no transcript or protein product, they relied on their insertion criteria listed above. However, if a knockout appeared in a primary colony as part of a mixed population but did not appear in a secondary colony after filter cloning, the authors ruled it to be an essential gene. It is likely that the knockout, in those situations, was due to a gene acting in trans, either from an excreted metabolite or an excreted protein.5
Additionally, three proteins deemed critical for cell survival were listed as essential even though successful knockouts were achieved. Knockouts for three genes encoding phosphate transporters were discovered, which is an unusual result. A cell must import phosphate, so it is unlikely that all three genes could be non-essential together. Knockouts were also found for three genes which putatively encode phosphonate transporters. The authors surmise that the remaining transporters might take on the role of the knocked-out one due to relaxed substrate specificity. Another possibility they suggest is a pathway in which phosphate and phosphonate are freely interconverted, though they do not assign a gene to this function. Because of this discrepancy, however, the authors assign all three phosphate transporter genes essential status.5
The final measure used in ruling out knockouts was limited redundancy. Though M. genitalium does have few genes shared across paralogous families, two paralogous families were completely knocked out. This included families for lipoproteins (MG185, 260) and glycerophosphoryl diester phosphodiesterases (MG293, 385). Again, the successful knockouts might be due to the ability of the remaining family member to fill the role of the knockout. Still, the authors included one protein from each family as a conservative estimate of the essential genome. This increases their count of essential genes to 387 out of 482.5
In the 1999 study, the researchers found around 130 non-essential genes. Only 67 of those genes were part of the list of knockouts found in the 2006 study. This is a significant difference, one which the authors attempt to explain. Previously, they found knockouts for seemingly essential genes such as tRNA synthetases (MG345, 455) or DNA polymerase III subunit α (MG261). The difference in procedure for the current study seems to account for this discrepancy, however. In the 1999 study, the authors did not plate their transformants on solid media but grew them solely in liquid pools. This would allow for genes to work in trans and show essential genes to be survivable knockouts, and is thought to be the case for a group of lipoproteins, as well as an extracellular nuclease. While this is the most significant change, a few other minor alterations to the previous experimental set-up are thought to have factored into the different list. The use of a different medium, as well as a different antibiotic used for selection, could have led to the requirement of a few genes such as lipoprotein MG395 or lipase/esterase MG310.5,6
On the other hand, the current study also found “non-essential” genes which would seem to be crucial to the maintenance of the genome. The genes in question are related to DNA recombination and repair, and their absence should be detrimental to the bacteria over time. It is thought that the colonies were not examined over a long enough period of time for these missing genes to become critical, or that another enzyme was able to fulfill the role of each individual knockout.5
Several other strange results arose as well. No insertions were found for any of the cytoskeletal proteins, which is at odds with previously reported work. Another group used a recombination method to excise the cytoskeletal protein HMW2 (MG218) and did not find the deletion to be fatal. The authors of the current study do not have an explanation for the discrepancy. The size of the gene is not the issue, as it is relatively large compared to most other M. genitalium genes. The authors guess that the knockout could hamper growth speed enough that it was not found in their mutant screens. This seems unlikely, as their protocol involved watching for colony growth for a period of four weeks. The previous work does not mention an unusual replication time for the transformant either.5,7
There were several mutations which did affect growth speed of the colonies, though the authors are unable to explain the reasons for these. Knockouts for dihydrolipoyl dehydrogenase (MG271), a component of the pyruvate dehydrogenase complex, grew ~20% more slowly than wild-type M. genitalium. The enzyme normally oxidizes dihydrolipoamide to lipoamide, a redox component for the generation of acetyl-CoA. It seems unlikely that a cell would be able to function if this portion of the TCA cycle were to be absent. Since the knockouts are not verified by gene products, it is possible that the insertion results in an enzyme of reduced function rather than a total knockout. It is also possible that, once again, reduced substrate specificity allows another dehydrogenase to fill the role of the knockout.5
Other insertions resulted in growth speeds ~20% faster than wild-type. These included lactate/malate dehydrogenase and two conserved, hypothetical proteins (MG460, 414, 415). The authors do not suggest any sort of function for the hypothetical proteins. They do mention that there are enzymatic functions known to be performed by the bacteria which have not been assigned to specific genes yet, although it would be difficult to assign any function on the basis of altered growth speed alone. The lactate/malate dehydrogenase is an unusual case. There is nothing outstanding about the reaction catalyzed by the enzyme which suggests that it would slow down the growth of the organism so severely. It also seems unlikely that these reactions are redundant or unnecessary for cellular proliferation. The authors suggest that the gene acts as a metabolic “brake” on bacterial growth. Such a mechanism may have evolved to protect the organism, as fast growth would attract the attention of the host immune system, which would then destroy the bacteria.5
Based on the listings of essential and non-essential genes, it would seem that there is still much to be discovered and understood about cellular life at the molecular level. The authors list all of the genes found in each list as supplemental data, but they provide a summarizing figure (Figure 7) for several metabolic pathways, indicating with black boxes those that were non-essential in this study. Many proteins end up in this category when a cursory analysis might indicate otherwise. For example, a subunit of ATPase (F0F1-ε) was found to be non-essential, while conventional wisdom would say that all the subunits of such a critical enzymatic complex would be necessary for cell survival. In the figure, several pathways are listed with purported functions that ought to be present but have not been assigned to specific genes or proteins. Of the 100 non-essential genes, 48 were hypothetical or of unknown function. It is possible that those genes represent non-existent products and thus would be unnecessary for the cell. However, 110 of the 382 essential genes were also unassigned or hypothetical proteins. Though it is possible that many of these are simply proteins with known but unassigned function, it is also possible that many of these perform functions that have yet to be characterized. When nearly a third of the minimal gene set has unknown function, the question naturally arises as to how much is actually known about the fundamental metabolic functions of a cell. Further study into this, as well as the other metabolic phenomena, could greatly enhance our understanding of the biochemical lifestyle of the cell.5
The authors produced a list of 382 essential genes not found in insertion knockouts. This list is increased to 387 through conservative estimate of gene necessity. This brings the group much closer to the minimal genome than previous studies have, by means of the stringency of the knockout criteria and the saturation of their mutations. Unfortunately, the list is merely putative. A thorough examination would examine each knockout for the presence or absence of protein product or enzymatic activity. Additionally, sequential knockouts would be more helpful in identifying the actual list of essential genes. As seen above, many of these knockouts may be tolerable when only one is gone, but if a second “non-essential” gene takes up the role of an absent protein, the two do not belong on the list together. A cell may not actually function in the absence of all 95 proteins designated non-essential in this study.5
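The gene counts quoted above fit together as simple arithmetic, which is worth spelling out because the numbers 382, 387, and 95 appear in different places. The tally below uses only figures from this post. The split of the five restored genes into three phosphate transporters plus one representative from each of two paralogous families is my reading of the description above rather than a breakdown quoted from the paper.

```python
TOTAL_PROTEIN_GENES = 482          # protein-coding genes in M. genitalium
DISRUPTED_GENES = 100              # genes with a qualifying transposon insertion
PHOSPHATE_TRANSPORTERS = 3         # restored to the essential list
PARALOG_REPRESENTATIVES = 2        # one gene kept from each of two families
UNKNOWN_FUNCTION_ESSENTIAL = 110   # essential genes that are hypothetical/unassigned

# Genes never found disrupted are the first-pass essential set.
never_disrupted = TOTAL_PROTEIN_GENES - DISRUPTED_GENES

# The conservative estimate adds back genes judged unlikely to be dispensable.
conservative_essential = (never_disrupted
                          + PHOSPHATE_TRANSPORTERS
                          + PARALOG_REPRESENTATIVES)

# Whatever remains is the final non-essential count.
remaining_nonessential = TOTAL_PROTEIN_GENES - conservative_essential

unknown_share = UNKNOWN_FUNCTION_ESSENTIAL / never_disrupted

print(never_disrupted)          # 382
print(conservative_essential)   # 387
print(remaining_nonessential)   # 95
print(f"{unknown_share:.1%}")   # 28.8%
```

The last figure is the "nearly a third" of the essential set with unknown function mentioned earlier.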
Venter’s goal of a synthetic organism is brought closer with this, as his next logical step would be to synthesize a genome containing the list of essential genes found in this study. Venter’s group actually accomplished this goal, announcing it to media sources in early October 2007. Combined with the successful demonstration of the ability to transplant a genome into a recipient bacterium, it would seem that the creation of Mycoplasma laboratorium is imminent.8 It remains to be seen, however, whether or not it will become the industrial powerhouse Venter purports it to be.
1. Fehér, T., Papp, B., Pál, C., & Pósfai, G. (2007) Chem. Rev. 107, 3498-3513.
2. Danchin, A., Fang, G., & Noria, S. (2007) Proteomics 7, 875-889.
3. Highfield, R. “Man-made microbe ‘to create endless bio-fuel.’” Telegraph.co.uk. June 8, 2007. October 9, 2007. http://www.telegraph.co.uk/news/main.jhtml?xml=/news/2007/06/08/nbiofuel108.xml
4. Lartigue, C., Glass, J.I., Alperovich, N., Pieper, R., Parmar, P.P., Hutchison III, C.A., Smith, H.O., & Venter, J.C. (2007) Science 317, 632-638.
5. Glass, J.I., Assad-Garcia, N., Alperovich, N., Yooseph, S., Lewis, M.R., Maruf, M., Hutchison III, C.A., Smith, H.O., & Venter, J.C. (2006) Proc. Natl. Acad. Sci. USA 103, 425-430.
6. Hutchison III, C.A., Peterson, S.N., Gill, S.R., Cline, R.T., White, O., Fraser, C.M., Smith, H.O., & Venter, J.C. (1999) Science 286, 2165-2169.
7. Dhandayuthapani, S., Rasmussen, W.G., & Baseman, J.B. (1999) Proc. Natl. Acad. Sci. USA 96, 5227-5232.
8. Pilkington, E. “I am creating artificial life, declares US gene pioneer.” The Guardian October 6, 2007. October 9, 2007. http://www.guardian.co.uk/science/2007/oct/06/genetics.climatechange
Figure 1. Physical map of insertions found within the M. genitalium genome. Genes are colored by the designated key. Arrows pointing down indicate an insertion from the 1999 study, while arrows pointing up indicate an insertion from the 2006 study. Red, filled arrows indicate a location where 10 or more insertions were found.5
Figure 2. An alternative system for classification of basic life functions.2
Figure 3. Suicide plasmid-mediated gene deletion. See Reference 1 for more details.
Figure 4. Linear DNA-mediated gene deletion. There are several different means of achieving complete deletion depending on the nature of the flanking sequences used. See Reference 1 for more details.
Figure 5. Graph showing the number of insertion sites and disrupted genes versus total acquired sequences. The data indicate a saturation point for the insertions.5
Figure 6. The frequency of insertions (y-axis) at physical locations within the genome for both colonies and subcolonies. Genes with a particularly high frequency of insertion are labeled.5
Figure 7. A summary of metabolic pathways utilized by M. genitalium. Black boxes indicate non-essential genes. Orange names are known proteins, while green names are known functions not yet assigned to a gene. Transporters are color-coded by system. See Reference 5 for more details.