Sunday, June 01, 2014

A Gene Heatmap

Lately I've been using the great tools at genomevolution.org plus custom Canvas API scripts to render colorful heatmaps of aligned genes from phylogenetically diverse microorganisms. The graphic below is one example.
Glyceraldehyde-3-phosphate dehydrogenase genes of N=134 bacterial species, arranged in order of gene G+C content (high GC at the top). Hot colors are G and C. Cool colors are A and T.

What are we looking at? This is actually a composite rendering of the glyceraldehyde-3-phosphate dehydrogenase genes (DNA sequence info) from 134 bacterial species. Each gene is painted left-to-right (5' to 3') in a strip 4 pixels tall, with hot colors assigned to DNA bases G and C (guanine and cytosine), and cool colors assigned to bases A and T (adenine and thymine). Wherever there's a G or C, it gets painted red or red-orange. Wherever there's A or T, blue or blue-green. Same gene, 134 versions, varying significantly in G+C content. (The gene GC content ranges from a maximum of 69.2% at the top to 29.4% at the bottom.)
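The ordering by G+C content can be sketched in a few lines of JavaScript. This is a minimal illustration, not the script used for the graphic: the function names and the `{ name, seq }` record shape are hypothetical, and it assumes that alignment gaps and ambiguous bases are simply left out of the count.

```javascript
// Fraction of G and C among the unambiguous bases of a sequence.
// Assumption: gaps ('-') and symbols other than A, C, G, T are ignored.
function gcContent(seq) {
  let gc = 0;
  let total = 0;
  for (const ch of seq.toUpperCase()) {
    if (ch === 'G' || ch === 'C') {
      gc++;
      total++;
    } else if (ch === 'A' || ch === 'T') {
      total++;
    }
  }
  return total === 0 ? 0 : gc / total;
}

// Returns a new array with the highest-GC gene first (top of the heatmap).
// Each record is assumed to look like { name: string, seq: string }.
function sortByGcDescending(records) {
  return [...records].sort((a, b) => gcContent(b.seq) - gcContent(a.seq));
}
```

With the genes sorted this way, the first record becomes the top strip of the heatmap and the last record the bottom strip.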

Why glyceraldehyde-3-phosphate dehydrogenase (GAPDH)? No real reason, except that it's a fairly universal (indeed, quite ancient) metabolic enzyme, reasonably compact (making possible a rendering that's not super-wide, as it would be for a larger gene), well-delineated genetically (not a fusion protein or an enzyme with multiple isoforms), and probably representative of a good many core metabolic enzymes. This is the enzyme that catalyzes the sixth step of glycolysis (sugar breakdown). You may recall from Biochem 101 that the breakdown of glucose proceeds by splitting the twice-phosphorylated six-carbon molecule into two 3-carbon pieces. GAPDH then oxidizes and phosphorylates one of these triose phosphates, glyceraldehyde-3-phosphate (the other, dihydroxyacetone phosphate, is first isomerized into it), yielding 1,3-bisphosphoglycerate, which goes on to transfer a phosphate to ADP to make ATP, the 5-hour energy drink of all cells everywhere.

The genes were aligned with ClustalW in MEGA6 (freeware). Rendering the alignment FASTA file took about two seconds, in the browser, using 133 lines of custom JavaScript.
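For readers who want to try something similar, here is a rough sketch of the rendering step: parse the aligned FASTA text, then paint each gene as a horizontal strip on a canvas. It is not the original 133-line script. The names `parseFasta`, `renderHeatmap`, and `PALETTE` are hypothetical, and so are the exact hex colors, the 1-pixel column width, the white gap color, and the choice of which base gets red versus red-orange (or blue versus blue-green).

```javascript
// Assumed palette: hot colors for G/C, cool colors for A/T.
const PALETTE = { G: '#e8261c', C: '#f26a1b', A: '#1f5fd0', T: '#1fa7a0' };
const GAP_COLOR = '#ffffff'; // used for '-' and any other symbol

// Parse FASTA text into [{ name, seq }], joining wrapped sequence lines.
function parseFasta(text) {
  const records = [];
  let current = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '') continue;
    if (line.startsWith('>')) {
      current = { name: line.slice(1).trim(), seq: '' };
      records.push(current);
    } else if (current) {
      current.seq += line;
    }
  }
  return records;
}

// Paint one strip per record, left to right, using a 2D canvas context.
function renderHeatmap(ctx, records, stripHeight = 4, colWidth = 1) {
  records.forEach((rec, row) => {
    for (let col = 0; col < rec.seq.length; col++) {
      const base = rec.seq[col].toUpperCase();
      ctx.fillStyle = PALETTE[base] || GAP_COLOR;
      ctx.fillRect(col * colWidth, row * stripHeight, colWidth, stripHeight);
    }
  });
}

// Browser usage, assuming a <canvas id="heatmap"> element on the page and
// the alignment already loaded into a string named fastaText:
//   const records = parseFasta(fastaText);
//   const canvas = document.getElementById('heatmap');
//   canvas.width = Math.max(...records.map(r => r.seq.length));
//   canvas.height = records.length * 4;
//   renderHeatmap(canvas.getContext('2d'), records);
```

Painting one `fillRect` per base is the simplest approach. For much larger alignments, writing pixels into an `ImageData` buffer and calling `putImageData` once would likely be faster.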