A line I picked up from one of my high school Computer Science teachers: “Graphics are memory hogs.” Computers represent pictures as a big grid of colors. Each cell in this grid is called a pixel (short for “picture element”), and usually has an ordered triple of numbers representing what color it should be. When these pixels are very small and numerous, the picture looks less like a collection of squares, and more like the picture represented. A similar effect is used to shade and color graphics in newspapers; look closely enough and you’ll notice that the colors are made up of lots of tiny colored dots. Let’s look at an example…
Here, we see trees, bushes, Mario, his friend, and a monster, all pretty clearly. Now, let’s look closer at Mario’s head:
Looking closer, we clearly see the grid. Each of these squares shows the average of all the colors that are underneath it in the “actual” picture being rendered.
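Here's a rough sketch of that averaging idea in Python. It assumes the "actual" picture is itself just a finer grid, stored as a list of rows of (red, green, blue) tuples; the function name `downsample` and the `block` parameter are my own inventions for illustration, not anything a real renderer is guaranteed to do.

```python
def downsample(image, block):
    """Average each block x block region of `image` into a single pixel.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    """
    height, width = len(image), len(image[0])
    result = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            # Collect every color that falls "underneath" this big square.
            cells = [image[y][x]
                     for y in range(top, min(top + block, height))
                     for x in range(left, min(left + block, width))]
            n = len(cells)
            # Average each channel separately (integer division keeps 0..255).
            row.append(tuple(sum(c[i] for c in cells) // n for i in range(3)))
        result.append(row)
    return result
```

Feeding it a 2×2 image with `block=2` collapses the whole thing into one pixel holding the average color.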
Why do we represent pictures like this? For one, it’s mathematically easy to deal with pictures like this when programming. In most of today’s computing, we represent each color in an image with just three numbers between 0 and 255: one for red, one for green, and one for blue. So (255, 0, 0) would be red, (0, 0, 0) would be black, (255, 128, 128) would be pink, and so on. I pose a similar question for representing colors: why do we limit the resolution of each channel to 256 levels?
The answer to both of these questions, of course, is that “the human eye isn’t bothered by it.” We can’t see much of a difference between the shades of blue that would be represented by (0, 0, 190) and (0, 0, 190.5), so we don’t bother with it. Note: The 0-to-255 convention isn’t universal; in some cases, like in the 3D programming interface “OpenGL,” we represent the strength of the red, green, and blue channels by floating-point numbers between 0.0 and 1.0. So (0.743, 0.16513, 0.1665) would be valid for a “faded red” sort of color. We still don’t have infinite precision with this method, but, like I’ve said, we don’t really need it.
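To make the two conventions concrete, here's a small sketch that converts between 0-to-255 triples and 0.0-to-1.0 triples. The helper names `to_unit` and `to_bytes` are made up for this post; they aren't part of OpenGL or any other library.

```python
def to_unit(color):
    """Convert a (r, g, b) triple of 0..255 integers to 0.0..1.0 floats."""
    return tuple(c / 255 for c in color)


def to_bytes(color):
    """Convert a (r, g, b) triple of 0.0..1.0 floats to 0..255 integers."""
    # Rounding is where the extra precision gets thrown away.
    return tuple(round(c * 255) for c in color)


print(to_unit((255, 0, 0)))                # (1.0, 0.0, 0.0)
print(to_bytes((0.743, 0.16513, 0.1665)))  # (189, 42, 42)
```

Notice that the two slightly different green and blue values in the "faded red" both land on 42, which is exactly the kind of difference we've decided not to care about.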
Similarly, we’re usually okay with seeing a grid of square pixels to represent pictures, text, and so on, as long as the pixels are really tiny. Even a modern “high definition” television, which is sold to make everything look realistic, uses a grid 1,920 pixels wide and 1,080 pixels tall to represent its picture. This is certainly an improvement over the old “standard definition” of 640 pixels wide and 480 pixels tall, but you might not have noticed the blockiness of standard definition, because most standard definition televisions blurred the lines between pixels by nature (there’s also interlacing, but I won’t get into that here).
But is a big matrix of 3-tuples of bytes the most efficient way to represent an image, even if it is easy to work with? The example picture I gave above is 800 pixels wide and 600 pixels tall. Each pixel takes three bytes to store its (red, green, blue) values. So 800×600 pixels × 3 bytes/pixel = 1,440,000 bytes. That’s nearly one and a half million integers, each between 0 and 255, to give us a decent-sized picture. Certainly, a picture being “worth a thousand words” holds in cases like these.
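The arithmetic is simple enough to wrap in a one-line function, which makes it easy to try other sizes. The name `raw_size_bytes` is just something I picked for this sketch.

```python
def raw_size_bytes(width, height, bytes_per_pixel=3):
    """Size of an uncompressed image: one (r, g, b) triple per pixel."""
    return width * height * bytes_per_pixel


print(raw_size_bytes(800, 600))    # 1440000
print(raw_size_bytes(1920, 1080))  # 6220800
```

So a single uncompressed frame at the high definition size mentioned earlier would be a bit over six million bytes.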
Thankfully, data compression can help. Compression is a way to “summarize” a file, cutting out redundancies wherever practical, to achieve a smaller file that still sufficiently represents the original file. An image compression algorithm might say, “there’s a rectangle at coordinates (63, 75) that is 104 pixels wide and 52 pixels tall filled with the color (37, 160, 67).” Roughly, that’s two bytes to store the position, two bytes to store the dimensions, and three bytes to store the color. Let’s say it takes one more byte to store the instruction to draw a rectangle, and we are using 1+2+2+3 = 8 bytes to store a piece of an image that would normally have taken 104×52×3 = 16,224 bytes.
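As a toy illustration of that eight-byte rectangle instruction, here's one way it could be packed and unpacked. This is not any real image format: the opcode value `DRAW_RECT` and both function names are hypothetical, and giving each coordinate a single byte means nothing can be larger than 255.

```python
import struct

DRAW_RECT = 1  # hypothetical opcode meaning "fill a rectangle"


def encode_rect(x, y, width, height, color):
    """Pack opcode, position, dimensions, and color into 8 bytes."""
    return struct.pack("8B", DRAW_RECT, x, y, width, height, *color)


def decode_rect(data, canvas):
    """Read one 8-byte instruction and fill the rectangle on `canvas`.

    `canvas` is a list of rows of (r, g, b) tuples, modified in place.
    """
    op, x, y, width, height, r, g, b = struct.unpack("8B", data)
    if op != DRAW_RECT:
        raise ValueError("unknown instruction")
    for row in range(y, y + height):
        for col in range(x, x + width):
            canvas[row][col] = (r, g, b)


instruction = encode_rect(63, 75, 104, 52, (37, 160, 67))
print(len(instruction))  # 8
```

Decoding that instruction onto a blank canvas fills 104×52 = 5,408 pixels, which is where the 16,224 raw bytes come from.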
If the image uses 256 or fewer colors, we can compress even better, using a “palette” of sorts. Basically, all the different colors that are used in the image are stored in the palette, which will be 256×3 bytes long, at most. Then, in storing the image, instead of storing the color over and over again for each pixel, using three bytes apiece, we can simply use one byte to say which color in the palette the pixel uses. So instead of saying (26, 45, 23), we can say “the 14th color in the palette” and, in some cases, save some space.
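A minimal sketch of the palette idea, assuming the image is a flat list of (r, g, b) tuples. The names `to_indexed` and `indexed_size` are mine, and real formats store the palette and indices with extra headers that I'm ignoring here.

```python
def to_indexed(pixels):
    """Split a list of (r, g, b) pixels into a palette and one-byte indices."""
    palette = []
    index_of = {}
    indices = bytearray()
    for color in pixels:
        if color not in index_of:
            if len(palette) == 256:
                raise ValueError("more than 256 distinct colors")
            index_of[color] = len(palette)
            palette.append(color)
        indices.append(index_of[color])
    return palette, bytes(indices)


def indexed_size(palette, indices):
    """Bytes needed: three per palette entry plus one per pixel."""
    return len(palette) * 3 + len(indices)
```

The "in some cases" matters: the palette itself costs three bytes per entry, so a tiny image with many distinct colors can come out no smaller than before.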
We can go even further on that idea. Say the image uses more than 256 colors. We can “merge” many similar colors to an average color in the palette, losing some information about the picture, but letting it fit the “256 color palette” paradigm from the paragraph above. Nothing is particularly special about the number 256, mind you, other than that it’s the size of the palettes in the popular image compression formats GIF and PNG-8 (and that it’s the number of distinct values that fit in one byte). 16-color palettes are commonly used in video games on consoles like the SNES, Sega Genesis, and Nintendo DS.
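Here's one crude way to do that merging, purely as a sketch. It groups colors into at most 256 buckets by keeping only the top 3 bits of red, 3 bits of green, and 2 bits of blue, then replaces every color with its bucket's average. The bucketing scheme and the name `quantize` are my own choices for illustration; actual GIF or PNG-8 encoders generally pick their palettes more cleverly than this.

```python
def quantize(pixels):
    """Merge similar colors so that at most 256 distinct colors remain."""
    def bucket(color):
        r, g, b = color
        # 8 * 8 * 4 = 256 possible buckets.
        return (r >> 5, g >> 5, b >> 6)

    members = {}
    for color in pixels:
        members.setdefault(bucket(color), []).append(color)

    # Each bucket's palette entry is the average of the colors merged into it.
    averages = {
        key: tuple(sum(c[i] for c in colors) // len(colors) for i in range(3))
        for key, colors in members.items()
    }
    return [averages[bucket(color)] for color in pixels]


print(quantize([(200, 10, 10), (202, 12, 14)]))  # [(201, 11, 12), (201, 11, 12)]
```

The two nearly identical reds end up as the same color, so the information about their difference is gone for good.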
Another interesting way to save space is to store only a few of the colors in the picture and reconstruct the rest. Perhaps surprisingly, you can still keep the compressed image looking reasonably close to the original in this way. Using a method called S3TC (sometimes called “DXTn”), only two colors are stored for every 4×4 pixel block, two more are “guessed” by blending the two that were stored, and each pixel in the block just records which of those four colors it uses. This is a very popular way to compress textures used in modern video games; just about every modern graphics processor has support for S3TC built into its hardware. In fact, Paper Mario 2 uses this format to store most of its textures:
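A simplified sketch of the "guessing" step, loosely modeled on the DXT1 flavor of S3TC as I understand it, where each 4×4 block keeps two endpoint colors and every pixel picks one of four colors with a 2-bit index. I'm skipping the 16-bit color packing and the alternate transparency mode the real format has, and the function names are hypothetical.

```python
def block_palette(c0, c1):
    """Four colors for one block: two stored endpoints, two blended."""
    def blend(a, b):
        # Two-thirds of `a` plus one-third of `b`, per channel.
        return tuple((2 * x + y) // 3 for x, y in zip(a, b))

    return [c0, c1, blend(c0, c1), blend(c1, c0)]


def decode_block(c0, c1, indices):
    """Expand 16 two-bit indices (values 0..3) into 16 (r, g, b) pixels."""
    palette = block_palette(c0, c1)
    return [palette[i] for i in indices]


print(block_palette((0, 0, 0), (255, 255, 255)))
# [(0, 0, 0), (255, 255, 255), (85, 85, 85), (170, 170, 170)]
```

With 16-bit endpoints that's 4 bytes of color plus 16 × 2 bits = 4 bytes of indices, so 8 bytes per block instead of the 48 that sixteen raw three-byte pixels would take.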
There are even more things we can do to compress a picture. The JPEG format, developed by the Joint Photographic Experts Group, uses some advanced tricks that work well on photograph-like images to save space. It often introduces artefacts, which are most noticeable when more aggressive settings are used to compress. Often enough, though, these aren’t too noticeable on photographs and photograph-like images. JPEG is horribly distorting for non-photographic images, especially those containing sharp transitions between colors, like text.
All of these schemes are retrofitted over the existing “grid” I described earlier. The “lossy” schemes (shrinking the palette, guessing pixels, and using JPEG-like tricks) lose even more information on top of the existing loss of any information that didn’t fit into w×h discrete squares of ℤ256 triplets. Only images that are intended to be made of such squares to begin with can be represented “losslessly” in such a grid. Note that these do exist: many artists produce what is called “pixel art,” designed exactly for this medium.
However, in some cases, we can represent a picture even more accurately than raster graphics, often using even fewer bytes to do so. This is possible when the picture is drawn as a “vector” image. A vector art program stores the geometry of what is being drawn mathematically, so that curves, gradients, and so on scale nicely to any size when rendered. Vector formats store exactly what was drawn, describing the curves, edges, and colors exactly as the artist intended, rather than storing an approximation of it in pixels.
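To show what "scales nicely to any size" means, here's a toy sketch where the whole "vector image" is a single circle described by three numbers in 0-to-1 coordinates, and the grid only appears at the moment we render it. The function `rasterize_circle` and the text-character output are made up for illustration; real vector formats and renderers are far richer than this.

```python
def rasterize_circle(cx, cy, radius, size):
    """Render a circle onto a size x size grid of '#' and '.' characters.

    The circle is stored in resolution-independent 0..1 coordinates.
    """
    rows = []
    for y in range(size):
        row = ""
        for x in range(size):
            # Sample the shape at the center of each pixel.
            px, py = (x + 0.5) / size, (y + 0.5) / size
            inside = (px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2
            row += "#" if inside else "."
        rows.append(row)
    return rows


for line in rasterize_circle(0.5, 0.5, 0.4, 4):
    print(line)
# .##.
# ####
# ####
# .##.
```

The same three numbers can be rendered at 4×4 or 4,000×4,000; the stored description never changes, only the size of the grid it gets approximated on.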
Such an approach is only possible, of course, for graphics drawn on a computer. Digital scanners and cameras with limited DPI and resolution can only store a certain number of “dots.” Perhaps in the future, a more advanced sort of camera could store a lower level of information, describing how the light passed through its lens differently, in such a way that the picture could be reproduced in a process similar to rasterizing a vector image. This is probably rather far-fetched, though; I’m not exactly a photography expert.
TODO: Discuss similarities to audio and music.



Thought this was interesting, but didn’t really fit in the post: Consider each pixel in the grid to be a special case of that “palette-reduction” sort of compression I described, for everything in the scene that appears through the window of its “square,” in which the palette is reduced to a size of 1.
Not sure I follow that comment madcs…
(oo, and when is part 2 coming out!??!?)
Next time I have time to breathe. Probably after my Operating Systems midterm.