Unreasonable effectiveness of JPEG
A colleague at the Institute is working on compressive sensing and its applications in the sensing of the radio frequency spectrum. Since what ever he comes up with is ultimately meant to be implemented on VESNA and the radio hardware I'm working on, I did a bit of reading on this topic as well. And as it occasionally happens, it led me to a completely different line of thought.
The basic idea of compressive sensing is similar to lossy compression of images. You make use of the fact that images (or many other real-world signals) have a very sparse representation in some forms. For images in particular the most well known forms are discrete cosine and wavelet transforms. For instance, you can ignore all but a few percent of the largest coefficients and a photograph will be unnoticeably different from the original. That fact is exploited with great success in common image formats like JPEG.
Image by Michael Gäbler CC BY 3.0
What caught my attention though is that even after some searching around, I couldn't find any good explanation of why common images have this nice property. Every text on the topic I saw simply seemed to state that if you transform an image so-and-so, you will find out that a lot of coefficients are near zero and you can ignore them.
As Dušan says, images that make sense to humans only form a very small subset of all possible images. But it's not clear to me why this subset should be so special in this regard.
This discussion is about photographs which are ultimately two-dimensional projections of light reflected from objects around the camera. Is there some kind of a succinct physical explanation why such projections should have in the common case sparse representations in frequency domain? It must be that this somehow follows from macroscopic properties of the world around us. I don't think it has a physiological background as Wikipedia implies - mathematical definition of sparseness doesn't seem to be based on the way human eye or visual processing works.
I say common case, because that certainly doesn't hold for all photographs. I can quite simply take a photograph of a sheet of paper where I printed some random noise and that won't really compress that well.
It's interesting if you think about compressing video, most common tricks I know can be easily connected to the physical properties of the world around us. For instance, differential encoding works well because you have distinct objects and in most scenes only a few objects will move at a time. This means only a part of the image will change against an unmoving background. Storing just differences between frames will obviously be more efficient than working on individual frames. Same goes for example for motion compensation where camera movement in the first approximation just shifts the image in some direction without affecting shapes too much.
However, I see no such easy connection between the physical world and the sparseness in the frequency domain, which is arguably much more basic to compressing images and video than tricks above (after all, as far as I know most video formats still use some form of frequency domain encoding underneath all other processing).