Double pendulum simulation
One evening a few years ago I was sitting behind a desk with a new, fairly powerful computer at my fingertips. The kind of system where you run top and the list of CPUs doesn't fit in the default terminal window. Whatever serious business I was using it for at the time didn't parallelize very well, and I felt most of its potential remained unused. I was curious how the hardware would truly perform if I could throw at it a computing problem better suited to a massively parallel machine.
Somewhere around that time I also stumbled upon a gallery of nice videos of chaotic pendulums. These were done in Mathematica and simulated a group of double pendulums with slightly different initial conditions. I really liked the visualization: each pendulum is given a different color. They first move in sync, but after a while their movements deviate and the line they trace falls apart into a rainbow.
Image by aWakeOfBuzzards
The simulations published by aWakeOfBuzzards included only 42 pendulums. I guess that's a reference to the Hitchhiker's Guide, but I thought, why not do better than that? Would it be possible to eliminate the visual gaps between the traces? Since each pendulum simulation is independent of the others, this looked like exactly the kind of embarrassingly parallel problem I was looking for.
I didn't want to spend a lot of time writing code. This was just another crazy idea, and I could only rationalize avoiding more important tasks for so long. Since I couldn't run Mathematica on that machine, I couldn't reuse aWakeOfBuzzards' code, and rewriting it in NumPy seemed non-trivial. Nonetheless, I still managed to shamelessly copy most of the code from various other sources on the web. For a start, I found a usable piece of physics simulation code in a Matplotlib example.
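For reference, the heart of that kind of code is a function that computes the time derivative of the pendulum state from the equations of motion. Here is a minimal sketch in the spirit of that example, not the original code: the state is laid out as [theta1, omega1, theta2, omega2], unit lengths and masses are assumed, and a classic fourth-order Runge-Kutta step is included for the integration mentioned below.

```python
import numpy as np

# Assumed constants: gravity, segment lengths and masses.
G, L1, L2, M1, M2 = 9.8, 1.0, 1.0, 1.0, 1.0

def derivs(state):
    """Time derivative of the state [theta1, omega1, theta2, omega2]."""
    th1, w1, th2, w2 = state
    delta = th2 - th1
    den1 = (M1 + M2) * L1 - M2 * L1 * np.cos(delta) ** 2
    den2 = (L2 / L1) * den1
    return np.array([
        w1,
        (M2 * L1 * w1 ** 2 * np.sin(delta) * np.cos(delta)
         + M2 * G * np.sin(th2) * np.cos(delta)
         + M2 * L2 * w2 ** 2 * np.sin(delta)
         - (M1 + M2) * G * np.sin(th1)) / den1,
        w2,
        (-M2 * L2 * w2 ** 2 * np.sin(delta) * np.cos(delta)
         + (M1 + M2) * (G * np.sin(th1) * np.cos(delta)
                        - L1 * w1 ** 2 * np.sin(delta)
                        - G * np.sin(th2))) / den2,
    ])

def rk4_step(state, dt):
    """Advance the state by one classic fourth-order Runge-Kutta step."""
    k1 = derivs(state)
    k2 = derivs(state + dt / 2 * k1)
    k3 = derivs(state + dt / 2 * k2)
    k4 = derivs(state + dt * k3)
    return state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```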
aWakeOfBuzzards' simulations simply draw the pendulum traces opaquely on top of each other. It appears that the code draws the red trace last: when all the pendulums move closely together, the other traces get covered and the trail appears red. I wanted to do better. I had CPU cycles to spare, after all.
Instead of rendering animation frames in the standard red-green-blue color planes, I worked with wavelengths of visible light. I assigned each pendulum a specific wavelength and added that emission line to the spectrum of each pixel it occupied. Only when I had a complete spectrum for each pixel did I convert it to an RGB tuple. This meant that when all the pendulums were on top of each other, they appeared white, since white light is a sum of all visible wavelengths. When they diverged, the white line would naturally break apart into a rainbow.
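In code, this amounts to accumulating one intensity plane per wavelength and only converting the result to RGB at the very end. A sketch of the idea, not the original code: wavelength_to_rgb is a hypothetical helper using a crude piecewise-linear approximation of monochromatic colors, and the sizes are made up for illustration.

```python
import numpy as np

def wavelength_to_rgb(nm):
    """Crude piecewise-linear color of a monochromatic light source
    (hypothetical helper); input in nanometers, output RGB in [0, 1]."""
    if 380 <= nm < 440:
        return ((440 - nm) / 60, 0.0, 1.0)   # violet to blue
    if 440 <= nm < 490:
        return (0.0, (nm - 440) / 50, 1.0)   # blue to cyan
    if 490 <= nm < 510:
        return (0.0, 1.0, (510 - nm) / 20)   # cyan to green
    if 510 <= nm < 580:
        return ((nm - 510) / 70, 1.0, 0.0)   # green to yellow
    if 580 <= nm < 645:
        return (1.0, (645 - nm) / 65, 0.0)   # yellow to red
    if 645 <= nm <= 780:
        return (1.0, 0.0, 0.0)               # red
    return (0.0, 0.0, 0.0)

# One intensity plane per pendulum/wavelength, filled in while drawing.
HEIGHT, WIDTH, N_PENDULUMS = 480, 640, 100   # sizes assumed for the sketch
wavelengths = np.linspace(380.0, 780.0, N_PENDULUMS)
spectrum = np.zeros((HEIGHT, WIDTH, N_PENDULUMS))

# ... each pendulum i adds light to spectrum[y, x, i] for pixels it covers ...

# Convert the accumulated spectra to RGB only at the very end.
frame = np.zeros((HEIGHT, WIDTH, 3))
for i, nm in enumerate(wavelengths):
    frame += spectrum[:, :, i:i + 1] * wavelength_to_rgb(nm)

# Normalize so a pixel lit equally by every pendulum comes out white.
white = sum(np.array(wavelength_to_rgb(nm)) for nm in wavelengths)
frame = np.clip(frame / white, 0.0, 1.0)
```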
For parallelization, I simply used a process pool from Python's multiprocessing package with N - 1 worker processes, where N was the number of processors in the system. The worker processes ran the Runge-Kutta integration and returned a list of vertex positions. The master process then rendered the pendulums and wavelength data to an RGB framebuffer by abusing ImageDraw.line from the Pillow library. Since drawing traces behind the pendulums would have made the animation frames dependent on each other, I dropped that idea and only rendered the pendulums themselves.
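Roughly, the structure looked like the sketch below. It assumes the derivs/rk4_step functions and constants from the earlier sketches; the frame sizes, initial conditions, and the coordinate transform are hypothetical, and only a single frame is rendered where the real code loops over all of them.

```python
import multiprocessing as mp
import numpy as np
from PIL import Image, ImageDraw

N_STEPS, DT = 1800, 1 / 60            # 30 s at 60 fps (assumed)
WIDTH, HEIGHT, SCALE = 640, 480, 100  # pixels per simulation unit (assumed)

def simulate(w20):
    """Worker: integrate one pendulum, return per-frame vertex positions."""
    state = np.array([np.pi / 2, 0.0, np.pi / 2, w20])
    verts = np.empty((N_STEPS, 4))    # x1, y1, x2, y2 for each frame
    for i in range(N_STEPS):
        state = rk4_step(state, DT)   # from the sketch above
        th1, _, th2, _ = state
        x1, y1 = L1 * np.sin(th1), L1 * np.cos(th1)
        verts[i] = (x1, y1, x1 + L2 * np.sin(th2), y1 + L2 * np.cos(th2))
    return verts

def to_px(x, y):
    """Map simulation coordinates to pixel coordinates (pivot at center)."""
    return (WIDTH / 2 + SCALE * x, HEIGHT / 2 + SCALE * y)

if __name__ == "__main__":
    # Initial conditions differ only in the second segment's angular velocity.
    omegas = np.random.default_rng(0).uniform(-0.01, 0.01, size=1000)
    with mp.Pool(mp.cpu_count() - 1) as pool:
        solutions = pool.map(simulate, omegas)

    # Master: draw each pendulum into a grayscale mask via ImageDraw.line,
    # then weight the mask by the pendulum's spectral color.
    wavelengths = np.linspace(380.0, 780.0, len(solutions))
    frame = np.zeros((HEIGHT, WIDTH, 3))
    step = 0  # the real code renders every step, not just one
    for verts, nm in zip(solutions, wavelengths):
        mask = Image.new("L", (WIDTH, HEIGHT), 0)
        x1, y1, x2, y2 = verts[step]
        ImageDraw.Draw(mask).line(
            [to_px(0, 0), to_px(x1, y1), to_px(x2, y2)], fill=255, width=1)
        frame += (np.asarray(mask) / 255.0)[:, :, None] * wavelength_to_rgb(nm)
    # ... normalize with the spectral "white" as in the previous sketch ...
```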
For 30 seconds of simulation, this resulted in an approximately 10 GB binary .npy file of raw framebuffer data. A second, non-parallel step then used Pillow and FFmpeg to compress it into a more reasonably sized MPEG video file.
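That step can be as simple as streaming the raw frames into FFmpeg over a pipe. A sketch along those lines (the original went through Pillow; here the frames are piped directly, and the filenames and encoder settings are assumptions):

```python
import subprocess
import numpy as np

# Raw framebuffer dump, assumed shape (n_frames, height, width, 3), uint8.
# mmap_mode avoids pulling the whole multi-gigabyte file into memory.
frames = np.load("frames.npy", mmap_mode="r")
n_frames, height, width, _ = frames.shape

# Feed raw RGB frames to FFmpeg; encoder settings are illustrative.
encoder = subprocess.Popen(
    ["ffmpeg", "-y",
     "-f", "rawvideo", "-pix_fmt", "rgb24",
     "-s", f"{width}x{height}", "-r", "60", "-i", "-",
     "-c:v", "libx264", "-pix_fmt", "yuv420p", "out.mp4"],
    stdin=subprocess.PIPE)
for frame in frames:
    encoder.stdin.write(frame.tobytes())
encoder.stdin.close()
encoder.wait()
```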

(Click to watch Double pendulum Monte Carlo video)
Of course, it took several attempts to fine-tune various simulation parameters to get the nice-looking result you can find above. The final video is rendered from 200,000 individual pendulum simulations. The initial conditions differed only in the angular velocity of the second pendulum segment, which was drawn from a uniform distribution.
200,000 is not an insanely high number. It blurs most of the gaps between the pendulums, but you can still see the cloud occasionally fall apart into individual lines. Unfortunately, I didn't note down at the time exactly which bottleneck kept me from going higher. Looking at the code now, it was most likely the non-parallel rendering of the final frames. I was also beyond the point of diminishing returns: something like interpolation between the individual pendulum solutions would probably yield better results than simply increasing their number.
I was recently reminded of this old hack and thought I might share it. It was a reminder of a different time, and piecing the story back together was a trip down memory lane. The project that funded that machine has long since concluded, and I now spend my evenings behind a different desk. I guess symmetric multiprocessing was already going out of fashion back then. I would like to imagine that these days someone else is sitting in that lab, wondering similar things next to a GPU cluster.