My first somewhat serious look into the ARM architecture was at the CCC Camp, where I attempted to extract the secret signing keys via a buffer overflow exploit in the r0ket. Recently I started working on a somewhat more serious project that also involves copious amounts of C code compiled for ARM, and I thought I might give a quick overview of the small but important details I learned in the last few weeks.
First of all, targeting ARM is tricky business. As the Wikipedia article nicely explains, you have multiple families (versions) of ARM cores. They are slightly incompatible, since instructions and features have been added and deprecated over time. Recent cores are divided between application, real-time and microcontroller profiles. You find application cores (say Cortex-A9) in iPads, phones and other gadgets that need a fancy CPU with cache, SMP and so on, while microcontrollers obviously omit most of that. The most popular microcontroller profile core right now seems to be Cortex-M3, where Cortex is the brand name for the current generation of cores, M marks the microcontroller profile and the core itself implements the ARMv7-M version of the architecture. However, this is still a very broad term, since these cores get licensed by different vendors that put all kinds of their own proprietary peripherals (RAM, non-volatile storage, timers, ...) around them. So stating that you target Cortex-M3 is much less specific than, say, targeting x86, where you have de-facto standards for at least some basic components around the CPU. That's why the current Linux kernel lists 64 ARM machine types compared with practically just one for x86.
Speaking of Intel's x86, people like to say how it is full of weird legacy quirks. ARM has some of its own. One thing that caught me completely by surprise is the Thumb instruction set. It turns out that the original dream of a clean, simple design with a fixed instruction length didn't play out that well. So ARM today supports two different instruction sets: the old one with fixed-length 32-bit instructions and a newer "Thumb" set that mostly has a one-to-one relation to the old one. In its current Thumb-2 form it mixes 16-bit and 32-bit instructions, so it is effectively variable-length. In fact, a core that implements both sets can switch between them at function call boundaries, so you can even mix both in a single binary.
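To make that switching a bit more concrete: the interworking branch instructions (BX, BLX) take the instruction set from the lowest bit of the target address, so a pointer to a Thumb function has bit 0 set. Here is a minimal sketch of just that address arithmetic. The helper names are made up for illustration and nothing here actually changes the processor state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration, not part of any toolchain. */

/* True if a branch-and-exchange to this address would enter Thumb state. */
static bool targets_thumb(uintptr_t branch_target)
{
    return (branch_target & 1u) != 0;
}

/* Address of the first instruction, with the state bit cleared. */
static uintptr_t code_address(uintptr_t branch_target)
{
    return branch_target & ~(uintptr_t)1;
}
```

This is also why function addresses in a disassembled Thumb binary can appear to be off by one compared to the values you see in pointers and vector tables.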
To add to the confusion, there are two function call conventions, meaning you have to be careful when mixing binaries from different compilers. There's also a mess regarding hardware floating point instructions, but thankfully not many microcontrollers include those. The Debian Wiki nicely documents some of the finer details concerning ARM binary compatibility.
All of these issues mean there's a similar state of chaos regarding cross-compiler toolchains. After what seems like many discussions, Debian developers haven't yet come to a consensus on which flavor to include in the distribution. That means you can't simply apt-get install arm-gcc, like you can for AVR. Embedian provides some cross-compiler binaries, but those are targeted at compiling binaries for the ARM port of Linux, not bare-bones systems for microcontrollers. You can build stripped-down binaries with such a compiler, but expect trouble, since build systems for microcontroller software don't expect that. From what I've seen, the most popular option by far appears to be CodeSourcery, a binary distribution of the GNU compiler toolchain. The large binary blob installer for Linux made my hair stand on end. Luckily there's an alternative called summon-arm-toolchain, which pieces together a rough equivalent after a couple of hours of downloading and compiling. If this particular one doesn't do the trick for you, there's a ton of slightly tweaked GitHub forks out there.
By the way, GCC has a weird naming policy for ARM-related targets. For instance, CodeSourcery contains arm-elf-gcc and arm-none-eabi-gcc (where arm-elf and arm-none-eabi refer to the --target configure option when compiling GCC). Both will in fact produce ELF object files and I'm not sure what the actual difference is. Some say that it's the ABI (function call convention), and in fact you can't link binaries from one with binaries from the other. However, you can control both the ABI and the instruction set each time you invoke the compiler, through GCC's target machine options. Other than that, they seem to be freely interchangeable.
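One way to see what a given compiler and set of options actually selected is to look at GCC's predefined macros. The sketch below assumes GCC's usual ARM macros (__thumb__, __arm__ and __ARM_EABI__). The function names are my own invention, and on a non-ARM compiler it simply reports that.

```c
/* Hypothetical helpers: report what this file was compiled for,
 * based on macros GCC predefines for ARM targets. */

static const char *compiled_instruction_set(void)
{
#if defined(__thumb__)
    return "thumb";     /* e.g. built with -mthumb */
#elif defined(__arm__)
    return "arm";       /* e.g. built with -marm */
#else
    return "not-arm";   /* some other target, such as a desktop PC */
#endif
}

/* 1 if the compiler targets the EABI, 0 otherwise. */
static int compiled_for_eabi(void)
{
#if defined(__ARM_EABI__)
    return 1;
#else
    return 0;
#endif
}
```

Compiling the same file once with -mthumb and once with -marm, on a target that supports both, should flip the first answer. That is a quick sanity check that a build system passes the options you think it does.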
For uploading the software to the microcontroller's flash ROM I'm using the Olimex ARM-USB-OCD with OpenOCD, which also seems to be a popular option. It allows you to program the microcontroller through JTAG and is capable of interfacing with the GNU debugger. This means you can upload code, inspect memory and debug with single-stepping and breakpoints from GDB's command line in much the same way you would a local process. There are some quirks to get it running though, namely OpenOCD 0.5.0 doesn't like recent versions of GDB (a version that works for sure is 7.0.1).
I think this kind of wraps it up. As always, having the reference manual for the particular processor you use within arm's reach is a must. When things stubbornly refuse to work correctly, it's good to fall back to a disassembled listing of your program (objdump -d) to check for any obvious linking snafus. And when trust in all other parts of your toolchain fails and the sharks are circling ever closer, a trusty LED is still sometimes an indispensable piece of debugging equipment.
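For completeness, here is roughly what the LED approach looks like in C. This is only a sketch: it assumes the LED hangs off a memory-mapped 32-bit GPIO output register, and the function name is mine. The real register address and pin number are vendor-specific and have to come from that reference manual.

```c
#include <stdint.h>

/* Hypothetical helper: flip one bit of a GPIO output register.
 * gpio_out points at the port's output register; pin is the bit number
 * (only the low five bits are used, so it stays within 0..31). */
static void led_toggle(volatile uint32_t *gpio_out, unsigned pin)
{
    *gpio_out ^= (uint32_t)1 << (pin & 31u);
}
```

On real hardware you would pass the register address cast to volatile uint32_t *, after configuring the pin as an output. Note that this is a plain read-modify-write, so it can race with an interrupt handler touching the same port.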