Here are two small benchmarks that I have just run on an Intel Atom N270 running at 1.6GHz and on an ARM Cortex-A9 castrated by Nvidia (the NEON unit has been removed in the Tegra 2). The ARM is a dual-core CPU running at 1GHz. Both machines run Ubuntu, and the compiler is gcc 4.4 on both.

First bench: compilation of a very large (250000 lines) C++ file. This stresses the CPU and the memory, but not the swap (the ARM machine does not even have any swap). The compilation consumes ~300MB. This is of course a single-threaded test, so the second core of the ARM does not help.

  • Atom: 1 min 15 s.
  • Cortex: 1 min 49 s.

That is a ratio of about 1.45 (109 s against 75 s). gcc 4.4 runs much faster on the Atom than on the Cortex-A9, even though the ARM binary is the gcc shipped with Ubuntu Maverick, which is built for ARMv7.

The second test compares the anemic VFP floating point unit of the Cortex-A9 with the old Atom FPU. This is a typical linear system solve, in double precision, with pretty straightforward scalar code without any specific optimization or tricks, repeated 100 times. Standard compiler optimisations are used (-O3 -ffast-math, plus -mfpu=vfpv3-d16 -march=cortex-a9 on the ARM).

  • Atom: t=4.68 sec.
  • Cortex: t=6.16 sec.

The ratio is about 1.3, so basically the same as in the gcc compilation bench. Both ratios are below the 1.6x clock advantage of the Atom (1.6GHz against 1GHz), so clock for clock the Cortex is slightly faster than the Atom, doing roughly 10 to 20% more work per cycle. But the Atom has the higher clock, and it still has its SIMD unit, which will allow any Atom to humiliate the Tegra 2 CPU in any single precision floating point benchmark.
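The solver source is not reproduced here. As a rough idea of what "straightforward scalar code" for a double precision linear solve can look like, here is a minimal sketch, assuming a dense system solved by Gaussian elimination with partial pivoting. The function names solve_dense and solve_repeated are hypothetical, and the matrix size and the actual algorithm used in the benchmark are not known, so treat it as an illustration of the kind of workload and not as the benchmarked code.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kernel: solve A x = b by Gaussian elimination with
 * partial pivoting. a is an n*n row-major matrix and is overwritten;
 * b is overwritten with the solution x.
 * Returns 0 on success, -1 if a pivot is exactly zero (singular matrix). */
int solve_dense(size_t n, double *a, double *b)
{
    assert(n > 0);
    for (size_t k = 0; k < n; ++k) {
        /* Pick the row with the largest |a[i][k]| as pivot row. */
        size_t p = k;
        for (size_t i = k + 1; i < n; ++i)
            if (fabs(a[i * n + k]) > fabs(a[p * n + k]))
                p = i;
        if (a[p * n + k] == 0.0)
            return -1;
        if (p != k) {
            /* Swap rows k and p of A, and the matching entries of b. */
            for (size_t j = 0; j < n; ++j) {
                double t = a[k * n + j];
                a[k * n + j] = a[p * n + j];
                a[p * n + j] = t;
            }
            double t = b[k];
            b[k] = b[p];
            b[p] = t;
        }
        /* Eliminate column k below the diagonal. */
        for (size_t i = k + 1; i < n; ++i) {
            double f = a[i * n + k] / a[k * n + k];
            for (size_t j = k; j < n; ++j)
                a[i * n + j] -= f * a[k * n + j];
            b[i] -= f * b[k];
        }
    }
    /* Back substitution on the upper triangular system. */
    for (size_t k = n; k-- > 0;) {
        double s = b[k];
        for (size_t j = k + 1; j < n; ++j)
            s -= a[k * n + j] * b[j];
        b[k] = s / a[k * n + k];
    }
    return 0;
}

/* Hypothetical driver: repeat the solve reps times from fresh copies of
 * a0 and b0, using a (n*n doubles) and x (n doubles) as work buffers.
 * Stops early and returns -1 if a solve fails. */
int solve_repeated(size_t n, const double *a0, const double *b0,
                   double *a, double *x, int reps)
{
    int rc = 0;
    for (int r = 0; r < reps && rc == 0; ++r) {
        memcpy(a, a0, n * n * sizeof *a);
        memcpy(x, b0, n * sizeof *x);
        rc = solve_dense(n, a, x);
    }
    return rc;
}
```

Calling solve_repeated with reps set to 100 and timing the call mirrors the structure of the test described above. The inner elimination loop is a chain of double precision multiply-subtract operations, which is exactly the kind of code that leans on the scalar FPU and gets no help from a SIMD unit unless the compiler vectorizes it.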