clock_gettime is slow

..on the pandaboard. I just measured it , and each call to clock_gettime(CLOCK_MONOTONIC) takes approximately 800ns . On a core 2 quad it takes 27ns , and it has also a muuuuchh better resolution that the ARM one, which has a resolution of 1/32787th second. Using CLOCK_REALTIME or whatever does not improve the timings.

NEON optimized single precision sin / cos / exp and log

This is a translation of the SSE version to ARM NEON intrinsics. Je profite de l'occasion pour dire que : the official ARM NEON documentation is quite shitty.