After a couple of tests, it turns out that the very simple
```C
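#include <limits.h>
/* Zero `count` bytes starting at `dest` (GCC inline assembly, x86 only). */
static inline void memzap(void *dest, unsigned long count) {
    unsigned long n = count;        /* working copy; rep counts it down in (e/r)cx */
    asm volatile( "cld"             /* make the string stores walk upwards */
# if ULONG_MAX == 0xffffffff
        "\n" "andl $3, %%ecx"       /* ecx = count % 4 */
        "\n" "rep stosb"            /* clear the leftover bytes first */
        "\n" "movl %%ebx, %%ecx"
        "\n" "shrl $2, %%ecx"       /* ecx = count / 4 */
        "\n" "rep stosl"            /* clear the rest four bytes at a time */
# else
        "\n" "andq $7, %%rcx"       /* rcx = count % 8 */
        "\n" "rep stosb"            /* clear the leftover bytes first */
        "\n" "movq %%rbx, %%rcx"
        "\n" "shrq $3, %%rcx"       /* rcx = count / 8 */
        "\n" "rep stosq"            /* clear the rest eight bytes at a time */
# endif
        : "+c" (n), "+D" (dest)     /* both registers are modified by rep stos */
        : "b" (count), "a" (0UL)    /* (e/r)bx keeps the full count, (e/r)ax the fill value */
        : "memory", "cc"            /* the block is written; and/shr change the flags */
    );
}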
```
is the fastest way to zero out a large block of memory, which is not very surprising. In my tests it was about 4 to 5 times faster than `memset` and about as fast as `new []`, if I can trust @tobi on that matter. I tried using MMX registers, but anything that involves actually looping over the memory region ends up about as fast as `memset`. The only way to gain some speed is to use the `rep` prefix.
<b>Tiny Edit:</b> The above code is now much safer to compile on both 32-bit and 64-bit machines.
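Before trusting any timing, it is worth checking that a `rep`-based routine clears exactly the bytes it was asked to clear, including counts that are not a multiple of the word size. The following is only a minimal sketch of such a check, not the test I ran: `zero_bytes` is a hypothetical, simplified stand-in that uses a single `rep stosb` (and falls back to `memset` on non-x86 targets), and `zeroing_is_exact` and `run_checks` are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified variant (not memzap itself): zero n bytes with a
   single `rep stosb` on x86, falling back to memset elsewhere. */
static void zero_bytes(void *dest, size_t n) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__("cld\n\trep stosb"
                         : "+D"(dest), "+c"(n)   /* both are modified by rep stosb */
                         : "a"(0)                /* al holds the fill byte */
                         : "memory", "cc");
#else
    memset(dest, 0, n);
#endif
}

/* Returns 1 if exactly n bytes were cleared and the guard bytes on both
   sides kept their 0xAA fill, 0 otherwise. */
static int zeroing_is_exact(size_t n) {
    enum { pad = 16 };
    unsigned char *buf = malloc(n + 2 * pad);
    size_t i;
    int ok = 1;
    if (buf == NULL) return 0;
    memset(buf, 0xAA, n + 2 * pad);
    zero_bytes(buf + pad, n);
    for (i = 0; i < n + 2 * pad; i++) {
        unsigned char want = (i >= pad && i < pad + n) ? 0x00 : 0xAA;
        if (buf[i] != want) ok = 0;
    }
    free(buf);
    return ok;
}

/* Try every small size so that all remainders modulo 4 and 8 are covered. */
static void run_checks(void) {
    size_t n;
    for (n = 0; n <= 67; n++) assert(zeroing_is_exact(n));
}
```

To check a different routine, swap its call in for `zero_bytes` inside `zeroing_is_exact` and call `run_checks()` once at startup.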
For reasons I have not yet been able to figure out, @tobi is making me implement a couple of very rudimentary routines in x86 GCC inline assembler because he wants them faster than is possible in mere mortal C. The first was a routine to calculate $\lceil\log_2(n)\rceil$ for $n\in\mathbb{N}$ and the second one was to zero out a large block of memory. For instance,
```c
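/* Smallest l with 2^l >= x; x must be non-zero (bsr is undefined for 0). */
static inline unsigned log2int(unsigned x) {
    unsigned l;
    asm("bsrl %1, %0" : "=r" (l) : "r" (x) : "cc");  /* l = index of the highest set bit */
    return ( 1u << l == x ) ? l : l + 1;              /* round up unless x is a power of two */
}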
```
is about 50 times faster than the plain C version
```c
static inline unsigned log2int(unsigned x) {
    unsigned l = 0;
    while(x > (1u<<l)) l++;   /* assumes x <= 2^31, otherwise the shift overflows */
    return l;
}
```
even after optimization. For some reason, I found it tricky to find the official Intel x86 opcode reference via Google ((<a href="http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-2a-manual.pdf" target="_blank">Opcode Reference Part 1</a>)) ((<a href="http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-2b-manual.pdf" target="_blank">Opcode Reference Part 2</a>)), so I am linking both parts here.
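For comparison with the two `log2int` versions above, GCC also provides the builtin `__builtin_clz` (count leading zeros), which avoids hand-written assembly. The following is a minimal sketch, not something I benchmarked: the names `ceil_log2_builtin` and `ceil_log2_loop` are made up for this example, and treating inputs 0 and 1 alike is my own assumption, since `__builtin_clz(0)` is undefined.

```c
#include <assert.h>
#include <limits.h>

/* Smallest l with 2^l >= x, computed via GCC's __builtin_clz.
   Assumption: x == 0 is treated like x == 1 and yields 0. */
static unsigned ceil_log2_builtin(unsigned x) {
    unsigned bits = (unsigned)(sizeof(unsigned) * CHAR_BIT);
    if (x <= 1) return 0;
    /* x - 1 is non-zero here; the index of its highest set bit plus one is the result. */
    return bits - (unsigned)__builtin_clz(x - 1);
}

/* Reference loop with a guard so the shift never reaches the type width. */
static unsigned ceil_log2_loop(unsigned x) {
    unsigned bits = (unsigned)(sizeof(unsigned) * CHAR_BIT);
    unsigned l = 0;
    while (l < bits && x > (1u << l)) l++;
    return l;
}
```

Both functions should agree for every non-zero input, so the loop can serve as a slow reference when testing the builtin or an assembler version.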