After a couple of tests, it turns out that the very simple
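#include <limits.h>
/* Zero count bytes at dest: first count % word bytes, then whole words. */
inline void memzap(void *dest, unsigned long count) {
    /* volatile and the "memory" clobber keep the compiler from dropping or
       reordering the stores; "+c" and "+D" mark the registers rep stos updates. */
    asm volatile ( "cld"
#if ULONG_MAX == 0xffffffff
        "\n" "andl $3, %%ecx"       /* ecx = count % 4 */
        "\n" "rep stosb"
        "\n" "movl %%ebx, %%ecx"    /* reload the full count */
        "\n" "shrl $2, %%ecx"       /* ecx = count / 4 */
        "\n" "rep stosl"
#else
        "\n" "andq $7, %%rcx"       /* rcx = count % 8 */
        "\n" "rep stosb"
        "\n" "movq %%rbx, %%rcx"    /* reload the full count */
        "\n" "shrq $3, %%rcx"       /* rcx = count / 8 */
        "\n" "rep stosq"
#endif
        : "+c" (count), "+D" (dest)
        : "b" (count), "a" (0UL)
        : "memory", "cc"
    );
}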
is the fastest way to zero out a large block of memory, which is not very surprising. In my tests it was about 4 to 5 times faster than memset and about as fast as new [], if I can trust @tobi on that matter. I tried using MMX registers, but anything that actually loops over the memory region ends up about as fast as memset. The only way to gain a bit of speed is to use the rep prefix.
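To check a claim like this on your own machine, a small timing harness helps. The following is only a minimal sketch, not the setup used for the numbers above: `zero_fn`, `zero_memset`, `is_zeroed` and `time_zero` are hypothetical names, and `clock()` measures CPU time rather than wall-clock time.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by every zeroing routine under test. */
typedef void (*zero_fn)(void *dest, unsigned long count);

/* Baseline: memset wrapped to match zero_fn. */
static void zero_memset(void *dest, unsigned long count) {
    memset(dest, 0, count);
}

/* Returns 1 if the first count bytes of buf are all zero, else 0. */
static int is_zeroed(const unsigned char *buf, unsigned long count) {
    for (unsigned long i = 0; i < count; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/* Runs fn `rounds` times over a freshly dirtied buffer of `count` bytes.
   Returns the CPU seconds spent inside fn, or -1.0 if the allocation
   failed or fn left a non-zero byte behind. */
static double time_zero(zero_fn fn, unsigned long count, int rounds) {
    unsigned char *buf = malloc(count);
    if (buf == NULL)
        return -1.0;
    clock_t total = 0;
    int ok = 1;
    for (int r = 0; r < rounds; r++) {
        memset(buf, 0xAB, count);      /* dirty the buffer before each round */
        clock_t start = clock();
        fn(buf, count);
        total += clock() - start;
        ok = ok && is_zeroed(buf, count);
    }
    free(buf);
    return ok ? (double)total / CLOCKS_PER_SEC : -1.0;
}
```

Calling `time_zero(zero_memset, size, rounds)` and `time_zero(memzap, size, rounds)` with the same arguments gives two numbers to compare. Use a block larger than the CPU caches and several rounds, since a single short run is dominated by noise.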
Tiny Edit: The above code is now much safer to compile on both 32-bit and 64-bit machines.
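For readers who do not read AT&T syntax, the order of stores that memzap performs can be written out in portable C. This is an illustrative sketch only: `memzap_portable` is a hypothetical name, it assumes the word size equals `sizeof(unsigned long)` (which is what the `ULONG_MAX` check selects), and it is not expected to match the speed of the rep version.

```c
#include <string.h>

/* Illustrative portable equivalent of memzap's store order (hypothetical). */
static void memzap_portable(void *dest, unsigned long count) {
    unsigned char *p = dest;
    const unsigned long word = sizeof(unsigned long);
    const unsigned long zero = 0;

    /* Leading remainder, one byte at a time (the "rep stosb" step). */
    unsigned long head = count % word;
    for (unsigned long i = 0; i < head; i++)
        *p++ = 0;

    /* Remaining whole words (the "rep stosl" / "rep stosq" step). */
    for (unsigned long n = count / word; n > 0; n--) {
        memcpy(p, &zero, word);   /* memcpy sidesteps unaligned-store issues */
        p += word;
    }
}
```

Handling the remainder first and the whole words second mirrors the assembly, where the byte count is taken from the low bits of the length and the word count from the length shifted right.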