I'd like to know which of the following is faster for getting the i'th rightmost bit of integer x, where i starts with 0:
x & (1 << i)
x >> i % 2
Also curious about why one is faster.
asked Jun 30 '12 at 23:06
As commented, this depends on many factors. Also, you shouldn't care. On any real program I don't believe you will have concern for such low level details. Premature optimization is a horrible waste of time.
Also, these are not equal operations unless your concept of equality is only the concept of zero/non-zero.
But it's a fun exercise
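To make the point about the two operations not being equal concrete, here is a minimal sketch. The helper names are made up for illustration, and `int` arguments are an assumption based on the signed instructions in the disassembly. Note also that, as written, `x >> i % 2` parses as `x >> (i % 2)` in C because `%` binds tighter than `>>`; the sketch uses the parenthesized form `(x >> i) % 2`, which is what the disassembly corresponds to.

```c
#include <stdio.h>

/* Hypothetical helper names, not from the original post. */

/* Mask form: yields 0 or 2^i, so it is only useful as a zero/non-zero test. */
int bit_by_mask(int x, int i)
{
    return x & (1 << i);
}

/* Shift form: yields 0 or 1 for non-negative x. For negative x the result
   depends on how the implementation shifts signed values. */
int bit_by_shift(int x, int i)
{
    return (x >> i) % 2;
}

/* Prints both results for one input so the difference is visible. */
void show(int x, int i)
{
    printf("x=%d i=%d mask=%d shift=%d\n",
           x, i, bit_by_mask(x, i), bit_by_shift(x, i));
}
```

For x = 12 and i = 3 the mask form gives 8 while the shift form gives 1; both agree only in being non-zero.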
Using GCC with -O3 and disassembling I see:
The first version, x & (1 << i):

Dump of assembler code for function op1:
   0x0000000000000000 <+0>:     mov    %esi,%ecx
   0x0000000000000002 <+2>:     mov    $0x1,%eax
   0x0000000000000007 <+7>:     shl    %cl,%eax
   0x0000000000000009 <+9>:     and    %edi,%eax
   0x000000000000000b <+11>:    retq
End of assembler dump.
x >> i % 2

Dump of assembler code for function op2:
   0x0000000000000010 <+0>:     mov    %esi,%ecx
   0x0000000000000012 <+2>:     sar    %cl,%edi
   0x0000000000000014 <+4>:     mov    %edi,%edx
   0x0000000000000016 <+6>:     shr    $0x1f,%edx
   0x0000000000000019 <+9>:     lea    (%rdi,%rdx,1),%eax
   0x000000000000001c <+12>:    and    $0x1,%eax
   0x000000000000001f <+15>:    sub    %edx,%eax
   0x0000000000000021 <+17>:    retq
So that is a shift left and an and, versus two shifts, a load effective address, an and, and a subtract. It seems pretty obvious on this hardware which will be faster, but unless you're on a microcontroller, what seems obvious is often not so clear. Let us test it.
I made a loop of something like ten million calls to the (inlined) operation and was sure to return the sum of the operation results so the compiler wouldn't throw it all away.
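The so.c file itself is not shown, so the following is only a sketch of what such a loop might look like. The names `op` and `bench`, the `int` types, and the choice of shift count are all assumptions made for illustration.

```c
/* Assumed shape of the operation under test. Swap the body for
   (x >> i) % 2 to time the other variant. */
static inline int op(int x, int i)
{
    return x & (1 << i);
}

/* Calls op() n times and returns the accumulated result, so the
   optimizer cannot throw the work away. */
long long bench(int n)
{
    long long sum = 0;
    for (int x = 0; x < n; x++)
        sum += op(x, x & 15);   /* keep the shift count in 0..15 */
    return sum;
}
```

Calling `bench(10000000)` from `main`, printing the return value, and running the binary under `time` would roughly match the setup described. An optimizer may still transform a loop this simple, so timings from a sketch like this should be read with caution.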
[tommd@mavlo Test]$ gcc -O3 so.c -o so
[tommd@mavlo Test]$ time ./so
real    0m0.388s
user    0m0.384s
sys     0m0.003s
[tommd@mavlo Test]$ time ./so
real    0m0.384s
user    0m0.380s
sys     0m0.003s
[tommd@mavlo Test]$ vi so.c // I changed the function to the second one
[tommd@mavlo Test]$ gcc -O3 so.c -o so
[tommd@mavlo Test]$ time ./so
real    0m0.380s
user    0m0.377s
sys     0m0.002s
[tommd@mavlo Test]$ time ./so
real    0m0.380s
user    0m0.379s
Well shucks - the exact same. There's enough hardware in a modern superscalar processor to hide any difference.
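As a side note on why op2 needs more instructions in the disassembly above: with a signed operand, % 2 must return -1 for negative odd values, so the compiler cannot use a bare and. The sketch below mirrors the instruction sequence step by step. The function name is made up, and it assumes a 32-bit int and an arithmetic right shift for negative values, which is what GCC emitted here (sar).

```c
/* Hypothetical name; mirrors the op2 instruction sequence.
   Assumes 32-bit int and arithmetic right shift of negative values. */
int op2_as_compiled(int x, int i)
{
    int v = x >> i;                          /* sar %cl,%edi */
    unsigned sign = (unsigned)v >> 31;       /* shr $0x1f,%edx: 1 if v < 0 */
    unsigned t = ((unsigned)v + sign) & 1u;  /* lea, then and $0x1 */
    return (int)t - (int)sign;               /* sub %edx,%eax */
}
```

With an unsigned x, none of the sign correction is needed and the remainder reduces to a single and.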
The idiomatic way to extract a bit is either
(x >> i) & 1
which would also work analogously for more than one bit, or
x & (1 << i)
if you just want to test a single bit.
Note that in C, x must not be negative (preferably it is declared unsigned), and if x is wider than an int, you need to specify that the 1 in the second form is also that wide (for example 1UL or 1ULL).
Using % will confuse the reader and may perform much worse, depending on the compiler.
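A minimal sketch of the two idiomatic forms with unsigned, fixed-width types. The helper names and the choice of `uint64_t` are assumptions for illustration, not part of the answer.

```c
#include <stdint.h>

/* Hypothetical helper names. */

/* Returns bit i of x as 0 or 1; requires i < 64. */
unsigned get_bit(uint64_t x, unsigned i)
{
    return (unsigned)((x >> i) & 1u);
}

/* Returns non-zero if bit i of x is set; requires i < 64.
   The constant 1 is widened to 64 bits before shifting. */
int test_bit(uint64_t x, unsigned i)
{
    return (x & (UINT64_C(1) << i)) != 0;
}

/* The shift form generalizes to n bits starting at bit i;
   requires 0 < n < 64 and i < 64. */
uint64_t get_bits(uint64_t x, unsigned i, unsigned n)
{
    return (x >> i) & ((UINT64_C(1) << n) - 1);
}
```

Writing `1 << i` with a plain int constant would be undefined for i of 31 or more on a platform with 32-bit int, which is why the mask form needs the widened constant.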