unsigned intmsb32(register unsigned int x){ x |= (x >> 1); x |= (x >> 2); x |= (x >> 4); x |= (x >> 8); x |= (x >> 16); return(x & ~(x >> 1));}
1 register, 13 instructions. Believe it or not, this is usually faster than the BSR instruction mentioned above, which operates in linear time. This is logarithmic time.
From http://aggregate.org/MAGIC/#Most%20Significant%201%20Bit