Since I seemingly have nothing else to do, I dedicated an inordinate amount of time to this problem during the weekend.
Without direct hardware support, it *seemed* like it should be possible to do better than O(log(w)) for w = 64 bits. And indeed, it is possible to do it in O(log log w), except the performance crossover doesn't happen until w >= 256 bits.
Either way, I gave it a go and the best I could come up with was the following mix of techniques:
```c
uint64_t msb64 (uint64_t n) {
    const uint64_t M1 = 0x1111111111111111; // we need to clear blocks of b=4 bits: log(w/b) >= b
    n |= (n>>1); n |= (n>>2);
    // reverse prefix scan, compiles to 1 mulx
    uint64_t s = ((M1<<4)*(__uint128_t)(n&M1))>>64;
    // parallel-reduce each block
    s |= (s>>1); s |= (s>>2);
    // parallel reduce, 1 imul
    uint64_t c = (s&M1)*(M1<<4);   // collect last nibble, compute count - count%4
    c = c >> (64-4-2);             // move last nibble to lowest bits leaving two extra bits
    c &= (0x0F<<2);                // zero the lowest 2 bits

    // add the missing bits; this could be better solved with a bit of foresight
    // by having the sum already stored
    uint8_t b = (n >> c); // & 0x0F;   // no need to zero the bits over the msb

    const uint64_t S = 0x3333333322221100;   // last should give -1ul
    return c | ((S>>(4*b)) & 0x03);
}
```
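As a sanity check, here is a minimal test harness I'd compile together with the function above (it assumes GCC or Clang for `__builtin_clzll` and `__uint128_t`); the random-input generation is just a quick stand-in, not the benchmark driver:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

uint64_t msb64(uint64_t n);   // the function above, compiled in the same unit

int main(void) {
    srand(1);
    for (long i = 0; i < 1000000; i++) {
        // piece together a 64-bit pseudo-random value from rand()
        uint64_t n = ((uint64_t)rand() << 33) ^ ((uint64_t)rand() << 15) ^ (uint64_t)rand();
        if (n == 0) continue;                                // n == 0 is a separate corner case
        uint64_t expected = 63u - (uint64_t)__builtin_clzll(n);
        if (msb64(n) != expected) {
            printf("mismatch for n=%016llx\n", (unsigned long long)n);
            return 1;
        }
    }
    puts("msb64 agrees with __builtin_clzll on all tested inputs");
    return 0;
}
```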
This solution is branchless and doesn't require an external table that can generate cache misses. The two 64-bit multiplications aren't much of a performance issue in modern x86-64 architectures.
I benchmarked the 64-bit versions of some of the most common solutions presented here and elsewhere. Finding a consistent timing and ranking proved to be much harder than I expected. This has to do not only with the distribution of the inputs, but also with out-of-order execution and other CPU shenanigans, which can sometimes overlap the computation of two or more iterations of a loop.
I ran the tests on an AMD Zen using RDTSC, taking a number of precautions such as running a warm-up pass, introducing artificial dependency chains between iterations, and so on.
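To give an idea of what I mean by dependency chains, here is a simplified sketch of such a harness (not the exact code I ran; it assumes GCC/Clang on x86-64 for `__rdtsc`). Each result is folded back into the next input, so consecutive calls can't be overlapped by the out-of-order engine:

```c
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>            // __rdtsc (GCC/Clang, x86-64)

uint64_t msb64(uint64_t n);       // function under test, defined above

int main(void) {
    const long N = 1L << 22;
    uint64_t x = 0x9e3779b97f4a7c15ull;       // arbitrary non-zero seed

    // warm-up pass so caches and branch predictors reach a steady state
    for (long i = 0; i < N; i++) {
        x = x * 6364136223846793005ull + 1;   // cheap LCG to vary the input
        x ^= msb64(x | 1);                    // | 1 sidesteps the n == 0 corner case
    }

    uint64_t t0 = __rdtsc();
    for (long i = 0; i < N; i++) {
        x = x * 6364136223846793005ull + 1;
        // folding the result back into x creates a serial dependency chain,
        // so we measure latency rather than throughput
        x ^= msb64(x | 1);
    }
    uint64_t t1 = __rdtsc();

    printf("%.2f cycles per iteration (x=%llx, includes LCG overhead)\n",
           (double)(t1 - t0) / N, (unsigned long long)x);
    return 0;
}
```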
For a uniform 64-bit pseudo-random distribution of inputs, the results are:
name | cycles | comment |
---|---|---|
clz | 5.16 | builtin intrinsic, fastest |
cast | 5.18 | cast to double, extract exp |
ulog2 | 7.50 | reduction + de Bruijn |
msb64* | 11.26 | this version |
unrolled | 19.12 | varying performance |
obvious | 110.49 | "obviously" slowest for int64 |
Casting to double is always surprisingly close to the builtin intrinsic. The "obvious" way of adding the bits one at a time has the largest spread in performance of all, being comparable to the fastest methods for small numbers and 20x slower for the largest ones.
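For reference, the other contenders I have in mind look roughly like this (sketches with names of my choosing; the "cast" version assumes IEEE-754 doubles, type-punning through a union, and n != 0):

```c
#include <stdint.h>

// 'clz': the builtin intrinsic (GCC/Clang; result is undefined for n == 0)
static unsigned msb_clz(uint64_t n) {
    return 63u - (unsigned)__builtin_clzll(n);
}

// 'cast': convert to double and read the exponent field; for large inputs the
// conversion may round up to the next power of two, so correct for that case
static unsigned msb_cast(uint64_t n) {
    union { double d; uint64_t u; } v = { .d = (double)n };
    unsigned e = (unsigned)((v.u >> 52) & 0x7FF) - 1023;   // unbiased exponent
    if (e > 63 || (n >> e) == 0) e--;                      // undo a round-up
    return e;
}

// 'obvious': shift one bit at a time; latency depends on the magnitude of n
static unsigned msb_obvious(uint64_t n) {
    unsigned r = 0;
    while (n >>= 1) r++;
    return r;
}
```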
My method is around 50% slower than the de Bruijn approach, but it has the advantage of using no extra memory and having predictable performance. I might try to optimize it further if I ever find the time.
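For comparison, the reduction + de Bruijn contender is along these lines (a sketch using the commonly published 64-bit de Bruijn constant and table, which may differ in detail from the exact "ulog2" I timed); the 64-entry lookup table is the extra memory my version avoids:

```c
#include <stdint.h>

static unsigned msb_debruijn(uint64_t n) {
    // smear the msb downward so n becomes 2^(msb+1) - 1
    n |= n >> 1;  n |= n >> 2;  n |= n >> 4;
    n |= n >> 8;  n |= n >> 16; n |= n >> 32;
    // multiply by a de Bruijn sequence; the top 6 bits index a lookup table
    static const unsigned char table[64] = {
         0, 47,  1, 56, 48, 27,  2, 60,
        57, 49, 41, 37, 28, 16,  3, 61,
        54, 58, 35, 52, 50, 42, 21, 44,
        38, 32, 29, 23, 17, 11,  4, 62,
        46, 55, 26, 59, 40, 36, 15, 53,
        34, 51, 20, 43, 31, 22, 10, 45,
        25, 39, 14, 33, 19, 30,  9, 24,
        13, 18,  8, 12,  7,  6,  5, 63,
    };
    return table[(n * 0x03f79d71b4cb0a89ull) >> 58];
}
```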