How do the likely() and unlikely() macros in the Linux kernel work, and what is their benefit?

I've been digging through some parts of the Linux kernel, and found calls like this:

if (unlikely(fd < 0))
{
    /* Do something */
}

or

if (likely(!err))
{
    /* Do something */
}

I've found the definition of them:

#define likely(x)       __builtin_expect((x),1)
#define unlikely(x)     __builtin_expect((x),0)

I know that they are for optimization, but how do they work? How much performance gain or code-size reduction can be expected from using them? And is it worth the hassle (and probably the loss of portability), at least in bottleneck code (in userspace, of course)?


This really isn't specific to the Linux kernel or about macros, but a compiler optimization. Should this be retagged to reflect that?

2018-05-27, 17 min 26 s

The paper "What Every Programmer Should Know About Memory" by Ulrich Drepper (p. 57) contains an in-depth explanation.

See also BOOST_LIKELY.

Related: a benchmark on the use of __builtin_expect on another question.

There's no portability issue. You can trivially do things like #define likely(x) (x) and #define unlikely(x) (x) on platforms that don't support this kind of hinting.

These macros are mostly used for error checking, because errors are less probable than normal operation. A few people do profiling or calculations to decide which branch is taken most often...

As regards the fragment "[...]that it is being run in a tight loop": many CPUs have a branch predictor, so these macros only help the first time the code is executed, or when the history-table entry has been overwritten by a different branch that maps to the same index. In a tight loop, assuming a branch goes the same way most of the time, the branch predictor will likely start guessing correctly very quickly. - your friend in pedantry.

RossRogers: What really happens is that the compiler arranges the branches so the common case is the not-taken one. This is faster even when branch prediction works: taken branches are problematic for instruction fetch and decode even when they're predicted perfectly. Some CPUs statically predict branches that aren't in their history table, usually assuming forward branches are not taken. Intel CPUs don't work that way: they don't check that the predictor-table entry belongs to this branch, they just use it anyway. A hot branch and a cold branch might alias the same entry...

This answer is mostly obsolete, since its main claim is that the macros help branch prediction, and as PeterCordes points out, most modern hardware has no implicit or explicit static branch prediction. In fact the hint is used by the compiler to optimize the code, whether that involves static branch hints or any other type of optimization. For most architectures today it is the "any other optimization" that matters, e.g., making hot paths contiguous, scheduling the hot path better, minimizing the size of the slow path, vectorizing only the expected path, etc.

BeeOnRope: because of cache prefetching and word size, there is still an advantage to running a program linearly. The next memory location will already have been fetched into cache; the branch target may or may not have been. With a 64-bit CPU you grab at least 64 bits at a time, and depending on DRAM interleaving, it may be 2x, 3x, or more bits that get grabbed.

It also affects the icache footprint, by keeping unlikely snippets of code out of the hot path.

More precisely, it can do it with gotos without repeating the return x: stackoverflow.com/a/31133787/895245

For the record, x86 does take additional space for branch hints. You have to have a one-byte prefix on branches to specify the appropriate hint. Agreed that hinting is a Good Thing (TM), though.

Dang CISC CPUs and their variable-length instructions ;)

Dang RISC CPUs -- Stay away from my 15-byte instructions ;)

CodyBrocious: branch hinting was introduced with P4, but was abandoned along with P4. All other x86 CPUs simply ignore those prefixes (because prefixes are always ignored in contexts where they're meaningless). These macros don't cause gcc to actually emit branch-hint prefixes on x86. They do help you get gcc to lay out your function with fewer taken branches on the fast-path.

2018-05-28, 17 min 26 s

gcc never generates x86 branch hints - at least all Intel CPUs would ignore them anyway. It will try to limit code size in unlikely regions by avoiding inlining and loop unrolling, though.

You don't lose portability: platforms that don't support them just define them to expand to the bare condition.

I think you two are actually agreeing with each other -- it's just phrased confusingly. (From the looks of it, Andrew's comment is saying "you can use them without losing portability" but sharptooth thought that he said "don't use them as they're not portable" and objected.)
