With BTF and CO-RE, BPF programs become more portable: they no longer require the whole build chain of LLVM, Clang, and kernel header dependencies. In this blog post, Brendan Gregg explains how it works and what it means for BPF performance tools.
Today’s software systems are arguably robust at logging and recovering from fail-stop hardware – there is a clear, binary signal that is fairly easy to recognize and interpret. We believe fail-slow hardware is a fundamentally harder problem to solve. It is very hard to distinguish such cases from ones that are caused by software performance issues. It is also evident that many modern, advanced deployed systems do not anticipate this failure mode. We hope that our study can influence vendors, operators, and systems designers to treat fail-slow hardware as a separate class of failures and start addressing them more robustly in future systems.