A reproducible build is one that produces byte-for-byte identical output when given the same input. Such builds aren’t common, mostly because of compiler defaults.
The things that can make a build non-reproducible are:
- Unique IDs
- Build paths
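To make the two items above concrete, here is a minimal sketch in Python. The `toy_build` function is a hypothetical stand-in for a real compiler, and the directory names are made up for illustration. It only shows how embedding a random ID or the build directory in the output changes the checksum even though the source is identical.

```python
import hashlib
import uuid


def toy_build(source: str, build_dir: str, embed_unique_id: bool) -> bytes:
    """Hypothetical stand-in for a compiler: returns the bytes of an 'artifact'."""
    parts = [source, f"built-in:{build_dir}"]
    if embed_unique_id:
        # A fresh random ID on every run, like a generated build ID.
        parts.append(f"build-id:{uuid.uuid4()}")
    return "\n".join(parts).encode("utf-8")


def digest(artifact: bytes) -> str:
    """Hex SHA-256 digest of the artifact bytes."""
    return hashlib.sha256(artifact).hexdigest()


source = "print('hello')"

# Same source and directory, but a new random ID each time: digests differ.
a = toy_build(source, "/home/alice/project", embed_unique_id=True)
b = toy_build(source, "/home/alice/project", embed_unique_id=True)
print("unique IDs, same path:", digest(a) == digest(b))

# Same source, no random ID, but different build directories: digests differ.
c = toy_build(source, "/home/alice/project", embed_unique_id=False)
d = toy_build(source, "/tmp/ci/workspace", embed_unique_id=False)
print("no IDs, different paths:", digest(c) == digest(d))

# Same source, same fixed directory, no random ID: digests match.
e = toy_build(source, "/build", embed_unique_id=False)
f = toy_build(source, "/build", embed_unique_id=False)
print("no IDs, same path:", digest(e) == digest(f))
```

Real toolchains differ in how they let you remove or normalize these values, so treat this as an illustration of the problem rather than a recipe.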
Why they matter
Reproducible builds offer a built-in security benefit: they let us verify that a binary really was built from the source code it claims to come from. This makes detecting changes or tampering straightforward:
- Compile the program on at least two different systems.
- Compare the checksums of the resulting binaries.
- If they match, that’s good. If they don’t match, something is wrong.
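The comparison step above can be sketched in a few lines of Python. This is an illustrative sketch, not a standard tool: the function names `sha256_of_file` and `builds_match` are my own, SHA-256 is an assumed choice of checksum, and the demo uses temporary stand-in files where you would normally use the binaries copied from each build machine.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def builds_match(artifacts: Iterable[Path]) -> bool:
    """True only if there is at least one artifact and all digests are equal."""
    digests = {sha256_of_file(path) for path in artifacts}
    return len(digests) == 1


# Demo with stand-in files; in practice these would be the binaries
# copied from two or more independent build machines.
with tempfile.TemporaryDirectory() as tmp:
    build_a = Path(tmp) / "build_a.bin"
    build_b = Path(tmp) / "build_b.bin"
    build_a.write_bytes(b"\x7fELF same bytes")
    build_b.write_bytes(b"\x7fELF same bytes")
    print("match" if builds_match([build_a, build_b]) else "MISMATCH")

    # Change a single byte to simulate a tampered build.
    build_b.write_bytes(b"\x7fELF same bytez")
    print("match" if builds_match([build_a, build_b]) else "MISMATCH")
```

Even a one-byte difference changes the digest, which is why the check only works if the build is byte-for-byte reproducible in the first place.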
The builds have more integrity, which benefits everyone. For security-minded people, it means a straightforward way of detecting backdoors in the build process. For open source enthusiasts, it means a clear way of detecting GPL violations. For everyone else, it means safer software.
You can find out how to achieve reproducible builds at reproducible-builds.org/docs.
Attacks on build systems
Backdoors introduced in the build process are not easy to detect. Most of the detection happens too late, when the damage is already done. These attacks can have a high impact in a short period, so early detection is important.
There have been many attacks like this in the past. Some of them are:
- SUNSPOT: malware planted in the SolarWinds build process, which affected many government agencies and large companies.
- XcodeGhost: an attack that spread through tampered copies of Xcode, Apple’s IDE for the Mac. Many iOS apps were affected, including Angry Birds.
Defending against them
Reproducible builds are the best way to defend against these kinds of attacks. Attackers lose their incentive because they are detected quickly and would need to compromise more systems to go unnoticed.
Like everything, reproducible builds have downsides. Some of them are:
- Need to use a specific compiler version
- No profile-guided optimization
What they don’t protect us from
Reproducible builds don’t protect us from malicious developers. A developer could knowingly write vulnerable code that looks like a mistake when discovered. This is called underhanded code.
In the paper Reflections on Trusting Trust, Ken Thompson asks us:
To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software.
Who does reproducible builds?
Many open-source projects have reproducible builds to assure users of their integrity. Some of them include:
- Tor Browser
You can find more at reproducible-builds
Around 80% to 90% of the packages in Linux distributions (Arch, Debian, openSUSE, NixOS, Guix) are already reproducible. You can find the exact numbers at reproducible-builds.org.
Digital signatures still have their place. They are useful for verifying who a document or message comes from. They aren’t useful for verifying which source code a binary comes from. Some forms of digital signatures can get in the way of reproducible builds, as explained by Telegram developers in this article.
I’ll conclude by quoting Mike Perry of the Tor Project. He described the importance of reproducible builds as follows:
I don’t believe that software development models based on trusting a single party can be secure against serious adversaries anymore, given the current trends in computer security.
This statement still holds true today.
Many people have written about reproducible builds and have gone into more detail than I have in this post. Here are some of them.
A website with technical information on reproducible builds. It also has status updates on Linux distributions.
Reflections on Trusting Trust
A paper by Ken Thompson. He asks what would happen if compilers had backdoors. Would it even be possible to detect and prevent such an attack?
Countering Compiler backdoors
David A. Wheeler answers the above question. He proposes a method called Diverse Double-Compiling.
The Octopus Scanner Malware
A write-up on the discovery of a supply chain attack that targeted developers’ machines.
Verifying the source code for binaries
An LWN article on reproducible builds.