Jan 13 2021

Reproducible Builds

A reproducible build is one that produces byte-for-byte identical output when given the same input. These builds aren't common, mostly because compilers and build tools embed details of the build environment by default.

Things that can make a build non-reproducible include:

Timestamps
Unique IDs
Build paths
File ordering
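Here is a minimal Python sketch of how timestamps and ordering leak into build output. It packs files into a tar archive. It is not how any particular build tool works, and `build_archive` and `source_date_epoch` are hypothetical helpers. The one real convention it borrows is the `SOURCE_DATE_EPOCH` environment variable, which reproducible-builds.org defines as a fixed timestamp that tools should use instead of the current time.

```python
import hashlib
import io
import os
import tarfile


def source_date_epoch(default=0):
    # SOURCE_DATE_EPOCH is the reproducible-builds.org convention for a fixed
    # build timestamp; fall back to a constant rather than the current time.
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))


def build_archive(files, mtime):
    """Pack files (name -> bytes) into an in-memory tar archive.

    Entries are written in sorted order, and the metadata that would otherwise
    come from the build machine (timestamp, owner, permissions) is fixed, so
    the output depends only on `files` and `mtime`.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # input order varies; sorting removes it
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime            # no wall-clock timestamp
            info.uid = info.gid = 0       # no builder's user/group IDs
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


files = {"b.txt": b"world\n", "a.txt": b"hello\n"}
same_files_other_order = {"a.txt": b"hello\n", "b.txt": b"world\n"}

first = build_archive(files, mtime=source_date_epoch())
second = build_archive(same_files_other_order, mtime=source_date_epoch())
print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())  # True

# A different embedded timestamp alone is enough to change the output bytes.
print(build_archive(files, mtime=1) == build_archive(files, mtime=2))  # False
```

Replacing the fixed `mtime` with `int(time.time())` would produce a different archive on every run. That is the timestamp problem in its simplest form.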

Why they matter

Reproducible builds make software verifiable. They let us check which source code a binary was built from, which makes detecting changes or tampering straightforward:

Compile the program from the same source on at least two different systems.
Compare the checksums of the outputs.
If they match, the binary corresponds to the source. If they don't match, something in one of the builds is wrong and needs investigating.
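These steps can be scripted. The following is a minimal Python sketch that assumes both build outputs have been copied to one machine. In practice, people often just run `sha256sum` on each system and compare the printed digests. The names `sha256_of` and `builds_match` are hypothetical.

```python
import hashlib
import sys


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(path_a, path_b):
    """True if two build outputs are byte-for-byte identical (same SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)


if __name__ == "__main__" and len(sys.argv) == 3:
    a, b = sys.argv[1], sys.argv[2]
    print(sha256_of(a), a)
    print(sha256_of(b), b)
    print("match" if builds_match(a, b) else "MISMATCH: investigate before trusting either binary")
```

Usage, with a hypothetical script name and paths: `python compare_builds.py build-a/app build-b/app`.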

The builds have more integrity, which benefits everyone. For security-minded people, it means a straightforward way of detecting backdoors in the build process. For open source enthusiasts, it means a clear way of detecting GPL violations. For everyone else, we get safer software.

You can find out how to achieve reproducible builds at reproducible-builds.org/docs.

Attacks on build systems

Backdoors introduced in the build process are hard to detect. Most detection happens too late, when the damage is already done. These attacks can have a high impact in a short period, so early detection is important.

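There have been many attacks like this in the past, including:

SUNSPOT - Malware that inserted a backdoor into SolarWinds Orion builds. It affected many government agencies and large companies.
XcodeGhost - A trojanized version of Xcode, Apple's IDE, that injected malware into the apps built with it. Many iOS apps were affected, including Angry Birds.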

Defending against them

Reproducible builds are the best way to defend against these kinds of attacks. Attackers lose their incentive because tampering is detected quickly. To stay hidden, they would need to compromise every independent build system rather than just one.
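The point about attackers needing to compromise more systems can be sketched as a check over the digests that several independent builders report for the same source. This is a hypothetical helper written in Python for illustration, not part of any real tool. The builder names and digests are made up.

```python
from collections import Counter


def verify_rebuilds(reported):
    """Check digests reported by independent builders for the same source.

    `reported` maps a builder name to the digest it produced. Returns
    (all_agree, outliers), where outliers are the builders whose digest
    differs from the most common one.
    """
    if len(reported) < 2:
        raise ValueError("need at least two independent builds to compare")
    most_common_digest, _ = Counter(reported.values()).most_common(1)[0]
    outliers = sorted(
        name for name, digest in reported.items() if digest != most_common_digest
    )
    return not outliers, outliers


# Hypothetical builders and shortened, made-up digests.
reports = {"ci-server": "aaa111", "laptop": "aaa111", "volunteer": "bbb222"}
print(verify_rebuilds(reports))  # (False, ['volunteer'])
```

Any disagreement is the alarm. The outlier list only suggests where to look first. The majority is not automatically trustworthy, because an attacker who controls most of the builders would be in the majority.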

The disadvantages

Like everything, they have downsides. Some of them are:

Need to use a specific compiler version
No profile-guided optimization

What they don’t protect us from

Reproducible builds don't protect us from malicious developers. A developer could knowingly write vulnerable code that looks like an honest mistake when discovered. This is called underhanded code.
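Here is a hypothetical example of underhanded code, invented for illustration in Python. The check reads like a membership test against a list of roles. However, `in` on a string tests for substrings, so an empty role passes. A reproducible build would reproduce this bug faithfully, because the binary matches the source exactly.

```python
ALLOWED_ROLES = "admin,editor"


def can_publish(role):
    # Looks like a membership test against a list of roles, but `in` on a
    # string is a substring test: "", "min" and "edit" are all accepted.
    return role in ALLOWED_ROLES


def can_publish_fixed(role):
    # Split into a real set of role names so only exact matches are accepted.
    return role in set(ALLOWED_ROLES.split(","))


print(can_publish("editor"), can_publish(""))              # True True
print(can_publish_fixed("editor"), can_publish_fixed(""))  # True False
```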

In his Turing Award lecture Reflections on Trusting Trust, Ken Thompson asks:

To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software.

Who does reproducible builds?

Many open-source projects have reproducible builds to assure users of their integrity. You can find a list of them at reproducible-builds.org.

Linux distros

Around 80-90% of the packages in Linux distributions (Arch, Debian, openSUSE, NixOS, Guix) are already reproducible. You can find the exact numbers at reproducible-builds.org.

Digital signatures

Digital signatures still have their place. They are useful for verifying who a document or message comes from. They aren't useful for verifying which source code a binary comes from. Some forms of digital signatures can also get in the way of reproducible builds, as the Telegram developers have explained.
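When signatures are embedded in the package, one workaround is to compare everything except the signature. The sketch below assumes a zip-based package format, such as Android APKs, that keeps signature files under `META-INF/`. The function name and that assumption are illustrative and are not taken from Telegram's tooling.

```python
import zipfile


def same_unsigned_contents(path_a, path_b, signature_prefix="META-INF/"):
    """Compare two zip-based packages entry by entry, ignoring signature files.

    Assumes signatures live under `signature_prefix`; every other entry must
    exist in both archives with identical (decompressed) bytes.
    """
    with zipfile.ZipFile(path_a) as za, zipfile.ZipFile(path_b) as zb:
        names_a = {n for n in za.namelist() if not n.startswith(signature_prefix)}
        names_b = {n for n in zb.namelist() if not n.startswith(signature_prefix)}
        if names_a != names_b:
            return False
        return all(za.read(n) == zb.read(n) for n in sorted(names_a))
```

Comparing entry contents rather than raw archive bytes also skips zip metadata such as per-entry timestamps. This makes it a looser check than a full byte-for-byte comparison.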

Conclusion

I'll conclude by quoting Mike Perry of the Tor Project. He described the importance of reproducible builds as follows:

I don’t believe that software development models based on trusting a single party can be secure against serious adversaries anymore, given the current trends in computer security.

This statement is true to this day.

Further reading

reproducible-builds.org

Reflections on Trusting Trust

Countering Compiler backdoors

The Octopus Scanner Malware

Verifying the source code for binaries