Ian's blog
Jan 13 2021

Reproducible Builds

A reproducible build is one that produces byte-for-byte identical output when given the same input. These builds aren't common, mostly because of compiler defaults.
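The same kind of default shows up outside the compiler too. Archive tools, for instance, record each file's modification time and owner, and add files in whatever order the filesystem lists them, so packing identical sources on two machines can produce different bytes. Below is a minimal Python sketch, assuming the build output is packed with the standard `tarfile` module; `deterministic_tar` is a hypothetical helper, not an existing tool.

```python
import tarfile
from pathlib import Path


def deterministic_tar(src_dir: Path, out_file: Path) -> None:
    """Hypothetical helper: pack every file under src_dir into an
    uncompressed tar archive, normalizing metadata that commonly
    differs between build machines."""
    # Sort so the archive order doesn't depend on directory listing order.
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    with tarfile.open(str(out_file), "w") as tar:
        for path in files:
            arcname = path.relative_to(src_dir).as_posix()
            info = tar.gettarinfo(str(path), arcname=arcname)
            info.mtime = 0                # drop modification times
            info.uid = info.gid = 0       # drop the builder's user/group IDs
            info.uname = info.gname = ""  # ...and the matching names
            with path.open("rb") as f:
                tar.addfile(info, f)
```

Packing the same sources on two machines with this helper should give identical archives as long as file contents and permissions match. The archive is left uncompressed on purpose: gzip headers also store a timestamp by default.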

The things that can make a build non-reproducible are:

Why they matter

Reproducible builds make software easier to trust. They let us verify that a binary really was built from the source code it claims to come from, which makes changes or tampering straightforward to detect:

  1. Compile the program on at least two different systems
  2. Compare the checksums.
  3. If they match, the binaries are identical. If they don't, something differs between the builds and needs investigating.
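Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal sketch, assuming each system has produced its output as a single file and that SHA-256 is the checksum; `sha256_of` and `builds_match` are hypothetical helper names, not part of any existing tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hypothetical helper: hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(a: Path, b: Path) -> bool:
    """Hypothetical helper: True if two build outputs have the same checksum."""
    return sha256_of(a) == sha256_of(b)
```

For example, `builds_match(Path("build-a/app.bin"), Path("build-b/app.bin"))` returns `True` only when both files have the same SHA-256 digest, which in practice means they are byte-for-byte identical. Comparing digests rather than whole files means each builder only needs to publish a short hash for others to check against.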

The builds have more integrity, which benefits everyone. For security-minded people, it means a straightforward way of detecting backdoors in the build process. For open source enthusiasts, it means a clear way of detecting GPL violations. For everyone else, we get safer software.

You can find out how to achieve reproducible builds at reproducible-builds.org/docs.
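One technique documented there is the `SOURCE_DATE_EPOCH` environment variable: build tools that would otherwise embed the current time use this fixed value (seconds since the Unix epoch, often the time of the last commit) instead. Below is a minimal Python sketch of a build step honoring it; `build_timestamp` and `render_version_header` are hypothetical names, and the generated C header is only an illustration.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp() -> datetime:
    """Hypothetical helper: timestamp to embed in build output.

    Uses SOURCE_DATE_EPOCH (integer seconds since the Unix epoch) when
    set, and falls back to the current time otherwise.
    """
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    seconds = int(epoch) if epoch is not None else time.time()
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def render_version_header(version: str) -> str:
    # Hypothetical generated header that records when the build happened.
    stamp = build_timestamp().strftime("%Y-%m-%dT%H:%M:%SZ")
    return f'#define APP_VERSION "{version}"\n#define BUILD_DATE "{stamp}"\n'
```

With `SOURCE_DATE_EPOCH=1610496000` set, every build writes `BUILD_DATE "2021-01-13T00:00:00Z"`, so two machines produce the same header; without it, the date changes on every run.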

Attacks on build systems

Backdoors introduced in the build process are hard to detect. Most detection happens too late, after the damage is done. These attacks can cause a lot of harm in a short time, so early detection is important.

There have been many attacks like this in the past. Some of them are:

Defending against them

Reproducible builds are the best way to defend against these kinds of attacks. Attackers lose much of their incentive, because tampering is detected quickly and staying hidden would require compromising every independent build system, not just one.

The disadvantages

Like everything, they have downsides. Some of them are:

What they don’t protect us from

Reproducible builds don't protect us from malicious developers. A developer could knowingly write vulnerable code that looks like an honest mistake when discovered. This is called underhanded code.

In his paper Reflections on Trusting Trust, Ken Thompson asks:

To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software.

Who does reproducible builds?

Many open-source projects have reproducible builds to assure users of their integrity. Some of them include:

  1. Bitcoin
  2. Tor Browser
  3. F-Droid
  4. Signal
  5. Telegram

You can find more at reproducible-builds.org.

Linux distros

Around 80% to 90% of the packages in Linux distributions (Arch, Debian, openSUSE, NixOS, Guix) are already reproducible. You can find the exact numbers at reproducible-builds.org.

Digital signatures

Digital signatures still have their place. They are useful for verifying who a document or message comes from, but they can't show which source code a binary was built from. Some forms of digital signatures can even get in the way of reproducible builds, as the Telegram developers explain in this article.

Conclusion

I'll conclude by quoting Mike Perry of the Tor Project, who described the importance of reproducible builds as follows:

I don’t believe that software development models based on trusting a single party can be secure against serious adversaries anymore, given the current trends in computer security.

It still holds true today.


Further reading

Many people have written about reproducible builds and have gone into more detail than I have in this post. Here are some of them.

reproducible-builds.org

A website with technical information on reproducible builds. It also has status updates on Linux distributions.

Reflections on Trusting Trust

A paper by Ken Thompson. He asks what would happen if compilers had backdoors. Would it even be possible to detect and prevent such an attack?

Countering Compiler backdoors

David A. Wheeler answers the above question. He proposes a method called Diverse Double-Compiling.

The Octopus Scanner Malware

A writeup on the discovery of a supply chain attack that targeted developers' machines.

Verifying the source code for binaries

An LWN article on reproducible builds.