mirror of
https://github.com/freedomofpress/dangerzone.git
synced 2025-05-17 18:51:50 +02:00
parent
e554a573e5
commit
9c0c880cd3
1 changed files with 90 additions and 0 deletions
90
docs/developer/reproducibility.md
Normal file
90
docs/developer/reproducibility.md
Normal file
|
@ -0,0 +1,90 @@
|
||||||
|
# Reproducible builds
|
||||||
|
|
||||||
|
We want to improve the transparency and auditability of our build artifacts, and
|
||||||
|
a way to achieve this is via reproducible builds. For a broader understanding of
|
||||||
|
what reproducible builds entail, check out https://reproducible-builds.org/.
|
||||||
|
|
||||||
|
Our build artifacts consist of:
|
||||||
|
* Container images (`amd64` and `arm64` architectures)
|
||||||
|
* macOS installers (for Intel and Apple Silicon CPUs)
|
||||||
|
* Windows installer
|
||||||
|
* Fedora packages (for regular Fedora distros and Qubes)
|
||||||
|
* Debian packages (for Debian and Ubuntu)
|
||||||
|
|
||||||
|
As of writing this, none of the above artifacts are reproducible. For this
|
||||||
|
reason, we purposefully build them in machines owned by FPF, since we can't
|
||||||
|
trust third-party servers. A security hole in GitHub, or
|
||||||
|
in our CI pipeline (check out the
|
||||||
|
[Ultralytics cryptominer saga](https://github.com/ultralytics/ultralytics/issues/18027)),
|
||||||
|
may allow attackers to plant a malicious artifact with no detection.
|
||||||
|
|
||||||
|
Still, building our artifacts in private is not ideal. Third parties cannot
|
||||||
|
easily audit if our artifacts have been built correctly or if they have been
|
||||||
|
tampered with. For instance, our Apple Silicon container image builds PyMuPDF
|
||||||
|
from source, and while the PyPI source package is hashed, the produced output
|
||||||
|
does not have a known hash. So, it's not easy to verify it's been built
|
||||||
|
correctly (read also the seminal
|
||||||
|
["Reflections on Trusting Trust"](https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf)
|
||||||
|
lecture by Ken Thompson on that subject).
|
||||||
|
|
||||||
|
In order to make our builds auditable and allow building artifacts in
|
||||||
|
third-party servers safely, we want to make each artifact build reproducible. In
|
||||||
|
the following sections, we'll lay down the plan to do so for each artifact type.
|
||||||
|
|
||||||
|
## Container image
|
||||||
|
|
||||||
|
### Current limitations
|
||||||
|
|
||||||
|
Our container image is currently not reproducible for the following main
|
||||||
|
reasons:
|
||||||
|
|
||||||
|
* We build PyMuPDF from source, since it's not available in Alpine Linux. The
|
||||||
|
result of this build is not reproducible. Note that PyMuPDF wheels are
|
||||||
|
available from PyPI, but there are no ARM wheels for the musl libc platforms.
|
||||||
|
* Alpine Linux does not have a way to pin packages and their dependencies, and
|
||||||
|
does not retain old packages. There's a
|
||||||
|
[workaround](https://github.com/reproducible-containers/repro-pkg-cache)
|
||||||
|
to download the required packages and store them elsewhere, but then the
|
||||||
|
cached package downloads cannot be easily audited.
|
||||||
|
|
||||||
|
## Proposed implementation
|
||||||
|
|
||||||
|
We can take advantage of the
|
||||||
|
[Debian snapshot archives](https://snapshot.debian.org/)
|
||||||
|
and pin our packages by specifying a date. There's already
|
||||||
|
[prior art](https://github.com/reproducible-containers/repro-sources-list.sh/)
|
||||||
|
for that, thanks to the incredible work of @AkihiroSuda on
|
||||||
|
[reproducible containers](https://github.com/reproducible-containers).
|
||||||
|
As for PyMuPDF, it is available from the Debian repos, so we won't have to build
|
||||||
|
it from source.
|
||||||
|
|
||||||
|
Here are a few other obstacles that we need to overcome:
|
||||||
|
* We currently download the
|
||||||
|
[latest gVisor version](https://gvisor.dev/docs/user_guide/install/#latest-release)
|
||||||
|
from a GCS bucket. Now that we have switched to Debian, we can take advantage
|
||||||
|
of their
|
||||||
|
[timestamped APT repos](https://gvisor.dev/docs/user_guide/install/#specific-release)
|
||||||
|
and download specific releases from those. An extra benefit is that such
|
||||||
|
releases are signed with their APT key.
|
||||||
|
* We can no longer update the packages in the container image by rebuilding it.
|
||||||
|
We have to bump the dates in the Dockerfile first, which is a minor hassle,
|
||||||
|
but much more declarative.
|
||||||
|
* The `repro-source-list-.sh` script uses the release date of the container
|
||||||
|
image. However, the Debian image is not updated daily (see
|
||||||
|
[newest tags](https://hub.docker.com/_/debian/tags)
|
||||||
|
in DockerHub). So, if we want to ship an emergency release, we have to
|
||||||
|
circumvent this limitation. A simple way is to trick the script by bumping the
|
||||||
|
date of the `/etc/apt/sources.list.d/debian.sources` and
|
||||||
|
`/etc/apt/sources.list` files.
|
||||||
|
* While we talk about image reproducibility, we can't actually achieve the exact
|
||||||
|
same SHA-256 hash for two different image builds. That's because the file
|
||||||
|
timestamps in the image layers will differ, depending on when the build took
|
||||||
|
place. The rest of the image though (file contents, permissions, manifest)
|
||||||
|
should be byte-for-byte the same. A simple way to check this is with the
|
||||||
|
[`diffoci`](https://github.com/reproducible-containers/diffoci) tool, and
|
||||||
|
specifically this invocation:
|
||||||
|
|
||||||
|
```
|
||||||
|
./diffoci diff podman://<new_image_tag> podman://<old_image_tag> \
|
||||||
|
--ignore-timestamps --ignore-image-name --verbose
|
||||||
|
```
|
Loading…
Reference in a new issue