diff --git a/docs/developer/reproducibility.md b/docs/developer/reproducibility.md
index 917098f..25b3143 100644
--- a/docs/developer/reproducibility.md
+++ b/docs/developer/reproducibility.md
@@ -11,105 +11,38 @@ Our build artifacts consist of:
 * Fedora packages (for regular Fedora distros and Qubes)
 * Debian packages (for Debian and Ubuntu)
 
-As of writing this, none of the above artifacts are reproducible. For this
-reason, we purposefully build them in machines owned by FPF, since we can't
-trust third-party servers. A security hole in GitHub, or
-in our CI pipeline (check out the
-[Ultralytics cryptominer saga](https://github.com/ultralytics/ultralytics/issues/18027)),
-may allow attackers to plant a malicious artifact with no detection.
+As of writing this, only the following artifacts are reproducible:
+* Container images (see [#1047](https://github.com/freedomofpress/dangerzone/issues/1047))
 
-Still, building our artifacts in private is not ideal. Third parties cannot
-easily audit if our artifacts have been built correctly or if they have been
-tampered with. For instance, our Apple Silicon container image builds PyMuPDF
-from source, and while the PyPI source package is hashed, the produced output
-does not have a known hash. So, it's not easy to verify it's been built
-correctly (read also the seminal
-["Reflections on Trusting Trust"](https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf)
-lecture by Ken Thompson on that subject).
-
-In order to make our builds auditable and allow building artifacts in
-third-party servers safely, we want to make each artifact build reproducible. In
-the following sections, we'll lay down the plan to do so for each artifact type.
+In the following sections, we'll mention some specifics about enforcing
+reproducibility for each artifact type.
 
 ## Container image
 
-### Current limitations
-
-Our container image is currently not reproducible for the following main
-reasons:
-
-* We build PyMuPDF from source, since it's not available in Alpine Linux. The
-  result of this build is not reproducible. Note that PyMuPDF wheels are
-  available from PyPI, but there are no ARM wheels for the musl libc platforms.
-* Alpine Linux does not have a way to pin packages and their dependencies, and
-  does not retain old packages. There's a
-  [workaround](https://github.com/reproducible-containers/repro-pkg-cache)
-  to download the required packages and store them elsewhere, but then the
-  cached package downloads cannot be easily audited.
-
-## Proposed implementation
-
-We can take advantage of the
-[Debian snapshot archives](https://snapshot.debian.org/)
-and pin our packages by specifying a date. There's already
-[prior art](https://github.com/reproducible-containers/repro-sources-list.sh/)
-for that, thanks to the incredible work of @AkihiroSuda on
-[reproducible containers](https://github.com/reproducible-containers).
-As for PyMuPDF, it is available from the Debian repos, so we won't have to build
-it from source.
-
-Here are a few other obstacles that we need to overcome:
-* We currently download the
-  [latest gVisor version](https://gvisor.dev/docs/user_guide/install/#latest-release)
-  from a GCS bucket. Now that we have switched to Debian, we can take advantage
-  of their
-  [timestamped APT repos](https://gvisor.dev/docs/user_guide/install/#specific-release)
-  and download specific releases from those. An extra benefit is that such
-  releases are signed with their APT key.
-* We can no longer update the packages in the container image by rebuilding it.
-  We have to bump the dates in the Dockerfile first, which is a minor hassle,
-  but much more declarative.
-* The `repro-source-list-.sh` script uses the release date of the container
-  image.
-  However, the Debian image is not updated daily (see
-  [newest tags](https://hub.docker.com/_/debian/tags)
-  in DockerHub). So, if we want to ship an emergency release, we have to
-  circumvent this limitation. A simple way is to trick the script by bumping the
-  date of the `/etc/apt/sources.list.d/debian.sources` and
-  `/etc/apt/sources.list` files.
-* While we talk about image reproducibility, we can't actually achieve the exact
-  same SHA-256 hash for two different image builds. That's because the file
-  timestamps in the image layers will differ, depending on when the build took
-  place. The rest of the image though (file contents, permissions, manifest)
-  should be byte-for-byte the same. A simple way to check this is with the
-  [`diffoci`](https://github.com/reproducible-containers/diffoci) tool, and
-  specifically this invocation:
-
-  ```
-  ./diffoci diff podman:// podman:// \
-      --ignore-timestamps --ignore-image-name --verbose
-  ```
-
 ### Updating the image
 
 The fact that our image is reproducible also means that it's frozen in time.
-This means that rebuilding the image without updating our Dockerfile will **not**
-receive security updates.
+This means that rebuilding the image without updating our Dockerfile will
+**not** receive security updates.
 
-We list the necessary variables that make up our image in the `Dockerfile.env`
-file.
-These are:
+Here are the necessary variables that make up our image in the `Dockerfile.env`
+file:
 
 * `DEBIAN_IMAGE_DATE`: The date that the Debian container image was released
 * `DEBIAN_ARCHIVE_DATE`: The Debian snapshot repo that we want to use
 * `GVISOR_ARCHIVE_DATE`: The gVisor APT repo that we want to use
 * `H2ORESTART_CHECKSUM`: The SHA-256 checksum of the H2ORestart plugin
 * `H2ORESTART_VERSION`: The version of the H2ORestart plugin
 
-If you update these values in `Dockerfile.env`, you can create a new Dockerfile
-with:
+If you update these values in `Dockerfile.env`, you must also create a new
+Dockerfile with:
 
 ```
 poetry run jinja2 Dockerfile.in Dockerfile.env > Dockerfile
 ```
 
+Updating `Dockerfile` without bumping `Dockerfile.in` is detected and should
+trigger a CI error.
+
 ### Reproducing the image
 
 For a simple way to reproduce a Dangerzone container image, either local or