Compare commits

...

13 commits

Author SHA1 Message Date
Alex Pyrgiotis
cbb7ed902f
FIXUP: Use semantic diff for diffoci 2025-01-20 17:45:55 +02:00
Alex Pyrgiotis
eacf1eb2fa
grype: Add Debian CVEs to ignore list
Add some CVEs in our ignore list, which are present in the new Debian
image. These CVEs are marked as "wont-fix" by the Debian Security team.
2025-01-20 17:05:48 +02:00
Alex Pyrgiotis
505db39ca0
fixup! FIXUP: Copy all the Python files from the conversion/ dir 2025-01-20 15:40:14 +02:00
Alex Pyrgiotis
0f0fa49923
FIXUP: determine_tag -> determine_git_tag 2025-01-20 15:27:32 +02:00
Alex Pyrgiotis
8911b72529
fixup! FIXUP: Handle diffoci updates appropriately 2025-01-20 15:26:07 +02:00
Alex Pyrgiotis
725ce3b9c7
FIXUP: Add easier method to generate Dockerfile 2025-01-20 15:02:04 +02:00
Alex Pyrgiotis
afc5e8e636
FIXUP: Bump Dockerfile envs 2025-01-20 14:59:42 +02:00
Alex Pyrgiotis
5bb37ef48f
fixup! FIXUP: Rename dangerzone/container to dangerzone/container_helpers 2025-01-20 14:20:44 +02:00
Alex Pyrgiotis
c70d1970dd
FIXUP: Remove unnecessary needs 2025-01-20 14:20:44 +02:00
Alex Pyrgiotis
ec616be2c0
Rename vendor-pymupdf.py to debian-vendor-pymupdf.py
Rename the `vendor-pymupdf.py` script to `debian-vendor-pymupdf.py`,
since it's used only when building Debian packages.
2025-01-20 12:37:02 +02:00
Alex Pyrgiotis
acbc433717
FIXUP: Keep only the necessary instructions for checking reproducibility 2025-01-20 12:35:32 +02:00
Alex Pyrgiotis
685cf431a3
FIXUP: Handle diffoci updates appropriately 2025-01-20 12:28:34 +02:00
Alex Pyrgiotis
2b71e615a8
FIXUP: Add indication of faulty file command 2025-01-20 12:23:56 +02:00
13 changed files with 80 additions and 111 deletions


@ -473,8 +473,6 @@ jobs:
bash -c 'cd dangerzone; poetry run make test'
check-reproducibility:
needs:
- build-container-image
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
@ -489,8 +487,9 @@ jobs:
- name: Verify that the Dockerfile matches the committed template and params
run: |-
poetry run jinja2 Dockerfile.in Dockerfile.env > out
diff Dockerfile out
cp Dockerfile Dockerfile.orig
make Dockerfile
diff Dockerfile.orig Dockerfile
- name: Build Dangerzone container image
run: |


@ -26,7 +26,7 @@ jobs:
run: |
date=$(date "+%Y%m%d")
sed -i "s/DEBIAN_ARCHIVE_DATE=[0-9]\+/DEBIAN_ARCHIVE_DATE=${date}/" Dockerfile.env
poetry run jinja2 Dockerfile.in Dockerfile.env > Dockerfile
make Dockerfile
- name: Build container image
run: python3 ./install/common/build-image.py --runtime docker --no-save
- name: Get image tag
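
For anyone who wants to do the date bump above outside of CI, here is a minimal Python sketch of the same `sed` substitution. The function name `bump_debian_archive_date` is hypothetical and not part of the repository; unlike the `sed` expression, the pattern is anchored to the start of a line, which is a small tightening on my part.

```python
import re
from datetime import date


def bump_debian_archive_date(env_text: str, today: date) -> str:
    """Return env_text with DEBIAN_ARCHIVE_DATE set to `today` (YYYYMMDD).

    Hypothetical helper mirroring the workflow's sed call; all other
    lines of the env file are left untouched.
    """
    stamp = today.strftime("%Y%m%d")
    return re.sub(
        r"^DEBIAN_ARCHIVE_DATE=[0-9]+",
        f"DEBIAN_ARCHIVE_DATE={stamp}",
        env_text,
        flags=re.MULTILINE,
    )
```

After rewriting `Dockerfile.env` with the result, `make Dockerfile` regenerates the Dockerfile, as in the workflow step.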


@ -2,10 +2,38 @@
# latest release of Dangerzone, and offer our analysis.
ignore:
# CVE-2024-11053
# CVE-2023-45853
# ==============
#
# NVD Entry: https://nvd.nist.gov/vuln/detail/CVE-2024-11053
# Verdict: Dangerzone is not affected because libcurl is an HTTP client, and
# the Dangerzone container does not make any network calls.
- vulnerability: CVE-2024-11053
# Debian tracker: https://security-tracker.debian.org/tracker/CVE-2023-45853
# Verdict: Dangerzone is not affected because the zlib library in Debian is
# built in a way that is not vulnerable.
- vulnerability: CVE-2023-45853
# CVE-2024-38428
# ==============
#
# Debian tracker: https://security-tracker.debian.org/tracker/CVE-2024-38428
# Verdict: Dangerzone is not affected because it doesn't use wget in the
# container image (which also has no network connectivity).
- vulnerability: CVE-2024-38428
# CVE-2024-57823
# ==============
#
# Debian tracker: https://security-tracker.debian.org/tracker/CVE-2024-57823
# Verdict: Dangerzone is not affected. First things first, LibreOffice is
# using this library for parsing RDF metadata in a document [1], and has
# issued a fix for the vendored raptor2 package they have for other distros
# [2].
#
# On the other hand, the Debian security team has stated that this is a minor
# issue [3], and there's no fix from the developers yet. It seems that the
# Debian package is not affected somehow by this CVE, probably due to the way
# it's packaged.
#
# [1] https://wiki.documentfoundation.org/Documentation/DevGuide/Office_Development#RDF_metadata
# [2] https://cgit.freedesktop.org/libreoffice/core/commit/?id=2b50dc0e4482ac0ad27d69147b4175e05af4fba4
# [3] From https://security-tracker.debian.org/tracker/CVE-2024-57823:
#
# [bookworm] - raptor2 <postponed> (Minor issue, revisit when fixed upstream)
#
- vulnerability: CVE-2024-57823


@ -6,8 +6,8 @@ ARG DEBIAN_IMAGE_DATE=20250113
FROM debian:bookworm-${DEBIAN_IMAGE_DATE}-slim
ARG GVISOR_ARCHIVE_DATE=20250106
ARG DEBIAN_ARCHIVE_DATE=20250114
ARG GVISOR_ARCHIVE_DATE=20250113
ARG DEBIAN_ARCHIVE_DATE=20250120
ARG H2ORESTART_CHECKSUM=7760dc2963332c50d15eee285933ec4b48d6a1de9e0c0f6082946f93090bd132
ARG H2ORESTART_VERSION=v0.7.0
@ -62,7 +62,7 @@ RUN mkdir -p /opt/dangerzone/dangerzone
RUN touch /opt/dangerzone/dangerzone/__init__.py
# Copy only the Python code, and not any produced .pyc files.
COPY conversion/*.py /opt/dangerzone/dangerzone/conversion
COPY conversion/*.py /opt/dangerzone/dangerzone/conversion/
# Let the entrypoint script write the OCI config for the inner container under
# /config.json.
@ -76,6 +76,6 @@ USER dangerzone
# store the state of its containers.
RUN mkdir /home/dangerzone/.containers
COPY container/entrypoint.py /
COPY container_helpers/entrypoint.py /
ENTRYPOINT ["/entrypoint.py"]


@ -1,9 +1,9 @@
# Can be bumped to the latest date in https://hub.docker.com/_/debian/tags?name=bookworm-
DEBIAN_IMAGE_DATE=20250113
# Can be bumped to today's date
DEBIAN_ARCHIVE_DATE=20250114
DEBIAN_ARCHIVE_DATE=20250120
# Can be bumped to the latest date in https://github.com/google/gvisor/tags
GVISOR_ARCHIVE_DATE=20250106
GVISOR_ARCHIVE_DATE=20250113
# Can be bumped to the latest version and checksum from https://github.com/ebandal/H2Orestart/releases
H2ORESTART_CHECKSUM=7760dc2963332c50d15eee285933ec4b48d6a1de9e0c0f6082946f93090bd132
H2ORESTART_VERSION=v0.7.0


@ -62,7 +62,7 @@ RUN mkdir -p /opt/dangerzone/dangerzone
RUN touch /opt/dangerzone/dangerzone/__init__.py
# Copy only the Python code, and not any produced .pyc files.
COPY conversion/*.py /opt/dangerzone/dangerzone/conversion
COPY conversion/*.py /opt/dangerzone/dangerzone/conversion/
# Let the entrypoint script write the OCI config for the inner container under
# /config.json.
@ -76,6 +76,6 @@ USER dangerzone
# store the state of its containers.
RUN mkdir /home/dangerzone/.containers
COPY container/entrypoint.py /
COPY container_helpers/entrypoint.py /
ENTRYPOINT ["/entrypoint.py"]


@ -47,6 +47,9 @@ test-large: test-large-init ## Run large test set
python -m pytest --tb=no tests/test_large_set.py::TestLargeSet -v $(JUNIT_FLAGS) --junitxml=$(TEST_LARGE_RESULTS)
python $(TEST_LARGE_RESULTS)/report.py $(TEST_LARGE_RESULTS)
Dockerfile: Dockerfile.env Dockerfile.in
poetry run jinja2 Dockerfile.in Dockerfile.env > Dockerfile
.PHONY: build-clean
build-clean:
doit clean


@ -129,7 +129,9 @@ class DocumentToPixels(DangerzoneConverter):
# At least .odt, .docx, .odg, .odp, .ods, and .pptx
"application/zip": {
"type": "libreoffice",
# NOTE: Older `file` command cannot detect hwpx files properly.
# NOTE: `file` command < 5.45 cannot detect hwpx files properly, so we
# enable the extension in any case. See also:
# https://github.com/freedomofpress/dangerzone/pull/460#issuecomment-1654166465
"libreoffice_ext": "h2orestart.oxt",
},
# At least .doc, .docx, .odg, .odp, .odt, .pdf, .ppt, .pptx, .xls, and .xlsx

2
debian/rules vendored

@ -9,5 +9,5 @@ export DH_VERBOSE=1
dh $@ --with python3 --buildsystem=pybuild
override_dh_builddeb:
./install/linux/vendor-pymupdf.py --dest debian/dangerzone/usr/lib/python3/dist-packages/dangerzone/vendor/
./install/linux/debian-vendor-pymupdf.py --dest debian/dangerzone/usr/lib/python3/dist-packages/dangerzone/vendor/
dh_builddeb $@


@ -42,16 +42,21 @@ def git_verify(commit, source):
def diffoci_hash_matches(diffoci):
"""Check if the hash of the downloaded diffoci bin matches the expected one."""
m = hashlib.sha256()
m.update(DIFFOCI_PATH.open().read())
m.update(diffoci)
diffoci_checksum = m.hexdigest()
return diffoci_checksum == DIFFOCI_CHECKSUM
def diffoci_exists():
"""Check if the diffoci helper exists, and if the hash matches."""
def diffoci_is_installed():
"""Determine if diffoci has been installed.
Determine if diffoci has been installed, by checking if the binary exists, and if
its hash is the expected one. If the binary exists but the hash is different, then
this is a sign that we need to update the local diffoci binary.
"""
if not DIFFOCI_PATH.exists():
return False
return diffoci_hash_matches(DIFFOCI_PATH.open().read())
return diffoci_hash_matches(DIFFOCI_PATH.open("rb").read())
def diffoci_download():
@ -79,8 +84,7 @@ def diffoci_diff(source, local_target):
"diff",
source,
target,
"--ignore-timestamps",
"--ignore-image-name",
"--semantic",
"--verbose",
)
except subprocess.CalledProcessError as e:
@ -134,7 +138,7 @@ def main():
commit = git_commit_get()
git_verify(commit, args.source)
if diffoci_exists():
if not diffoci_is_installed():
logger.info(f"Downloading diffoci helper from {DIFFOCI_URL}")
diffoci_download()
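
A note on the `open("rb")` change in this file: `hashlib` only accepts bytes-like input, so a binary has to be read in binary mode before hashing. The sketch below shows that pattern in isolation. The helper names `sha256_matches` and `file_sha256_matches` are hypothetical and only illustrate the idea; they are not the script's actual functions.

```python
import hashlib


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 hex digest of `data` equals `expected_hex`."""
    return hashlib.sha256(data).hexdigest() == expected_hex


def file_sha256_matches(path, expected_hex: str) -> bool:
    """Hash a file's raw bytes and compare against the expected digest."""
    # "rb" matters: text mode returns str, which hashlib rejects with a
    # TypeError (and decoding an arbitrary binary may fail first).
    with open(path, "rb") as f:
        return sha256_matches(f.read(), expected_hex)
```

For large binaries one could hash in chunks instead of reading the whole file, but for a single helper binary a full read is probably fine.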


@ -11,105 +11,38 @@ Our build artifacts consist of:
* Fedora packages (for regular Fedora distros and Qubes)
* Debian packages (for Debian and Ubuntu)
As of writing this, none of the above artifacts are reproducible. For this
reason, we purposefully build them in machines owned by FPF, since we can't
trust third-party servers. A security hole in GitHub, or
in our CI pipeline (check out the
[Ultralytics cryptominer saga](https://github.com/ultralytics/ultralytics/issues/18027)),
may allow attackers to plant a malicious artifact with no detection.
As of writing this, only the following artifacts are reproducible:
* Container images (see [#1047](https://github.com/freedomofpress/dangerzone/issues/1047))
Still, building our artifacts in private is not ideal. Third parties cannot
easily audit if our artifacts have been built correctly or if they have been
tampered with. For instance, our Apple Silicon container image builds PyMuPDF
from source, and while the PyPI source package is hashed, the produced output
does not have a known hash. So, it's not easy to verify it's been built
correctly (read also the seminal
["Reflections on Trusting Trust"](https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf)
lecture by Ken Thompson on that subject).
In order to make our builds auditable and allow building artifacts in
third-party servers safely, we want to make each artifact build reproducible. In
the following sections, we'll lay down the plan to do so for each artifact type.
In the following sections, we'll mention some specifics about enforcing
reproducibility for each artifact type.
## Container image
### Current limitations
Our container image is currently not reproducible for the following main
reasons:
* We build PyMuPDF from source, since it's not available in Alpine Linux. The
result of this build is not reproducible. Note that PyMuPDF wheels are
available from PyPI, but there are no ARM wheels for the musl libc platforms.
* Alpine Linux does not have a way to pin packages and their dependencies, and
does not retain old packages. There's a
[workaround](https://github.com/reproducible-containers/repro-pkg-cache)
to download the required packages and store them elsewhere, but then the
cached package downloads cannot be easily audited.
## Proposed implementation
We can take advantage of the
[Debian snapshot archives](https://snapshot.debian.org/)
and pin our packages by specifying a date. There's already
[prior art](https://github.com/reproducible-containers/repro-sources-list.sh/)
for that, thanks to the incredible work of @AkihiroSuda on
[reproducible containers](https://github.com/reproducible-containers).
As for PyMuPDF, it is available from the Debian repos, so we won't have to build
it from source.
Here are a few other obstacles that we need to overcome:
* We currently download the
[latest gVisor version](https://gvisor.dev/docs/user_guide/install/#latest-release)
from a GCS bucket. Now that we have switched to Debian, we can take advantage
of their
[timestamped APT repos](https://gvisor.dev/docs/user_guide/install/#specific-release)
and download specific releases from those. An extra benefit is that such
releases are signed with their APT key.
* We can no longer update the packages in the container image by rebuilding it.
We have to bump the dates in the Dockerfile first, which is a minor hassle,
but much more declarative.
* The `repro-sources-list.sh` script uses the release date of the container
image. However, the Debian image is not updated daily (see
[newest tags](https://hub.docker.com/_/debian/tags)
in DockerHub). So, if we want to ship an emergency release, we have to
circumvent this limitation. A simple way is to trick the script by bumping the
date of the `/etc/apt/sources.list.d/debian.sources` and
`/etc/apt/sources.list` files.
* While we talk about image reproducibility, we can't actually achieve the exact
same SHA-256 hash for two different image builds. That's because the file
timestamps in the image layers will differ, depending on when the build took
place. The rest of the image though (file contents, permissions, manifest)
should be byte-for-byte the same. A simple way to check this is with the
[`diffoci`](https://github.com/reproducible-containers/diffoci) tool, and
specifically this invocation:
```
./diffoci diff podman://<new_image_tag> podman://<old_image_tag> \
--ignore-timestamps --ignore-image-name --verbose
```
### Updating the image
The fact that our image is reproducible also means that it's frozen in time.
This means that rebuilding the image without updating our Dockerfile will **not**
receive security updates.
This means that rebuilding the image without updating our Dockerfile will
**not** receive security updates.
We list the necessary variables that make up our image in the `Dockerfile.env`
file. These are:
Here are the necessary variables that make up our image in the `Dockerfile.env`
file:
* `DEBIAN_IMAGE_DATE`: The date that the Debian container image was released
* `DEBIAN_ARCHIVE_DATE`: The Debian snapshot repo that we want to use
* `GVISOR_ARCHIVE_DATE`: The gVisor APT repo that we want to use
* `H2ORESTART_CHECKSUM`: The SHA-256 checksum of the H2ORestart plugin
* `H2ORESTART_VERSION`: The version of the H2ORestart plugin
If you update these values in `Dockerfile.env`, you can create a new Dockerfile
with:
If you update these values in `Dockerfile.env`, you must also create a new
Dockerfile with:
```
poetry run jinja2 Dockerfile.in Dockerfile.env > Dockerfile
make Dockerfile
```
Updating `Dockerfile` without bumping `Dockerfile.in` is detected and should
trigger a CI error.
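
As a rough local complement to the CI check, the sketch below compares the `ARG NAME=value` lines of a generated Dockerfile against the `KEY=VALUE` pairs in `Dockerfile.env` and reports any that disagree. It is not how CI does it (CI regenerates the Dockerfile and diffs the result); the helper names `parse_env` and `stale_args` are hypothetical, and the sketch assumes every pinned value appears as a plain `ARG NAME=value` line.

```python
import re

# Matches lines such as "ARG DEBIAN_ARCHIVE_DATE=20250120".
ARG_RE = re.compile(r"^ARG\s+(\w+)=(\S+)\s*$", re.MULTILINE)


def parse_env(env_text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key] = value
    return values


def stale_args(dockerfile_text: str, env: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each out-of-date ARG name to (dockerfile_value, env_value)."""
    stale = {}
    for name, value in ARG_RE.findall(dockerfile_text):
        if name in env and env[name] != value:
            stale[name] = (value, env[name])
    return stale
```

An empty result suggests the Dockerfile was regenerated after the last edit to `Dockerfile.env`; a non-empty one means `make Dockerfile` still needs to be run.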
### Reproducing the image
For a simple way to reproduce a Dangerzone container image, either local or


@ -27,7 +27,7 @@ def str2bool(v):
raise argparse.ArgumentTypeError("Boolean value expected.")
def determine_tag():
def determine_git_tag():
# Designate a unique tag for this image, depending on the Git commit it was created
# from:
# 1. If created from a Git tag (e.g., 0.8.0), the image tag will be `0.8.0`.
@ -90,7 +90,7 @@ def main():
print(f"Building for architecture '{ARCH}'")
tag = args.tag or determine_tag()
tag = args.tag or determine_git_tag()
image_name_tagged = IMAGE_NAME + ":" + tag
print(f"Will tag the container image as '{image_name_tagged}'")