Mirror of https://github.com/freedomofpress/dangerzone.git, synced 2025-05-17 18:51:50 +02:00
Compare commits: 02e63e5a49...cbb7ed902f (13 commits)
| Author | SHA1 | Date |
|---|---|---|
| | cbb7ed902f | |
| | eacf1eb2fa | |
| | 505db39ca0 | |
| | 0f0fa49923 | |
| | 8911b72529 | |
| | 725ce3b9c7 | |
| | afc5e8e636 | |
| | 5bb37ef48f | |
| | c70d1970dd | |
| | ec616be2c0 | |
| | acbc433717 | |
| | 685cf431a3 | |
| | 2b71e615a8 | |
13 changed files with 80 additions and 111 deletions
7
.github/workflows/ci.yml
vendored
@@ -473,8 +473,6 @@ jobs:
 bash -c 'cd dangerzone; poetry run make test'

 check-reproducibility:
-needs:
-- build-container-image
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v4
@@ -489,8 +487,9 @@ jobs:

 - name: Verify that the Dockerfile matches the committed template and params
 run: |-
-poetry run jinja2 Dockerfile.in Dockerfile.env > out
-diff Dockerfile out
+cp Dockerfile Dockerfile.orig
+make Dockerfile
+diff Dockerfile.orig Dockerfile

 - name: Build Dangerzone container image
 run: |
2
.github/workflows/scan.yml
vendored
@@ -26,7 +26,7 @@ jobs:
 run: |
 date=$(date "+%Y%m%d")
 sed -i "s/DEBIAN_ARCHIVE_DATE=[0-9]\+/DEBIAN_ARCHIVE_DATE=${date}/" Dockerfile.env
-poetry run jinja2 Dockerfile.in Dockerfile.env > Dockerfile
+make Dockerfile
 - name: Build container image
 run: python3 ./install/common/build-image.py --runtime docker --no-save
 - name: Get image tag
38
.grype.yaml
@@ -2,10 +2,38 @@
 # latest release of Dangerzone, and offer our analysis.

 ignore:
-# CVE-2024-11053
+# CVE-2023-45853
 # ==============
 #
-# NVD Entry: https://nvd.nist.gov/vuln/detail/CVE-2024-11053
-# Verdict: Dangerzone is not affected because libcurl is an HTTP client, and
-# the Dangerzone container does not make any network calls.
-- vulnerability: CVE-2024-11053
+# Debian tracker: https://security-tracker.debian.org/tracker/CVE-2023-45853
+# Verdict: Dangerzone is not affected because the zlib library in Debian is
+# built in a way that is not vulnerable.
+- vulnerability: CVE-2023-45853
+# CVE-2024-38428
+# ==============
+#
+# Debian tracker: https://security-tracker.debian.org/tracker/CVE-2024-38428
+# Verdict: Dangerzone is not affected because it doesn't use wget in the
+# container image (which also has no network connectivity).
+- vulnerability: CVE-2024-38428
+# CVE-2024-57823
+# ==============
+#
+# Debian tracker: https://security-tracker.debian.org/tracker/CVE-2024-57823
+# Verdict: Dangerzone is not affected. First things first, LibreOffice is
+# using this library for parsing RDF metadata in a document [1], and has
+# issued a fix for the vendored raptor2 package they have for other distros
+# [2].
+#
+# On the other hand, the Debian security team has stated that this is a minor
+# issue [3], and there's no fix from the developers yet. It seems that the
+# Debian package is not affected somehow by this CVE, probably due to the way
+# it's packaged.
+#
+# [1] https://wiki.documentfoundation.org/Documentation/DevGuide/Office_Development#RDF_metadata
+# [2] https://cgit.freedesktop.org/libreoffice/core/commit/?id=2b50dc0e4482ac0ad27d69147b4175e05af4fba4
+# [3] From https://security-tracker.debian.org/tracker/CVE-2024-57823:
+#
+# [bookworm] - raptor2 <postponed> (Minor issue, revisit when fixed upstream)
+#
+- vulnerability: CVE-2024-57823
@@ -6,8 +6,8 @@ ARG DEBIAN_IMAGE_DATE=20250113

 FROM debian:bookworm-${DEBIAN_IMAGE_DATE}-slim

-ARG GVISOR_ARCHIVE_DATE=20250106
-ARG DEBIAN_ARCHIVE_DATE=20250114
+ARG GVISOR_ARCHIVE_DATE=20250113
+ARG DEBIAN_ARCHIVE_DATE=20250120
 ARG H2ORESTART_CHECKSUM=7760dc2963332c50d15eee285933ec4b48d6a1de9e0c0f6082946f93090bd132
 ARG H2ORESTART_VERSION=v0.7.0

@@ -62,7 +62,7 @@ RUN mkdir -p /opt/dangerzone/dangerzone
 RUN touch /opt/dangerzone/dangerzone/__init__.py

 # Copy only the Python code, and not any produced .pyc files.
-COPY conversion/*.py /opt/dangerzone/dangerzone/conversion
+COPY conversion/*.py /opt/dangerzone/dangerzone/conversion/

 # Let the entrypoint script write the OCI config for the inner container under
 # /config.json.
@@ -76,6 +76,6 @@ USER dangerzone
 # store the state of its containers.
 RUN mkdir /home/dangerzone/.containers

-COPY container/entrypoint.py /
+COPY container_helpers/entrypoint.py /

 ENTRYPOINT ["/entrypoint.py"]
@@ -1,9 +1,9 @@
 # Can be bumped to the latest date in https://hub.docker.com/_/debian/tags?name=bookworm-
 DEBIAN_IMAGE_DATE=20250113
 # Can be bumped to today's date
-DEBIAN_ARCHIVE_DATE=20250114
+DEBIAN_ARCHIVE_DATE=20250120
 # Can be bumped to the latest date in https://github.com/google/gvisor/tags
-GVISOR_ARCHIVE_DATE=20250106
+GVISOR_ARCHIVE_DATE=20250113
 # Can be bumped to the latest version and checksum from https://github.com/ebandal/H2Orestart/releases
 H2ORESTART_CHECKSUM=7760dc2963332c50d15eee285933ec4b48d6a1de9e0c0f6082946f93090bd132
 H2ORESTART_VERSION=v0.7.0
@@ -62,7 +62,7 @@ RUN mkdir -p /opt/dangerzone/dangerzone
 RUN touch /opt/dangerzone/dangerzone/__init__.py

 # Copy only the Python code, and not any produced .pyc files.
-COPY conversion/*.py /opt/dangerzone/dangerzone/conversion
+COPY conversion/*.py /opt/dangerzone/dangerzone/conversion/

 # Let the entrypoint script write the OCI config for the inner container under
 # /config.json.
@@ -76,6 +76,6 @@ USER dangerzone
 # store the state of its containers.
 RUN mkdir /home/dangerzone/.containers

-COPY container/entrypoint.py /
+COPY container_helpers/entrypoint.py /

 ENTRYPOINT ["/entrypoint.py"]
3
Makefile
@@ -47,6 +47,9 @@ test-large: test-large-init ## Run large test set
 python -m pytest --tb=no tests/test_large_set.py::TestLargeSet -v $(JUNIT_FLAGS) --junitxml=$(TEST_LARGE_RESULTS)
 python $(TEST_LARGE_RESULTS)/report.py $(TEST_LARGE_RESULTS)

+Dockerfile: Dockerfile.env Dockerfile.in
+poetry run jinja2 Dockerfile.in Dockerfile.env > Dockerfile
+
 .PHONY: build-clean
 build-clean:
 doit clean
@@ -129,7 +129,9 @@ class DocumentToPixels(DangerzoneConverter):
 # At least .odt, .docx, .odg, .odp, .ods, and .pptx
 "application/zip": {
 "type": "libreoffice",
-# NOTE: Older `file` command cannot detect hwpx files properly.
+# NOTE: `file` command < 5.45 cannot detect hwpx files properly, so we
+# enable the extension in any case. See also:
+# https://github.com/freedomofpress/dangerzone/pull/460#issuecomment-1654166465
 "libreoffice_ext": "h2orestart.oxt",
 },
 # At least .doc, .docx, .odg, .odp, .odt, .pdf, .ppt, .pptx, .xls, and .xlsx
2
debian/rules
vendored
@@ -9,5 +9,5 @@ export DH_VERBOSE=1
 dh $@ --with python3 --buildsystem=pybuild

 override_dh_builddeb:
-./install/linux/vendor-pymupdf.py --dest debian/dangerzone/usr/lib/python3/dist-packages/dangerzone/vendor/
+./install/linux/debian-vendor-pymupdf.py --dest debian/dangerzone/usr/lib/python3/dist-packages/dangerzone/vendor/
 dh_builddeb $@
@@ -42,16 +42,21 @@ def git_verify(commit, source):
 def diffoci_hash_matches(diffoci):
 """Check if the hash of the downloaded diffoci bin matches the expected one."""
 m = hashlib.sha256()
-m.update(DIFFOCI_PATH.open().read())
+m.update(diffoci)
 diffoci_checksum = m.hexdigest()
 return diffoci_checksum == DIFFOCI_CHECKSUM


-def diffoci_exists():
-"""Check if the diffoci helper exists, and if the hash matches."""
+def diffoci_is_installed():
+"""Determine if diffoci has been installed.
+
+Determine if diffoci has been installed, by checking if the binary exists, and if
+its hash is the expected one. If the binary exists but the hash is different, then
+this is a sign that we need to update the local diffoci binary.
+"""
 if not DIFFOCI_PATH.exists():
 return False
-return diffoci_hash_matches(DIFFOCI_PATH.open().read())
+return diffoci_hash_matches(DIFFOCI_PATH.open("rb").read())


 def diffoci_download():
@@ -79,8 +84,7 @@ def diffoci_diff(source, local_target):
 "diff",
 source,
 target,
-"--ignore-timestamps",
-"--ignore-image-name",
+"--semantic",
 "--verbose",
 )
 except subprocess.CalledProcessError as e:
@@ -134,7 +138,7 @@ def main():
 commit = git_commit_get()
 git_verify(commit, args.source)

-if diffoci_exists():
+if not diffoci_is_installed():
 logger.info(f"Downloading diffoci helper from {DIFFOCI_URL}")
 diffoci_download()
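The hunks above move the file read into `diffoci_is_installed()` and open the binary with `"rb"`. The following is a minimal sketch of why binary mode matters, not the repository's code: `sha256_matches` and `is_installed` are hypothetical names, and the payload is made up. `hashlib` digests accept only bytes-like input, so data read in text mode cannot be passed to `update()`.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 hex digest of `data` equals `expected_hex`."""
    return hashlib.sha256(data).hexdigest() == expected_hex


def is_installed(path: Path, expected_hex: str) -> bool:
    """Hypothetical stand-in for the `diffoci_is_installed()` logic in the hunk."""
    if not path.exists():
        return False
    # read_bytes() opens the file in binary mode, like open("rb").read().
    return sha256_matches(path.read_bytes(), expected_hex)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        helper = Path(tmp) / "helper"
        # Made-up payload containing a byte that is not valid UTF-8.
        payload = b"\x7fELF\xff example payload"
        helper.write_bytes(payload)
        expected = hashlib.sha256(payload).hexdigest()
        print(is_installed(helper, expected))                 # matching hash
        print(is_installed(helper, "0" * 64))                 # stale binary
        print(is_installed(Path(tmp) / "missing", expected))  # not downloaded
```

A missing file and a mismatched hash both report "not installed", which is the condition under which `main()` in the hunk triggers a fresh download.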
@@ -11,105 +11,38 @@ Our build artifacts consist of:
 * Fedora packages (for regular Fedora distros and Qubes)
 * Debian packages (for Debian and Ubuntu)

-As of writing this, none of the above artifacts are reproducible. For this
-reason, we purposefully build them in machines owned by FPF, since we can't
-trust third-party servers. A security hole in GitHub, or
-in our CI pipeline (check out the
-[Ultralytics cryptominer saga](https://github.com/ultralytics/ultralytics/issues/18027)),
-may allow attackers to plant a malicious artifact with no detection.
+As of writing this, only the following artifacts are reproducible:
+* Container images (see [#1047](https://github.com/freedomofpress/dangerzone/issues/1047))

-Still, building our artifacts in private is not ideal. Third parties cannot
-easily audit if our artifacts have been built correctly or if they have been
-tampered with. For instance, our Apple Silicon container image builds PyMuPDF
-from source, and while the PyPI source package is hashed, the produced output
-does not have a known hash. So, it's not easy to verify it's been built
-correctly (read also the seminal
-["Reflections on Trusting Trust"](https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf)
-lecture by Ken Thompson on that subject).
-
-In order to make our builds auditable and allow building artifacts in
-third-party servers safely, we want to make each artifact build reproducible. In
-the following sections, we'll lay down the plan to do so for each artifact type.
+In the following sections, we'll mention some specifics about enforcing
+reproducibility for each artifact type.

 ## Container image

-### Current limitations
-
-Our container image is currently not reproducible for the following main
-reasons:
-
-* We build PyMuPDF from source, since it's not available in Alpine Linux. The
-result of this build is not reproducible. Note that PyMuPDF wheels are
-available from PyPI, but there are no ARM wheels for the musl libc platforms.
-* Alpine Linux does not have a way to pin packages and their dependencies, and
-does not retain old packages. There's a
-[workaround](https://github.com/reproducible-containers/repro-pkg-cache)
-to download the required packages and store them elsewhere, but then the
-cached package downloads cannot be easily audited.
-
-## Proposed implementation
-
-We can take advantage of the
-[Debian snapshot archives](https://snapshot.debian.org/)
-and pin our packages by specifying a date. There's already
-[prior art](https://github.com/reproducible-containers/repro-sources-list.sh/)
-for that, thanks to the incredible work of @AkihiroSuda on
-[reproducible containers](https://github.com/reproducible-containers).
-As for PyMuPDF, it is available from the Debian repos, so we won't have to build
-it from source.
-
-Here are a few other obstacles that we need to overcome:
-* We currently download the
-[latest gVisor version](https://gvisor.dev/docs/user_guide/install/#latest-release)
-from a GCS bucket. Now that we have switched to Debian, we can take advantage
-of their
-[timestamped APT repos](https://gvisor.dev/docs/user_guide/install/#specific-release)
-and download specific releases from those. An extra benefit is that such
-releases are signed with their APT key.
-* We can no longer update the packages in the container image by rebuilding it.
-We have to bump the dates in the Dockerfile first, which is a minor hassle,
-but much more declarative.
-* The `repro-sources-list.sh` script uses the release date of the container
-image. However, the Debian image is not updated daily (see
-[newest tags](https://hub.docker.com/_/debian/tags)
-in DockerHub). So, if we want to ship an emergency release, we have to
-circumvent this limitation. A simple way is to trick the script by bumping the
-date of the `/etc/apt/sources.list.d/debian.sources` and
-`/etc/apt/sources.list` files.
-* While we talk about image reproducibility, we can't actually achieve the exact
-same SHA-256 hash for two different image builds. That's because the file
-timestamps in the image layers will differ, depending on when the build took
-place. The rest of the image though (file contents, permissions, manifest)
-should be byte-for-byte the same. A simple way to check this is with the
-[`diffoci`](https://github.com/reproducible-containers/diffoci) tool, and
-specifically this invocation:
-
-```
-./diffoci diff podman://<new_image_tag> podman://<old_image_tag> \
---ignore-timestamps --ignore-image-name --verbose
-```
-
 ### Updating the image

 The fact that our image is reproducible also means that it's frozen in time.
-This means that rebuilding the image without updating our Dockerfile will **not**
-receive security updates.
+This means that rebuilding the image without updating our Dockerfile will
+**not** receive security updates.

-We list the necessary variables that make up our image in the `Dockerfile.env`
-file. These are:
+Here are the necessary variables that make up our image in the `Dockerfile.env`
+file:
 * `DEBIAN_IMAGE_DATE`: The date that the Debian container image was released
 * `DEBIAN_ARCHIVE_DATE`: The Debian snapshot repo that we want to use
 * `GVISOR_ARCHIVE_DATE`: The gVisor APT repo that we want to use
 * `H2ORESTART_CHECKSUM`: The SHA-256 checksum of the H2ORestart plugin
 * `H2ORESTART_VERSION`: The version of the H2ORestart plugin

-If you update these values in `Dockerfile.env`, you can create a new Dockerfile
-with:
+If you update these values in `Dockerfile.env`, you must also create a new
+Dockerfile with:

 ```
-poetry run jinja2 Dockerfile.in Dockerfile.env > Dockerfile
+make Dockerfile
 ```

+Updating `Dockerfile` without bumping `Dockerfile.in` is detected and should
+trigger a CI error.
+
 ### Reproducing the image

 For a simple way to reproduce a Dangerzone container image, either local or
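The "Updating the image" text in the hunk above says that a `Dockerfile` out of sync with its inputs is caught in CI, which (per the `ci.yml` hunk) regenerates the file with `make Dockerfile` and diffs the result. As a rough illustration of the same idea without rendering the template, the sketch below checks that every `KEY=VALUE` pair in `Dockerfile.env` appears as an `ARG KEY=VALUE` line in the `Dockerfile`. The helper names `parse_env` and `stale_args` are hypothetical, and the assumption that every variable is emitted as a plain `ARG` line is inferred from the lines visible in this diff rather than from the full template.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def stale_args(env_text: str, dockerfile_text: str) -> list[str]:
    """Return the env keys whose `ARG KEY=VALUE` line is absent from the Dockerfile."""
    dockerfile_lines = {line.strip() for line in dockerfile_text.splitlines()}
    return [
        key
        for key, value in parse_env(env_text).items()
        if f"ARG {key}={value}" not in dockerfile_lines
    ]


if __name__ == "__main__":
    # Values taken from the Dockerfile.env and Dockerfile hunks in this diff.
    env = (
        "# Can be bumped to today's date\n"
        "DEBIAN_ARCHIVE_DATE=20250120\n"
        "GVISOR_ARCHIVE_DATE=20250113\n"
    )
    old_dockerfile = (
        "ARG GVISOR_ARCHIVE_DATE=20250106\n"
        "ARG DEBIAN_ARCHIVE_DATE=20250114\n"
    )
    print(stale_args(env, old_dockerfile))
```

With the pre-change `Dockerfile` lines, both bumped variables are reported as stale. This is a lighter check than the regenerate-and-diff step CI runs, since it ignores every other part of the template.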
@@ -27,7 +27,7 @@ def str2bool(v):
 raise argparse.ArgumentTypeError("Boolean value expected.")


-def determine_tag():
+def determine_git_tag():
 # Designate a unique tag for this image, depending on the Git commit it was created
 # from:
 # 1. If created from a Git tag (e.g., 0.8.0), the image tag will be `0.8.0`.
@@ -90,7 +90,7 @@ def main():

 print(f"Building for architecture '{ARCH}'")

-tag = args.tag or determine_tag()
+tag = args.tag or determine_git_tag()
 image_name_tagged = IMAGE_NAME + ":" + tag

 print(f"Will tag the container image as '{image_name_tagged}'")