Switch base image from Alpine Linux to Debian Stable, in order to reduce
our image footprint, improve our security posture, and build our
container image reproducibly.
Fixes#1046
Refs #1047
Remove the need to copy the Dangerzone container image (used by the
inner container) within a wrapper gVisor image (used by the outer
container). Instead, use the root of the container filesystem for both
containers. We can do this safely because we don't mount any secrets to
the container, and because gVisor offers a read-only view of the
underlying filesystem
Fixes#1048
Move container-only build context (currently just the entrypoint script)
from `dangerzone/gvisor_wrapper` to `dangerzone/container_helpers`.
Update the rest of the scripts to use this location as well.
The PyMuPDF wheels for version 1.24.11 have changed the way they are
being built, which means we have to adapt our Dockerfile in order to
install them properly.
This removes the need to build the PyMuPDF project by ourselves, but
only when on non-ARM architectures since the wheels for these are not
provided yet.
Changes the `Dockerfile` and `build-image.py` script, introducing a new
`ARCH` flag to conditionally build the wheels.
This wraps the existing container image inside a gVisor-based sandbox.
gVisor is an open-source OCI-compliant container runtime.
It is a userspace reimplementation of the Linux kernel in a
memory-safe language.
It works by creating a sandboxed environment in which regular Linux
applications run, but their system calls are intercepted by gVisor.
gVisor then redirects these system calls and reinterprets them in
its own kernel. This means the host Linux kernel is isolated
from the sandboxed application, thereby providing protection against
Linux container escape attacks.
It also uses `seccomp-bpf` to provide a secondary layer of defense
against container escapes. Even if its userspace kernel gets
compromised, attackers would have to additionally have a Linux
container escape vector, and that exploit would have to fit within
the restricted `seccomp-bpf` rules that gVisor adds on itself.
Fixes#126Fixes#224Fixes#225Fixes#228
The minimum python version when installing from source is now python
3.9, as Pyside6 6.7.1 dropped support for python 3.8 (see #780 for more
information).
On Debian-derivatives distributions, the minimum Python version is now
set to 3.8. In practice, because Pyside6 is not packaged for Debian, we
use Pyside2 [0], which is why we can relax the python version requirement.
In practice, when installing from source on an environment where
python3.9 is not the default python, poetry will look for it and use it
if available
> For various reasons, this Python version might not be compatible with
> the python range supported by the project. In this case, Poetry will
> try to find one that is and use it.
>
> [Poetry docs](https://python-poetry.org/docs/managing-environments/)
On Ubuntu Focal (20.04) where Python 3.9 is not installed by default,
it is possible to install it using the `python3.9` package.
Additionally, In version 1.24.3, PyMuPDF changed its package name from `fitz`
to `pymupdf` [2], resulting in a breakage on how it is installed in our
container. This is now fixed.
[0] More information on how Pyside6 packaging affects dangerzone on #221
[1] See [the current status of Pyside6 packaging](https://repology.org/
project/python:pyside6/packages)
[2] PyMuPDF changelog: https://pymupdf.readthedocs.io/en/latest/changes.html#change-log
Alpine Linux 3.20 was released recently [1]. As a result, the
`alpine:latest` image ref, that our Dockerfile uses, switched from the
3.19 to the 3.20 Alpine Linux release. This release has Python 3.12,
meaning that the following line in our Dockerfile now fails:
COPY --from=pymupdf-build /usr/lib/python3.11/site-packages/fitz/ /usr/lib/python3.11/site-packages/fitz
Bump the Python version in the Python system path to 3.12, so that we
can successfully build the container image.
[1]: https://alpinelinux.org/posts/Alpine-3.20.0-released.html
Merge Qubes and Containers isolation providers core code into the class
parent IsolationProviders abstract class.
This is done by streaming pages in containers for exclusively in first
conversion process. The commit is rather large due to the multiple
interdependencies of the code, making it difficult to split into various
commits.
The main conversion method (_convert) now in the superclass simply calls
two methods:
- doc_to_pixels()
- pixels_to_pdf()
Critically, doc_to_pixels is implemented in the superclass, diverging
only in a specialized method called "start_doc_to_pixels_proc()". This
method obtains the process responsible that communicates with the
isolation provider (container / disp VM) via `podman/docker` and qrexec
on Containers and Qubes respectively.
Known regressions:
- progress reports stopped working on containers
Fixes#443
PyMuPDF replaced the need for almost all dependencies, which this commit
now removes.
We are also removing tesseract-ocr as a dependency since
(to our surprise) PyMuPDF ships directly with tesseract binaries [1].
However, now that tesseract-ocr is not available directly as a binary
tool, the `test_ocr.py` needed to be changed.
Fixes#658
[1]: https://github.com/freedomofpress/dangerzone/issues/658#issuecomment-1861033149
Breaks down the container build into multiple stages in order to speed
up build times. Building PyMuPDF was taking too long and this way it can
be cached.
The original version was made by @apyrgio
Ensure that when the container image is installing pymupdf (unavailable
in the repos) with verified hashes. To do so, it has the pymupdf
dependency declared in a "container" group in `pyproject.toml`, which
then gets exported into a requirements.txt, which is then used for
hash-verification when building the container.
Because this required modifying the container image build scripts, they
were all merged to avoid duplicate code. This was an overdue change
anyways.
We're intentionally bypassing PEP 668 [1], which prevents the
installation of non-distro python wheels alongside system packages to
avoid incompatibilities at distro-level.
We are intentionally bypassing this since our container image is a
controlled environment (we only ship a version after rigorous testing).
[1]: https://peps.python.org/pep-0668/
Use PyMuPDF (AGPL-licensed) within the container conversion to replace
the pdf conversion to RGB. This massively simplifies the code since
PyMuPDF is a native python library.
Switch to the tessdata-fast Tesseract model, instead of the tessdata
one. The tessdata-fast Tesseract model is much smaller, and a bit faster
than the other one. Also, it's the model that Debian/Fedora ship by
default.
Closes#545
The Alpine Linux team has enabled Java support for LibreOffice on ARM
architecture:
74d443f479
This commit is included in 7.5.5.2-r2, so the installed LibreOffice
package should be 7.5.5.2-r2 or higher to fix this issue.
However 3.18 doesn't have the 7.5.5.2-r2 package:
https://pkgs.alpinelinux.org/package/v3.18/community/aarch64/libreoffice
The Dangerzone image uses the alpine:latest image which is 3.18 as of
writing this.
For this reason, we switch to the edge repo of Alpine Linux, which
includes this fix.
Refs #498
Refs #540
Refs #542
Only load the LibreOffice extension for opening hwp/hwpx when it is
actually needed. Adding an extension to libreoffice may allow for it to
run arbitrary code. This makes it trust more scalable by trusting
LibreOffice extensions only for the filetypes which they target.
Reasoning
---------
Assuming a malicious `.oxt` extension this means that the extension has
arbitrary code execution in the container. While this is not an
existential threat in itself, we should not expose every Dangerzone user
to it. This is achieved by dynamically loading the extension at runtime
only when needed.
This ensures that a compromised extension will in its least malicious
form be able to modify the visual content of any hancom office files but
not *every file*. In the more malicious version, if the code execution
manages to do a container escape, this will only affect users that have
converted a Hancom office file.
H2ORestart is a LibreOffice extension which adds Hancom HWP/HWPX (Hangul Word Processor)
supports for LibreOffice. This format is widely used in South Korea.
Version: v0.5.7
Extension Repository: https://github.com/ebandal/H2Orestart/releases
The files in `container/` no longer make sense to have that name since
the "document to pixels" part will run in Qubes OS in its own virtual
machine.
To adapt to this, this PR does the following:
- Moves all the files in `container` to `dangerzone/conversion`
- Splits the old `container/dangerzone.py` into its two components
`dangerzone/conversion/{doc_to_pixels,pixels_to_pdf}.py` with a
`common.py` file for shared functions
- Moves the Dockerfile to the project root and adapts it to the new
container code location
- Updates the CircleCI config to properly cache Docker images.
- Updates our install scripts to properly build Docker images.
- Adds the new conversion module to the container image, so that it can
be imported as a package.
- Adapts the container isolation provider to use the new way of calling
the code.
NOTE: We have made zero changes to the conversion code in this commit,
except for necessary imports in order to factor out some common parts.
Any changes necessary for Qubes integration follow in the subsequent
commits.