Make the Dummy isolation provider follow the rest of the isolation
providers and perform the second part of the conversion on the host. The
first part of the conversion is just a dummy script that reads a file
from stdin and prints pixels to stdout.
Extend the base isolation provider to immediately convert each page to
a PDF, and optionally use OCR. In contract with the way we did things
previously, there are no more two separate stages (document to pixels,
pixels to PDF). We now handle each page individually, for two main
reasons:
1. We don't want to buffer pixel data, either on disk or in memory,
since they take a lot of space, and can potentially leave traces.
2. We can perform these operations in parallel, saving time. This is
more evident when OCR is not used, where the time to convert a page
to pixels, and then back to a PDF are comparable.
The PyMuPDF package was previously mainly used within the Dangerzone
container, as well as on Qubes. With on-host conversion, PyMuPDF will be
used in all supported platforms by default. For this reason, we can
promote it to a main dependency.
Add a new way to detect where the Tesseract data are stored in a user's
system. On Linux, the Tesseract data should be installed via the package
manager. On macOS and Windows, they should be bundled with the
Dangerzone application.
There is also the exception of running Dangerzone locally, where even
on Linux, we should get the Tesseract data from the Dangerzone share/
folder.
Add a Python script that can run in all supported platforms, and can
download and extract the Tesseract language data from GitHub, while
also:
1. Checking that the expected hash matches.
2. Informing the user if the language data have already been downloaded.
3. Extracting only the subset of language data that Dangerzone needs
GitHub actions somehow managed to downgrade our runners from Ubuntu
24.04 to Ubuntu 22.04, even though we use `ubuntu-latest`. Make the
Ubuntu 24.04 requirement more explicit, until GitHub migrates fully to
this version for the `ubuntu-latest` tag.
Fixes#957
Unreleased Fedora versions may refer to themselves as "rawhide", instead
of their version (e.g., "41"). For this reason, we should try and
replace the "rawhide" string with the proper Fedora version.
Fedora 41 has a newer dnf interface (dnf v5), and the config-manager
plugin that we use is not compatible with it. Suggest running it with
`dnf-3` instead, which is present in all Fedora versions.
It seems that the container image for Ubuntu 24.10 also ships with a
default Ubuntu user with UID 1000, so we need to remove it when creating
our dev environment.
Try installing `passt`, which is responsible for user networking in
later Podman releases. If not installed, building the container image
within an Ubuntu 24.10 environment fails with:
setup network: could not find pasta, the network namespace can't be
configured: exec: "pasta": executable file not found in $PATH
Note that this package is not available in older Ubuntu versions. In
these cases, we should swallow installation failures and continue.
Install PyMuPDF under ./dangerzone/vendor, right before we build the
.deb package. We vendor PyMuPDF just for Debian, since the provided
versions don't have OCR support enabled.
Currently, we don't use PyMuPDf on the host, but this will change once
we fully implement the on-host conversion feature.
Refs #625
Add a script that installs PyMuPDF under ./dangerzone/vendor. This will
be useful in subsequent commits, for vendoring PyMuPDF when building
Debian packages.
The PyMuPDF wheels for version 1.24.11 have changed the way they are
being built, which means we have to adapt our Dockerfile in order to
install them properly.
Remove the installation steps for Xvfb, since it's already included in
GitHub actions, and fire up an Xvfb server with disabled host-based
access control.
Initially, we tried to wrap our CI tests with `xvfb-run`, but any
X11 client within our Podman container failed with the following error
message:
Authorization required, but no authorization protocol specified.
This error message is usually thrown when the X11 client does not
provide the magic cookie in the Xauthority file back to the X11 server.
In our case though, we can verify that commands in our Podman container
read the Xauthority file successfully:
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path=@"/tmp/.X11-unix/X99"}, 21) = -1 ECONNREFUSED (Connection refused)
close(3) = 0
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 3
getsockopt(3, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
connect(3, {sa_family=AF_UNIX, sun_path="/tmp/.X11-unix/X99"}, 110) = 0
getpeername(3, {sa_family=AF_UNIX, sun_path="/tmp/.X11-unix/X99"}, [124->21]) = 0
uname({sysname="Linux", nodename="dangerzone-dev", ...}) = 0
access("/home/runner/work/dangerzone/dangerzone/cookie", R_OK) = 0
openat(AT_FDCWD, "/home/runner/work/dangerzone/dangerzone/cookie", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0600, st_size=59, ...}) = 0
read(4, "\1\0\0\rfv-az1915-957\0\299\0\22MIT-MAGIC"..., 4096) = 59
read(4, "", 4096) = 0
close(4) = 0
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3, [{iov_base="l\0\v\0\0\0\0\0\0\0\0\0", iov_len=12}, {iov_base="", iov_len=0}], 2) = 12
recvfrom(3, 0x55a5635c0050, 8, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "\0@\v\0\0\0\20\0", 8, 0, NULL, NULL) = 8
recvfrom(3, "Authorization required, but no a"..., 64, 0, NULL, NULL) = 64
write(2, "Authorization required, but no a"..., 64Authorization required, but no authorization protocol specified
) = 64
The line with the magic cookie is:
read(4, "\1\0\0\rfv-az1915-957\0\299\0\22MIT-MAGIC"..., 4096) = 59
Since we are not sure why we are not allowed access to the X11 server
from the Podman container, we decided to disable host-based access
controls altogether. This is not a security concern, since this X11
session is a remote one. However, we shouldn't run tests this way in dev
machines.
Fixes#949
Do not use the `provider_wait` fixture in our termination logic tests,
and switch instead to the `provider` fixture, which instantiates a
typical isolation provider.
The `provider_wait` fixture's goal was to emulate how would the process
behave if it had fully spawned. In practice, this masked some
termination logic issues that became apparent in the WIP on-host
conversion PR. Now that we kill the spawned process via its process
group, we can just use the default isolation provider in our tests.
In practice, in this PR we just do `s/provider_wait/provider`, and
remove some stale code.
Add a fixture that returns our stock Dummy provider. Also, explicitly
use a blocking Dummy provider (`DummyWait`) for a specific test case.
This will prove useful when we stop using the `provider_wait` variant of
our isolation providers in the next commits.