Add a document that embeds an MP4 video with both an audio and a video
stream. This type of document could not be converted with the latest
Dangerzone releases, because PyMuPDF printed this error to the
container's stdout:
MuPDF error: unsupported error: cannot create appearance stream for
Screen annotations
Our client code treated this error message as page data, and parsed its
first few bytes in order to find out the page height/width. This
resulted in a misleading Dangerzone error, e.g.:
A page exceeded the maximum height
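To illustrate how text on stdout can turn into a bogus page size, here
is a minimal sketch. It assumes that the client reads each dimension as
a 2-byte big-endian unsigned integer; the helper name `read_int` and the
`MAX_PAGE_HEIGHT` value are hypothetical and not taken from the
Dangerzone source.

```python
import io

# Hypothetical limit, chosen only for this illustration.
MAX_PAGE_HEIGHT = 10000


def read_int(stream: io.BufferedIOBase) -> int:
    """Read a 2-byte big-endian unsigned integer (assumed wire format)."""
    data = stream.read(2)
    if len(data) != 2:
        raise ValueError("stream ended prematurely")
    return int.from_bytes(data, "big", signed=False)


# The container wrote an error message where dimensions were expected.
stdout = io.BytesIO(b"MuPDF error: unsupported error")
width = read_int(stdout)   # consumes b"Mu"
height = read_int(stdout)  # consumes b"PD"

# The ASCII bytes decode to a large number, which trips the size check.
page_too_tall = height > MAX_PAGE_HEIGHT
```

Under these assumptions, the letters "PD" decode to a height above the
limit, so the client reports a size error instead of the real cause.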
This issue has occurred since 0.6.0, which added streaming support, and
was fixed by commit 3f86e7b465. That fix was not accompanied by a test
document to guard against this regression, so we add one in this
commit.
Refs #877
Closes #917
Remove some macOS entitlements that are not necessary for the current
iteration of Dangerzone. Those are the ability to run as a hypervisor,
and the ability to accept network connections. They are a relic from
when we were experimenting with VMs, instead of relying on Docker
Desktop.
Make the Dummy isolation provider follow the rest of the isolation
providers and perform the second part of the conversion on the host. The
first part of the conversion is just a dummy script that reads a file
from stdin and prints pixels to stdout.
Extend the base isolation provider to immediately convert each page to
a PDF, and optionally use OCR. In contrast with the way we did things
previously, there are no longer two separate stages (document to pixels,
pixels to PDF). We now handle each page individually, for two main
reasons:
1. We don't want to buffer pixel data, either on disk or in memory,
since it takes a lot of space, and can potentially leave traces.
2. We can perform these operations in parallel, saving time. This is
more evident when OCR is not used, where the time to convert a page
to pixels and the time to convert it back to a PDF are comparable.
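The per-page flow described above can be sketched as follows. This is
an illustration only: `convert_pages`, `fake_pixel_stream` and the
`page_to_pdf` callback are hypothetical names, and the "PDF" bytes are
placeholders rather than real PyMuPDF output.

```python
from typing import Callable, Iterable, Iterator


def convert_pages(
    pixel_pages: Iterable[bytes],
    page_to_pdf: Callable[[bytes], bytes],
) -> Iterator[bytes]:
    """Convert each pixel buffer as it arrives, holding one page at a time."""
    for pixels in pixel_pages:
        # The pixel buffer can be dropped as soon as the page is converted.
        yield page_to_pdf(pixels)


def fake_pixel_stream(num_pages: int) -> Iterator[bytes]:
    """Stand-in for the sandbox output: three bytes of 'pixels' per page."""
    for n in range(num_pages):
        yield bytes([n]) * 3


pdf_pages = list(
    convert_pages(fake_pixel_stream(3), lambda px: b"%PDF-page:" + px)
)
```

When the producer is a separate process that writes to a pipe, it can
render the next page while the consumer is still converting the
previous one, which is where the time saving would come from.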
The PyMuPDF package was previously used mainly within the Dangerzone
container, as well as on Qubes. With on-host conversion, PyMuPDF will be
used on all supported platforms by default. For this reason, we can
promote it to a main dependency.
Add a new way to detect where the Tesseract data are stored on a user's
system. On Linux, the Tesseract data should be installed via the package
manager. On macOS and Windows, they should be bundled with the
Dangerzone application.
There is also the exception of running Dangerzone locally, where even
on Linux, we should get the Tesseract data from the Dangerzone share/
folder.
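A minimal sketch of this lookup could look like the following. The
function name, the `running_from_source` flag, the `tessdata`
subdirectory and the Linux candidate paths are all assumptions made for
illustration; they are not taken from the Dangerzone source.

```python
import platform
from pathlib import Path
from typing import Optional, Sequence

# Assumed locations; distributions differ, so treat these as examples.
DEFAULT_SYSTEM_CANDIDATES = (
    "/usr/share/tessdata",
    "/usr/share/tesseract/tessdata",
)


def get_tessdata_dir(
    share_dir: Path,
    system: Optional[str] = None,
    running_from_source: bool = False,
    system_candidates: Sequence[str] = DEFAULT_SYSTEM_CANDIDATES,
) -> Path:
    """Return the directory that should hold the Tesseract language data."""
    system = system or platform.system()
    # Bundled data: macOS, Windows, or a local run from the source tree.
    if running_from_source or system in ("Darwin", "Windows"):
        return share_dir / "tessdata"
    # Linux: data installed by the package manager.
    for candidate in system_candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    raise RuntimeError("Tesseract language data were not found")
```

Passing `system` explicitly keeps the function easy to test on any
platform.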
Add a Python script that can run on all supported platforms, and can
download and extract the Tesseract language data from GitHub, while
also:
1. Checking that the expected hash matches.
2. Informing the user if the language data have already been downloaded.
3. Extracting only the subset of language data that Dangerzone needs.
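The three checks above can be sketched as follows. This is not the
script added by this commit: the function names are hypothetical, the
archive is assumed to be a gzipped tarball of `<lang>.traineddata`
files, and the download step is replaced by an in-memory archive so the
example runs offline.

```python
import hashlib
import io
import tarfile
import tempfile
from pathlib import Path

SUFFIX = ".traineddata"


def verify_sha256(data: bytes, expected_hex: str) -> None:
    """Raise if the downloaded bytes do not match the expected hash."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_hex:
        raise ValueError(f"hash mismatch: expected {expected_hex}, got {actual}")


def already_downloaded(dest: Path, languages: set) -> bool:
    """Return True if every requested language file is already present."""
    return all((dest / f"{lang}{SUFFIX}").is_file() for lang in languages)


def extract_languages(archive: bytes, languages: set, dest: Path) -> list:
    """Extract only the requested language files into `dest`."""
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            # Use only the base name, so archive paths cannot escape `dest`.
            name = Path(member.name).name
            if not member.isfile() or not name.endswith(SUFFIX):
                continue
            if name[: -len(SUFFIX)] not in languages:
                continue
            (dest / name).write_bytes(tar.extractfile(member).read())
            extracted.append(name)
    return sorted(extracted)


# Build a tiny stand-in archive instead of downloading one.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for fname in ("tessdata/eng.traineddata", "tessdata/fra.traineddata"):
        payload = b"dummy"
        info = tarfile.TarInfo(fname)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
archive = buf.getvalue()

# A real script would compare against a pinned hash, not a computed one.
verify_sha256(archive, hashlib.sha256(archive).hexdigest())

with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp) / "tessdata"
    before = already_downloaded(dest, {"eng"})
    extracted = extract_languages(archive, {"eng"}, dest)
    after = already_downloaded(dest, {"eng"})
```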
GitHub Actions somehow managed to downgrade our runners from Ubuntu
24.04 to Ubuntu 22.04, even though we use `ubuntu-latest`. Make the
Ubuntu 24.04 requirement more explicit, until GitHub migrates fully to
this version for the `ubuntu-latest` tag.
Fixes #957
Unreleased Fedora versions may refer to themselves as "rawhide", instead
of their version (e.g., "41"). For this reason, we should try and
replace the "rawhide" string with the proper Fedora version.
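As a sketch of the replacement, assuming the caller already knows which
version number rawhide currently maps to (the function name and its
parameters are hypothetical):

```python
def resolve_fedora_version(reported: str, rawhide_version: str) -> str:
    """Map "rawhide" to a concrete version; pass other values through."""
    if reported.strip().lower() == "rawhide":
        return rawhide_version
    return reported


resolved = resolve_fedora_version("rawhide", "41")
unchanged = resolve_fedora_version("40", "41")
```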
Fedora 41 has a newer dnf interface (dnf v5), and the config-manager
plugin that we use is not compatible with it. Suggest running it with
`dnf-3` instead, which is present in all Fedora versions.
It seems that the container image for Ubuntu 24.10 also ships with a
default Ubuntu user with UID 1000, so we need to remove it when creating
our dev environment.
Try installing `passt`, which is responsible for user networking in
later Podman releases. If not installed, building the container image
within an Ubuntu 24.10 environment fails with:
setup network: could not find pasta, the network namespace can't be
configured: exec: "pasta": executable file not found in $PATH
Note that this package is not available in older Ubuntu versions. In
these cases, we should swallow installation failures and continue.
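A best-effort install step of this kind could be sketched as below. The
helper name `run_best_effort` is hypothetical, and the demo runs the
Python interpreter instead of a package manager so that it works
anywhere; in a real dev environment the command would be something like
`apt-get install -y passt`.

```python
import subprocess
import sys
from typing import Sequence


def run_best_effort(cmd: Sequence[str]) -> bool:
    """Run a command; return False instead of raising if it fails."""
    try:
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Swallow the failure, e.g. when the package does not exist.
        return False
    return True


# Stand-ins for a succeeding and a failing installation command.
ok = run_best_effort([sys.executable, "-c", "raise SystemExit(0)"])
failed = run_best_effort([sys.executable, "-c", "raise SystemExit(1)"])
```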
Install PyMuPDF under ./dangerzone/vendor, right before we build the
.deb package. We vendor PyMuPDF just for Debian, since the provided
versions don't have OCR support enabled.
Currently, we don't use PyMuPDF on the host, but this will change once
we fully implement the on-host conversion feature.
Refs #625
Add a script that installs PyMuPDF under ./dangerzone/vendor. This will
be useful in subsequent commits, for vendoring PyMuPDF when building
Debian packages.
The PyMuPDF wheels for version 1.24.11 have changed the way they are
being built, which means we have to adapt our Dockerfile in order to
install them properly.
Remove the installation steps for Xvfb, since it's already included in
GitHub Actions, and fire up an Xvfb server with disabled host-based
access control.
Initially, we tried to wrap our CI tests with `xvfb-run`, but any
X11 client within our Podman container failed with the following error
message:
Authorization required, but no authorization protocol specified.
This error message is usually shown when the X11 client does not
present the magic cookie from the Xauthority file to the X11 server.
In our case though, we can verify that commands in our Podman container
read the Xauthority file successfully:
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path=@"/tmp/.X11-unix/X99"}, 21) = -1 ECONNREFUSED (Connection refused)
close(3) = 0
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 3
getsockopt(3, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
connect(3, {sa_family=AF_UNIX, sun_path="/tmp/.X11-unix/X99"}, 110) = 0
getpeername(3, {sa_family=AF_UNIX, sun_path="/tmp/.X11-unix/X99"}, [124->21]) = 0
uname({sysname="Linux", nodename="dangerzone-dev", ...}) = 0
access("/home/runner/work/dangerzone/dangerzone/cookie", R_OK) = 0
openat(AT_FDCWD, "/home/runner/work/dangerzone/dangerzone/cookie", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0600, st_size=59, ...}) = 0
read(4, "\1\0\0\rfv-az1915-957\0\299\0\22MIT-MAGIC"..., 4096) = 59
read(4, "", 4096) = 0
close(4) = 0
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3, [{iov_base="l\0\v\0\0\0\0\0\0\0\0\0", iov_len=12}, {iov_base="", iov_len=0}], 2) = 12
recvfrom(3, 0x55a5635c0050, 8, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "\0@\v\0\0\0\20\0", 8, 0, NULL, NULL) = 8
recvfrom(3, "Authorization required, but no a"..., 64, 0, NULL, NULL) = 64
write(2, "Authorization required, but no a"..., 64Authorization required, but no authorization protocol specified
) = 64
The line with the magic cookie is:
read(4, "\1\0\0\rfv-az1915-957\0\299\0\22MIT-MAGIC"..., 4096) = 59
Since we are not sure why we are not allowed access to the X11 server
from the Podman container, we decided to disable host-based access
controls altogether. This is not a security concern, since this X11
session is a remote one. However, we shouldn't run tests this way on dev
machines.
Fixes #949