The "document to pixels" code assumes that the client has called it with
some mount points in which it can write files. This is true for the
container isolation provider, but not for Qubes, which can communicate
with the client only via stdin/stdout.
Add a Qubes wrapper for this code that reads the suspicious document
from stdin and writes the pages to stdout. The on-wire format is the
same as the one that TrustedPDF uses.
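
A minimal sketch of such a wrapper follows. The framing used here (a
2-byte big-endian page count, then for each page a 2-byte width, a 2-byte
height and the raw RGB bytes) is an assumption made for illustration, not
a statement of the actual wire format. `serve`, `fake_converter` and
`Page` are hypothetical names as well.

```python
import io
import struct
from typing import BinaryIO, Callable, Iterable, Tuple

# (width, height, raw RGB bytes) for one rendered page.
Page = Tuple[int, int, bytes]


def serve(
    stdin: BinaryIO,
    stdout: BinaryIO,
    doc_to_pixels: Callable[[bytes], Iterable[Page]],
) -> None:
    """Read a document from stdin and stream its pages to stdout."""
    document = stdin.read()  # read until the client closes its end
    pages = list(doc_to_pixels(document))
    # ASSUMED framing: 2-byte big-endian page count first ...
    stdout.write(struct.pack(">H", len(pages)))
    for width, height, rgb in pages:
        # ... then, per page, 2-byte width and height followed by raw RGB.
        stdout.write(struct.pack(">HH", width, height))
        stdout.write(rgb)
    stdout.flush()


def fake_converter(document: bytes) -> Iterable[Page]:
    """Hypothetical stand-in for the real conversion: one 1x1 black page."""
    yield (1, 1, b"\x00\x00\x00")


# In-memory streams stand in for the real stdin/stdout.
out = io.BytesIO()
serve(io.BytesIO(b"%PDF-1.4 ..."), out, fake_converter)
```

In a real wrapper, the streams would be `sys.stdin.buffer` and
`sys.stdout.buffer`, so that the data is handled as bytes.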
It seems that there are at least two Python libraries with libmagic
support:
* PyPI: python-magic (https://pypi.org/project/python-magic/)
  On Fedora it's `python3-magic`
* PyPI: filemagic (https://pypi.org/project/filemagic/)
  On Fedora it's `python3-file-magic`
The first package corresponds to the `py3-magic` package on Alpine
Linux, and it's the one we install in the container. The second package
uses a different API, and it's the only one we can use on Qubes.
To make matters worse, we:
* Can't install the first package on Fedora, because it installs the
  second under the hood:
  https://bugzilla.redhat.com/show_bug.cgi?id=1899279
* Can't install the second package on Alpine Linux (untested), due to
  musl being used instead of glibc:
  https://stackoverflow.com/a/53936722
Ultimately, we need to support both: we try the first API and, if that
fails, we fall back to the second one.
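
The fallback can be sketched as follows. The call names are what we
understand each library to expose (`Magic(mime=True).from_file()` for the
first, `detect_from_filename().mime_type` for the second), so treat them
as assumptions. The module is passed in as a parameter, and the two
stand-in objects are hypothetical, so that the sketch runs without
libmagic installed.

```python
from types import SimpleNamespace


def detect_mime_type(magic_module, path: str) -> str:
    """Return the MIME type of `path` using whichever API `magic_module` has."""
    try:
        # First API (python-magic style).
        return magic_module.Magic(mime=True).from_file(path)
    except (AttributeError, TypeError):
        # Second API: the call above is missing or takes other arguments.
        return magic_module.detect_from_filename(path).mime_type


class _FirstApiMagic:
    """Hypothetical stand-in that mimics the first library's `Magic` class."""

    def __init__(self, mime: bool = False) -> None:
        self.mime = mime

    def from_file(self, path: str) -> str:
        return "application/pdf"


first_api = SimpleNamespace(Magic=_FirstApiMagic)
second_api = SimpleNamespace(
    detect_from_filename=lambda path: SimpleNamespace(mime_type="application/pdf")
)

mime_first = detect_mime_type(first_api, "doc.pdf")
mime_second = detect_mime_type(second_api, "doc.pdf")
```

In the real code, `magic_module` would simply be the result of
`import magic`, whichever package provides it on that platform.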
The name `container/` no longer makes sense for these files, since the
"document to pixels" part will run in Qubes OS in its own virtual
machine.
To adapt to this, this PR does the following:
- Moves all the files in `container` to `dangerzone/conversion`.
- Splits the old `container/dangerzone.py` into its two components,
  `dangerzone/conversion/{doc_to_pixels,pixels_to_pdf}.py`, with a
  `common.py` file for shared functions.
- Moves the Dockerfile to the project root and adapts it to the new
  container code location.
- Updates the CircleCI config to properly cache Docker images.
- Updates our install scripts to properly build Docker images.
- Adds the new conversion module to the container image, so that it can
  be imported as a package.
- Adapts the container isolation provider to use the new way of calling
  the code.
NOTE: We have made zero changes to the conversion code in this commit,
except for necessary imports in order to factor out some common parts.
Any changes necessary for Qubes integration follow in the subsequent
commits.
Update our GitHub Actions workflow with the following tests:
1. Build a .deb for Dangerzone on Debian Bookworm.
2. Install this .deb on every Debian-based platform that we support.
3. Test that the installed version runs successfully.
This way, we can be sure that the .deb we create on a single Debian
version (here we choose Debian Bookworm) works on all platforms.
Refs #358
When we run our Dangerzone environments through dev_scripts/env.py, we
use the Podman flag `--userns keep-id`. This option maps the UID in the
host to the *same* UID in the container. This way, the container can
access mounted files from the host.
This works because the user within the container has UID 1000, and the
user on the host *typically* has UID 1000 as well. This setup can break,
though, if the user on the host has a different UID. For instance, the
UID of the GitHub Actions user that runs our CI command is 1001.
To fix this, we need to always map the host user UID (whatever that is)
to container UID 1000. We can achieve this with the following mapping:
    1000:0:1         # Map container UID 1000 to subordinate UID 0
                     # (sub UID 0 = owner of the user ns = host user UID)
    0:1:1000         # Map container UIDs 0-999 to subordinate UIDs 1-1000
    1001:1001:64536  # Map container UIDs 1001-65536 to subordinate UIDs 1001-65536
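
To double-check the mapping, here is a small sketch that resolves a
container UID to its subordinate UID and renders the entries as
command-line options. We assume the mapping is passed through Podman's
`--uidmap` option; the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

# (first container UID, first subordinate UID, number of UIDs).
UID_MAP: List[Tuple[int, int, int]] = [
    (1000, 0, 1),
    (0, 1, 1000),
    (1001, 1001, 64536),
]


def to_subordinate(container_uid: int, uid_map=UID_MAP) -> Optional[int]:
    """Translate a container UID to its subordinate UID, or None if unmapped."""
    for container_start, sub_start, count in uid_map:
        if container_start <= container_uid < container_start + count:
            return sub_start + (container_uid - container_start)
    return None


def uidmap_args(uid_map=UID_MAP) -> List[str]:
    """Render the mapping as repeated `--uidmap` options (assumed flag usage)."""
    args: List[str] = []
    for container_start, sub_start, count in uid_map:
        args += ["--uidmap", f"{container_start}:{sub_start}:{count}"]
    return args
```

With this mapping, `to_subordinate(1000)` is 0 (the host user), while
container root (UID 0) lands on subordinate UID 1.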
Refs #228
In Debian-based images, there are some Podman dependencies that are
marked as recommended, but are essential for rootless containers. These
dependencies will not be installed in our Dangerzone environments, due
to the `--no-install-recommends` flag.
Our approach so far was to find these dependencies through trial and
error, and hardcode them in our image. It turns out, though, that some
dependencies (e.g., `netavark`) may be necessary in some Debian flavors
and not in others.
In order to not impact the readability of the env.py file, we prefer
installing Podman with all of its recommended packages. On one hand,
this will make the image size of our Debian-based Dangerzone
environments slightly larger, but on the other hand, it will make CI
tests less flaky.
Fix transient errors in Debian Bullseye CI tests by using a different
machine image (Ubuntu 22.04 vs Ubuntu 20.04), and solving some Podman
config issues along the way.
Fixes #388
Remove the Kurdish (Arabic) language ("kur_ara") from the list of
languages that we offer for OCR, since it's not included in the
installed languages.
Interestingly, it is not present in the Alpine Linux repos either, so
this was probably an omission in the first place.
Restore the OCR languages to the state they were in as of commit
66d3c40163, with some minor changes. We can now do so because we
download all the trained models, not just the ones that Alpine Linux
offers.
Grab Tesseract's trained models from GitHub, instead of from the Alpine
Linux repos. Over the past few months, the models in the Alpine Linux
repos did not remain stable, leading to CI issues.
Since the models are already pre-trained and available through
Tesseract's repo on GitHub, we can use the release tarball that they
offer to install them in the container image, which is basically what
the upstream packages are doing as well.
In order to make sure that we have no regressions, at the time of this
commit we ensured that the hashes of the models offered through the
Alpine Linux repos and the models offered from the GitHub release are
the same. Also, in order to detect future regressions or foul play, we
check the downloaded models against a known checksum. Given that these
models change every few years, updating the checksum should not be an
issue.
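
The checksum check can be sketched like this. SHA-256 is an assumption
(the hash algorithm is not named here), and the payload is a placeholder
for the downloaded tarball.

```python
import hashlib


def verify_checksum(data: bytes, expected_sha256: str) -> None:
    """Raise ValueError unless `data` hashes to `expected_sha256`."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(
            f"checksum mismatch: expected {expected_sha256}, got {actual}"
        )


# Placeholder payload; in practice this would be the downloaded tarball.
tarball = b"example tarball contents"
# Stands in for the known checksum that we would hardcode.
pinned = hashlib.sha256(tarball).hexdigest()

verify_checksum(tarball, pinned)  # passes silently when the hashes match
```

Failing loudly on a mismatch means that a changed or tampered release
breaks the image build instead of silently ending up in the container.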
Fix #357
Ignore two CVEs from our security scans, which were triggered when
scanning the Dangerzone container image for v0.4.1. These CVEs do not
affect our users, and we explain why.
Add two GitHub Actions workflows that perform the following checks:
* Security scan the Python dependencies of the Dangerzone application
  (`poetry.lock`), for the current/main branch.
* Build and security scan the Dangerzone container image for the
  current/main branch.
* Security scan the Python dependencies of the Dangerzone application
  (`poetry.lock`), for the latest release of Dangerzone (currently
  v0.4.1).
* Download and security scan the Dangerzone container image for the
  latest release of Dangerzone (currently v0.4.1).
The first two checks will run on branch pushes, PRs, and nightly. The
last two checks will run only nightly, since the code in the current
branch cannot affect already released artifacts.
Besides running the security scans, these workflows also update the
Security alerts in the GitHub page for the Dangerzone project, and print
the SARIF report to stdout, for debugging purposes.
Closes #222
Replace our reference to an Apple development certificate with a
Developer ID Application certificate. The former is not accepted during
the code notarization phase, whereas the latter is.
This release brings a split in the macOS binaries, since we now have
separate ones for Intel and Apple Silicon architectures, so we must
reflect this in the README as well.
Remove any -rc identifiers (e.g., 0.4.1-rc3) from the Dangerzone
version, if it includes them. If we don't remove them, then building
the MSI for Windows will fail as follows:
    error CNDL0108: The Product/@Version attribute's value, '0.4.1-rc3',
    is not a valid version. Legal version values should look like
    'x.x.x.x' where x is an integer from 0 to 65534.
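
A minimal sketch of the stripping step, assuming the identifier always
has the form `-rcN` at the end of the version string (`strip_rc` is a
hypothetical helper name):

```python
import re


def strip_rc(version: str) -> str:
    """Drop a trailing `-rcN` identifier from a version string, if present."""
    return re.sub(r"-rc\d*$", "", version)


msi_version = strip_rc("0.4.1-rc3")
```

Versions without an `-rc` identifier pass through unchanged.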
Install the following packages in Dangerzone envs:
* python3-setuptools: We've seen that this package is necessary to build
  the RPM package for Dangerzone. The error that we encountered was the
  following:
      * Deleting old build and dist
      * Building RPM package
      Traceback (most recent call last):
        File "/home/user/dangerzone/setup.py", line 5, in <module>
          import setuptools
      ModuleNotFoundError: No module named 'setuptools'
      Traceback (most recent call last):
        File "/home/user/./dangerzone/install/linux/build-rpm.py", line 43, in <module>
          main()
        File "/home/user/./dangerzone/install/linux/build-rpm.py", line 30, in main
          subprocess.run(
        File "/usr/lib64/python3.11/subprocess.py", line 571, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command 'python3 setup.py bdist_rpm --requires='podman,python3-pyside2,python3-appdirs,python3-click,python3-pyxdg,python3-colorama'' returned non-zero exit status 1.
* fuse-overlayfs: In Ubuntu 22.10 (at least), we encountered the
  following error when running Podman:
      ERRO[0000] User-selected graph driver "overlay" overwritten by
      graph driver "vfs" from database - delete libpod local files to
      resolve
  The `vfs` driver is much slower than the `overlayfs` storage driver,
  so we need to fix this. The reason why we encounter this error is
  explained in the Podman docs [1]:
      [...] and is vfs for non-root users when fuse-overlayfs is not
      available.
  Normally, the `fuse-overlayfs` package would have been installed, but
  we don't install it due to the `--no-install-recommends` flag, so we
  install it manually.
[1]: https://docs.podman.io/en/latest/markdown/podman.1.html#storage-driver-value
In PR #378 ("container: Allow converting more document formats"), we
added support for the following MIME types:
* application/zip
* application/octet-stream
* application/x-ole-storage
* application/vnd.oasis.opendocument.spreadsheet-template
* application/vnd.oasis.opendocument.text-template
However, we forgot to add some tests for these MIME types in the repo.
In this commit, we add a file for each of these MIME types, to make sure
we have no regressions in the future.
The main use of safe mode [1] in LibreOffice is to run with a fresh user
profile, in case the default one got borked somehow. This is actually
not a concern of ours, since the user's profile is in the container and
is not persistent.
The main reason we want to preemptively run LibreOffice in safe mode is
to remove hardware acceleration capabilities. Whether hardware
acceleration actually works in a container is another question, but we
want to be extra sure.
[1]: https://help.libreoffice.org/latest/en-US/text/shared/01/profile_safe_mode.html
Remove the association between MIME types and export filters, because
LibreOffice is able to auto-detect them on its own. Instead, ask
LibreOffice to simply convert the document to a .pdf.
This association was cumbersome for another reason as well: some MIME
types may be associated with more than one file type. That's why it's
better to let LibreOffice decide the proper filter for the conversion.
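
As a rough sketch, the conversion command then only needs to name the
target format and leaves filter selection to LibreOffice. The binary name
`soffice`, the helper name and the paths are assumptions for
illustration; the command is only built here, not executed.

```python
from typing import List


def libreoffice_command(
    input_path: str, output_dir: str, safe_mode: bool = True
) -> List[str]:
    """Build a LibreOffice command line that converts a document to PDF."""
    command = ["soffice", "--headless"]
    if safe_mode:
        command.append("--safe-mode")
    # No export filter is named: LibreOffice picks one from the input type.
    command += ["--convert-to", "pdf", "--outdir", output_dir, input_path]
    return command


cmd = libreoffice_command("/tmp/input_file", "/tmp")
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`,
ideally with a timeout.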
Our current understanding is that this change won't widen our attack
surface for the following reasons:
* The output filters for PDF documents are pretty specific, and we don't
  affect the input filters in any way.
* The default behavior of LibreOffice on Alpine Linux is to disable
  macros.
Closes #369
Due to a bump in our Python dependencies, we now install Mypy 1.1.1
instead of 0.982. This change triggered the following errors:
* Incompatible default for argument <a> (default has type
  None, argument has type <t>):
  Mypy further explains that PEP 484 prohibits implicit Optional, so we
  need to make these types explicit Optional.
* Unused "type: ignore" comment, use narrower [method-assign] instead of
  [assignment]:
  Mypy has specialized some of its lints, meaning that we should switch
  to the newer variants.
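
Both kinds of fixes look roughly like the following. The function and
class are hypothetical and are not taken from the Dangerzone code base;
only the annotation and the ignore comment matter.

```python
from typing import Optional


# Before (rejected by newer Mypy): def page_label(prefix: str = None) -> str
def page_label(prefix: Optional[str] = None) -> str:
    """Return the given prefix, or a default label when none is passed."""
    return prefix if prefix is not None else "page"


class Greeter:
    def hello(self) -> str:
        return "hello"


# Replacing a method needs the narrower error code instead of [assignment].
Greeter.hello = lambda self: "hi"  # type: ignore[method-assign]
```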
Also, it detected several other small inconsistencies. We fix all of
these errors in this commit.