Commit graph

1365 commits

Author SHA1 Message Date
Alex Pyrgiotis
d410c49c75
Biting the Debian bullet 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
a22dcd15cd
FIXUP: Debian fixes 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
17bed1a724
FIXUP: Moar CI fixes 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
66af7bce59
FIXUP: Fix linger tests 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
1472dca744
FIXUP: Add missing tessdata for Linux 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
8539c97421
FIXUP: At this point, I'm just rambling 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
f6f9ff16e1
FIXUP: Minor caching improvement 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
e0878e489a
Vendor PyMuPDF 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
f68721637c
FIXUP: Remove stale code for PyMuPDF < 1.22.5 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
0d80cf1f0c
WIP: Fix Windows and macOS CI 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
288c0715ac
FIXUP: Use single log message per page 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
cd4e5d2136
FIXUP: Remove stale method from dummy provider 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
981056192c
FIXUP: Remove extra tessdata arg 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
4250c6a64f
FIXUP: Minor rename 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
8e6e4a3b44
FIXUP: Remove dead code 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
92ca4b172f
FIXUP: Handle different PyMuPDF versions 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
1f4dd1d71a
FIXUP: Let the RPM autodetect the PyMuPDF requirement 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
dd2cfe6ecf
Update build instructions 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
1f7dc2cf75
Update .deb/.rpm dependencies
Update .deb/.rpm specs to include PyMuPDF as a required package.
2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
531a357491
Remove dead code 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
2b4b89a155
Update the way we get debug logs
Move the logic for grabbing debug logs to a new place, now that we have
merged the two conversion stages (doc to pixels, pixels to PDF).
2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
137f21da8d
Perform on-host pixels to PDF conversion
Extend the base isolation provider to immediately convert each page to
a PDF, optionally using OCR. In contrast to the way we did things
previously, there are no longer two separate stages (document to pixels,
pixels to PDF). We now handle each page individually, for two main
reasons:

1. We don't want to buffer pixel data, either on disk or in memory,
   since it takes up a lot of space and can potentially leave traces.
2. We can perform these operations in parallel, saving time. This is
   more evident when OCR is not used, since converting a page to pixels
   and converting it back to a PDF take a comparable amount of time.
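
A minimal Python sketch of the per-page idea, assuming a hypothetical `pixels_to_pdf(page_num, pixels)` callable and a generator that yields one page's pixels at a time (neither name comes from the Dangerzone codebase). A thread pool lets the conversion of one page overlap with the rendering of the next, and results are returned in page order. The sketch does not bound how many pages may queue up, which a real implementation would probably want to do.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def convert_pages(
    pages: Iterable[bytes],
    pixels_to_pdf: Callable[[int, bytes], bytes],
    max_workers: int = 4,
) -> List[bytes]:
    """Convert each page to PDF as soon as its pixels arrive.

    `pages` is assumed to be a generator yielding one page's pixel data at
    a time, so pixels are handed off page by page instead of being
    collected for the whole document first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit each page as soon as it is produced; converting page N
        # can overlap with the sandbox rendering page N+1.
        futures = [
            pool.submit(pixels_to_pdf, page_num, pixels)
            for page_num, pixels in enumerate(pages, start=1)
        ]
        # Collect results in page order, not completion order.
        return [future.result() for future in futures]
```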
2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
cde8ee70bb
Make PyMuPDF a main Dangerzone dependency
The PyMuPDF package was previously mainly used within the Dangerzone
container, as well as on Qubes. With on-host conversion, PyMuPDF will be
used in all supported platforms by default. For this reason, we can
promote it to a main dependency.
2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
fc977da964
Add new way to detect tessdata dir
Add a new way to detect where the Tesseract data are stored in a user's
system. On Linux, the Tesseract data should be installed via the package
manager. On macOS and Windows, they should be bundled with the
Dangerzone application.

One exception is running Dangerzone locally: in that case, even on
Linux, we should get the Tesseract data from the Dangerzone share/
folder.
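
The detection order described above could look roughly like this. The Linux paths are examples of where some distributions install Tesseract data, not a list taken from Dangerzone, and `dev_mode` is a hypothetical flag standing in for "running locally".

```python
import platform
from pathlib import Path
from typing import Optional, Sequence

# Example locations where Linux distributions install Tesseract data.
# These are assumptions for illustration; the real list may differ.
LINUX_TESSDATA_DIRS = (
    Path("/usr/share/tessdata"),  # e.g., Arch Linux
    Path("/usr/share/tesseract/tessdata"),  # e.g., Fedora
    Path("/usr/share/tesseract-ocr/5/tessdata"),  # e.g., Debian, Tesseract 5
    Path("/usr/share/tesseract-ocr/4.00/tessdata"),  # e.g., Ubuntu, Tesseract 4
)


def get_tessdata_dir(
    share_dir: Path,
    dev_mode: bool = False,
    system: Optional[str] = None,
    linux_dirs: Sequence[Path] = LINUX_TESSDATA_DIRS,
) -> Path:
    """Return the directory holding the Tesseract language data."""
    system = system or platform.system()
    if dev_mode or system in ("Darwin", "Windows"):
        # Bundled with the app, or under share/ when running locally.
        candidates: Sequence[Path] = [share_dir / "tessdata"]
    else:
        # On Linux, rely on the data installed by the package manager.
        candidates = linux_dirs
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(
        f"Tesseract data not found in: {[str(c) for c in candidates]}"
    )
```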
2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
9d2b2b2a47
Add script for downloading Tesseract data
Add a Python script that can run on all supported platforms, and can
download and extract the Tesseract language data from GitHub, while
also:

1. Checking that the expected hash matches.
2. Informing the user if the language data have already been downloaded.
3. Extracting only the subset of language data that Dangerzone needs.
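
A rough sketch of the three checks, assuming the language data come as a `.tar.gz` archive and that a hypothetical `fetch()` callable performs the actual download (e.g., with `urllib.request`). The "already downloaded" check runs first, so nothing is fetched needlessly, and the hash is verified before anything is extracted. The helper name is illustrative, not the script's actual API.

```python
import hashlib
import io
import tarfile
from pathlib import Path
from typing import Callable, Iterable


def install_tessdata(
    fetch: Callable[[], bytes],
    expected_sha256: str,
    dest: Path,
    langs: Iterable[str],
) -> bool:
    """Download, verify, and extract a subset of Tesseract language data.

    Returns False if the data were already present, True if extracted.
    """
    wanted = {f"{lang}.traineddata" for lang in langs}

    # Inform the user (and skip the download) if the data already exist.
    if all((dest / name).is_file() for name in wanted):
        print(f"Tesseract language data already exist in {dest}")
        return False

    # Check that the downloaded archive has the expected hash.
    archive = fetch()
    digest = hashlib.sha256(archive).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"Hash mismatch: expected {expected_sha256}, got {digest}")

    # Extract only the files we need. Writing each file by its basename
    # also means paths stored inside the archive are never trusted.
    dest.mkdir(parents=True, exist_ok=True)
    found = set()
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            name = Path(member.name).name
            if not (member.isfile() and name in wanted):
                continue
            fileobj = tar.extractfile(member)
            if fileobj is not None:
                (dest / name).write_bytes(fileobj.read())
                found.add(name)

    missing = wanted - found
    if missing:
        raise ValueError(f"Archive is missing: {sorted(missing)}")
    return True
```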
2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
6fd0f925a8
FIXUP: Fix a lint 2024-10-08 13:34:33 +03:00
Alex Pyrgiotis
30b4f24d77
FIXUP: Use the proper pip argument 2024-10-08 13:34:33 +03:00
Alex Pyrgiotis
e027d853c2
FIXUP: Implement review comments 2024-10-08 13:34:33 +03:00
Alex Pyrgiotis
07921566ba
FIXUP: Make Dockerfile work with latest wheels 2024-10-08 13:34:33 +03:00
Alex Pyrgiotis
eef4e8b548
debian: Vendor PyMuPDF when building Debian package
Install PyMuPDF under ./dangerzone/vendor, right before we build the
.deb package. We vendor PyMuPDF just for Debian, since the versions
that Debian provides don't have OCR support enabled.

Currently, we don't use PyMuPDF on the host, but this will change once
we fully implement the on-host conversion feature.

Refs #625
2024-10-08 13:34:32 +03:00
Alex Pyrgiotis
ed55124a8b
Add an import preference for vendored packages
Prefer importing packages from ./dangerzone/vendor, if they exist
there, instead of using the system ones.
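
One way to implement this preference is to put the vendor directory at the front of `sys.path`. This is a sketch; the helper name is hypothetical, and it only has an effect if it runs before the vendored packages are first imported.

```python
import sys
from pathlib import Path


def prefer_vendored_packages(package_dir: Path) -> bool:
    """Put <package_dir>/vendor first on sys.path, if it exists.

    Must run before the vendored packages are imported; modules that are
    already in sys.modules are not affected.
    """
    vendor_dir = package_dir / "vendor"
    if not vendor_dir.is_dir():
        return False
    if str(vendor_dir) not in sys.path:
        # Earlier sys.path entries win, so vendored copies shadow system ones.
        sys.path.insert(0, str(vendor_dir))
    return True
```

For example, calling `prefer_vendored_packages(Path(__file__).parent)` early in the package's `__init__.py` would make a later `import fitz` resolve to the vendored PyMuPDF, when one is present.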
2024-10-08 13:34:32 +03:00
Alex Pyrgiotis
f61097e9b3
install: Add script for vendoring PyMuPDF
Add a script that installs PyMuPDF under ./dangerzone/vendor. This will
be useful in subsequent commits, for vendoring PyMuPDF when building
Debian packages.
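
pip's `--target` option installs a package into an arbitrary directory, which is one plausible way to do this. The sketch below only builds and runs such a command; the helper names are hypothetical and the actual script may use different flags.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def pip_vendor_command(target: Path, requirement: str = "PyMuPDF") -> List[str]:
    """Build a pip command that installs `requirement` into `target`."""
    return [
        sys.executable, "-m", "pip", "install",
        # With --target, pip does not replace existing packages unless
        # --upgrade is also given.
        "--upgrade",
        "--target", str(target),
        requirement,
    ]


def vendor(target: Path, requirement: str = "PyMuPDF") -> None:
    """Install `requirement` under `target`, failing loudly on errors."""
    subprocess.run(pip_vendor_command(target, requirement), check=True)
```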
2024-10-08 13:34:32 +03:00
Alex Pyrgiotis
c22f945614
dev_scripts: Install pip in dev environments
Install pip in dev environments, so that we can use it to vendor
PyMuPDF in subsequent commits.
2024-10-08 13:34:32 +03:00
Alex Pyrgiotis
892dfaf1bc
Bump our Poetry dependencies 2024-10-08 13:34:32 +03:00
Alex Pyrgiotis
00711fa9e2
Add missing .pybuild dir in .gitignore 2024-10-08 13:34:32 +03:00
Alex Pyrgiotis
93b960cd23
Bump H2ORestart to version 0.6.6
Follow Debian's lead [1] and bump this version to 0.6.6. This change
should bring some stability improvements to our CI tests as well.

[1]: https://packages.debian.org/unstable/text/libreoffice-h2orestart
2024-10-07 18:36:06 +03:00
bnewc
752eff02d8
Prevent user from using illegal characters in output filename
Add some checks in the Dangerzone GUI and CLI that will prevent a user
from mistakenly adding illegal characters to the output filename.
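
A sketch of such a check. The exact character set Dangerzone rejects is not stated here; this example assumes the characters Windows forbids in filenames (`< > : " / \ | ? *`) plus ASCII control characters. It should be applied to the filename only, not the full path, since directory separators legitimately appear there.

```python
import re
from typing import List

# Assumed set: characters Windows forbids in filenames, plus ASCII control
# characters. "/" is also the path separator on Linux and macOS.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def illegal_chars(filename: str) -> List[str]:
    """Return the sorted, de-duplicated illegal characters in a filename."""
    return sorted(set(_ILLEGAL.findall(filename)))


def validate_output_filename(filename: str) -> str:
    """Return the filename unchanged, or raise if it has illegal characters."""
    bad = illegal_chars(filename)
    if bad:
        raise ValueError(
            f"Output filename contains illegal characters: {' '.join(bad)}"
        )
    return filename
```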
2024-10-07 18:04:47 +03:00
Alex Pyrgiotis
275189587e
tests: Test termination logic under default conditions
Do not use the `provider_wait` fixture in our termination logic tests,
and switch instead to the `provider` fixture, which instantiates a
typical isolation provider.

The `provider_wait` fixture's goal was to emulate how the process would
behave if it had fully spawned. In practice, this masked some
termination logic issues that became apparent in the WIP on-host
conversion PR. Now that we kill the spawned process via its process
group, we can just use the default isolation provider in our tests.

In practice, in this PR we just do `s/provider_wait/provider`, and
remove some stale code.
2024-10-07 17:37:57 +03:00
Alex Pyrgiotis
b5130b08b6
tests: Improve Dummy provider tests
Add a fixture that returns our stock Dummy provider. Also, explicitly
use a blocking Dummy provider (`DummyWait`) for a specific test case.
This will prove useful when we stop using the `provider_wait` variant of
our isolation providers in the next commits.
2024-10-07 17:37:42 +03:00
Alex Pyrgiotis
dc8a22c8e7
Fix the dummy provider
Make the dummy provider behave a bit more like the other providers, with
a proper function and termination logic. This will be helpful soon in
the tests.
2024-10-07 17:37:42 +03:00
Alex Pyrgiotis
d6410652cb
Kill the process group when conversion terminates
Instead of killing just the invoked Podman/Docker/qrexec process, kill
the whole process group, to make sure that other components that have
been spawned die as well. In the case of Podman, conmon is one of the
processes that lingers, and killing the process group takes care of it.
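
In Python, a process group can be signalled with `os.killpg()`. The sketch below assumes the conversion process was started in its own session, so its PID doubles as its process group ID and the group does not include the caller; it works on POSIX systems only. The helper name and the SIGTERM-then-SIGKILL escalation are illustrative choices, not necessarily what Dangerzone does.

```python
import os
import signal
import subprocess


def kill_process_group(proc: subprocess.Popen, timeout: float = 5.0) -> None:
    """Terminate every process in the group led by `proc` (POSIX only).

    Assumes `proc` was started with start_new_session=True, so that
    proc.pid is also the process group ID.
    """
    try:
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # the whole group has already exited
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Escalate if the group leader ignored SIGTERM.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
        proc.wait()
```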
2024-10-07 17:37:39 +03:00
Alex Pyrgiotis
b9a3dd63ad
Always start conversion process in new session
Start the conversion process in a new session, so that we can later on
kill the process group, without killing the controlling script (i.e.,
the Dangerzone UI). This should not affect the conversion process in any
other way.
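
With Python's `subprocess`, this maps to `start_new_session=True`, which makes the child call `setsid()` before executing the command (POSIX only). The helper name and the stdin/stdout/stderr wiring below are assumptions for illustration.

```python
import subprocess
from typing import List


def start_conversion_process(cmd: List[str]) -> subprocess.Popen:
    """Spawn the conversion command in a new session (POSIX only).

    The child becomes the leader of a new session and process group, so
    killing that group later will not reach the process that spawned it
    (e.g., the UI).
    """
    return subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        start_new_session=True,
    )
```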
2024-10-07 17:27:38 +03:00
Alex Pyrgiotis
8d856ff4c3
ci: Add Intel macOS runner
GitHub provides an Intel macOS runner as `macos-13`. Add it alongside
our M1 macOS runner (`macos-latest`), in order to cover all of our
target environments.
2024-10-07 12:48:03 +03:00
Alex Pyrgiotis
95660c3ec7
Make dummy tests faster
Remove the unnecessary sleep command in our dummy tests, which made them
run much slower.
2024-10-07 12:48:03 +03:00
Alex Pyrgiotis
58b4659ffd
Improve .gitattributes
It seems that we need to specify that Python files have LF line endings
on Windows environments; otherwise, they will get converted to CRLF. If this
happens, then the container image we build in this environment will have
Python files with wrong endings, and tests will break.

Refs #838 for previous attempt.
2024-10-07 12:48:02 +03:00
Alex Pyrgiotis
a001b5497c
Add release note for Debian packages 2024-10-02 16:49:46 +02:00
Alex Pyrgiotis
eb2d114ea7
install: Catch version errors when building DEBs
Make sure that the Debian package we build conforms to the expected
naming scheme; otherwise, something may be off. One scenario we've
encountered is bumping `share/version.txt` but not `debian/changelog`,
which would create a Debian package with an older version.
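
A sketch of such a check, assuming a package named `dangerzone` and the conventional `<package>_<version>-<revision>_<arch>.deb` file naming. The helper names are hypothetical.

```python
import re
from pathlib import Path


def read_version(version_file: Path) -> str:
    """Read the expected version, e.g. from share/version.txt."""
    return version_file.read_text().strip()


def check_deb_filename(deb_path: Path, version: str) -> None:
    """Raise if the .deb filename does not carry the expected version."""
    # Assumed naming: dangerzone_<version>-<revision>_<arch>.deb
    pattern = re.compile(rf"^dangerzone_{re.escape(version)}-\d+_[a-z0-9]+\.deb$")
    if not pattern.match(deb_path.name):
        raise RuntimeError(
            f"{deb_path.name} does not match version {version}; "
            "was debian/changelog bumped along with share/version.txt?"
        )
```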
2024-10-02 16:49:46 +02:00
Alex Pyrgiotis
a32522f6c8
debian: Bump version to 0.7.1
Add a dummy entry in debian/changelog, to signal that the latest
Dangerzone version is 0.7.1.
2024-10-02 16:49:46 +02:00
Alexis Métaireau
025e5dda51
Switch from CircleCI runners to GitHub Actions.
As part of this change, the dev (build) and end-user test image names
changed from `dangerzone.rocks/*` to `ghcr.io`.

A new `--sync` option is provided in the `env.py` command, in order to
retrieve the images from the registry, or otherwise build and upload them.
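
The pull-or-build logic behind `--sync` can be sketched with injected callables, so it can be reasoned about without a registry. The function names are hypothetical; only `docker pull` and `podman pull` are real commands, and treating any pull failure as "image missing" is a simplification.

```python
import subprocess
from typing import Callable


def sync_image(
    image: str,
    pull: Callable[[str], bool],
    build: Callable[[str], None],
    push: Callable[[str], None],
) -> str:
    """Pull `image` from the registry, or build and push it if missing."""
    if pull(image):
        return "pulled"
    build(image)
    push(image)
    return "built"


def runtime_pull(image: str, runtime: str = "docker") -> bool:
    # `docker pull` / `podman pull` exit non-zero if the image cannot be
    # pulled, whether it is missing or for any other reason.
    return subprocess.run([runtime, "pull", image]).returncode == 0
```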
2024-10-02 16:47:58 +02:00
Alexis Métaireau
3e434d08d1
Always use our own seccomp policy as a default.
As per Etienne Perot's comment on #908:

> Then it seems to me like it would be easy to simply apply this seccomp
profile under all container runtimes (since there's no reason why the
same image and the same command-line would call different syscalls under
different container runtimes).
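
Since both Docker and Podman accept `--security-opt seccomp=<profile.json>`, the same arguments can be passed regardless of the runtime. A minimal sketch; the helper name and the policy path used in the test are hypothetical.

```python
from pathlib import Path
from typing import List, Sequence


def container_run_args(
    runtime: str, image: str, seccomp_policy: Path, command: Sequence[str]
) -> List[str]:
    """Build a `run` command line that always applies our seccomp policy."""
    return [
        runtime, "run", "--rm",
        # Same flag for Docker and Podman, so the policy applies everywhere.
        "--security-opt", f"seccomp={seccomp_policy}",
        image,
        *command,
    ]
```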
2024-10-02 14:12:48 +02:00