Commit graph

1381 commits

Author SHA1 Message Date
Alex Pyrgiotis
e5fd1e91d5
FIXUP: Restore previous behavior 2024-10-08 17:05:59 +03:00
Alex Pyrgiotis
4dcca91a05
Revert "WIP: Check newest PyMuPDF wheels"
This reverts commit 3d4bea8ee7.
2024-10-08 17:03:06 +03:00
Alex Pyrgiotis
05090ada33
FIXUP: Grab just the tests 2024-10-08 16:45:19 +03:00
Alex Pyrgiotis
3d4bea8ee7
WIP: Check newest PyMuPDF wheels 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
130ba92e34
WIP: Remove all provider wait lines 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
e749e92425
FIXUP: Fix dummy script 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
48bd4c651b
FIXUP: Fix the dummy provider 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
1d9e0575a5
FIXUP: Lint 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
76ff961c78
fixup! Drop unnecessary temp files 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
78987ace8f
Better check for unhandled exceptions 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
c810ad5642
Drop unnecessary temp files 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
76b0de4169
WIP: Make the dummy provider less... dummy 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
f9dfec6112
Remove mount-related faq in README 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
3343149486
FIXUP: Lint 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
8aebc44fd0
FIXUP: Lint 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
27925d24a4
FIXUP: Make Ubuntu Focal work 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
d410c49c75
Biting the Debian bullet 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
a22dcd15cd
FIXUP: Debian fixes 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
17bed1a724
FIXUP: Moar CI fixes 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
66af7bce59
FIXUP: Fix linger tests 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
1472dca744
FIXUP: Add missing tessdata for Linux 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
8539c97421
FIXUP: At this point, I'm just rambling 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
f6f9ff16e1
FIXUP: Minor caching improvement 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
e0878e489a
Vendor PyMuPDF 2024-10-08 16:01:09 +03:00
Alex Pyrgiotis
f68721637c
FIXUP: Remove stale code for PyMuPDF < 1.22.5 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
0d80cf1f0c
WIP: Fix Windows and macOS CI 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
288c0715ac
FIXUP: Use single log message per page 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
cd4e5d2136
FIXUP: Remove stale method from dummy provider 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
981056192c
FIXUP: Remove extra tessdata arg 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
4250c6a64f
FIXUP: Minor rename 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
8e6e4a3b44
FIXUP: Remove dead code 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
92ca4b172f
FIXUP: Handle different PyMuPDF versions 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
1f4dd1d71a
FIXUP: Let the RPM autodetect the PyMuPDF requirement 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
dd2cfe6ecf
Update build instructions 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
1f7dc2cf75
Update .deb/.rpm dependencies
Update .deb/.rpm specs to include PyMuPDF as a required package.
2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
531a357491
Remove dead code 2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
2b4b89a155
Update the way we get debug logs
Move the logic for grabbing debug logs to a new place, now that we have
merged the two conversion stages (doc to pixels, pixels to PDF).
2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
137f21da8d
Perform on-host pixels to PDF conversion
Extend the base isolation provider to immediately convert each page to
a PDF, and optionally use OCR. In contrast to the way we did things
previously, there are no longer two separate stages (document to pixels,
pixels to PDF). We now handle each page individually, for two main
reasons:

1. We don't want to buffer pixel data, either on disk or in memory,
   since it takes a lot of space, and can potentially leave traces.
2. We can perform these operations in parallel, saving time. This is
   more evident when OCR is not used, where the times to convert a page
   to pixels, and then back to a PDF, are comparable.
2024-10-08 16:01:08 +03:00
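
The per-page flow described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `convert_pages`, `page_to_pdf` and `max_in_flight` are hypothetical names, and the actual pixels-to-PDF step (PyMuPDF, optionally with OCR) is represented by a caller-supplied function. Each page is handed to a worker as soon as it arrives, so converting one page can overlap with receiving the next, and only a small number of pages are in flight at any time.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator


def convert_pages(
    pixel_pages: Iterable[bytes],
    page_to_pdf: Callable[[bytes], bytes],
    max_in_flight: int = 2,
) -> Iterator[bytes]:
    """Convert pages one by one, yielding PDF pages in their original order.

    `page_to_pdf` is a hypothetical stand-in for the real conversion step.
    """
    pending = deque()
    with ThreadPoolExecutor(max_workers=1) as pool:
        for pixels in pixel_pages:
            # Start converting this page while the next one is being read.
            pending.append(pool.submit(page_to_pdf, pixels))
            if len(pending) >= max_in_flight:
                # Drain the oldest result so the queue stays bounded.
                yield pending.popleft().result()
        # Flush whatever is still being converted.
        while pending:
            yield pending.popleft().result()
```

A caller would pass an iterator that produces one page of pixel data at a time, and append each yielded PDF page to the output document as it arrives.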
Alex Pyrgiotis
cde8ee70bb
Make PyMuPDF a main Dangerzone dependency
The PyMuPDF package was previously mainly used within the Dangerzone
container, as well as on Qubes. With on-host conversion, PyMuPDF will be
used on all supported platforms by default. For this reason, we can
promote it to a main dependency.
2024-10-08 16:01:08 +03:00
Alex Pyrgiotis
fc977da964
Add new way to detect tessdata dir
Add a new way to detect where the Tesseract data are stored on a user's
system. On Linux, the Tesseract data should be installed via the package
manager. On macOS and Windows, they should be bundled with the
Dangerzone application.

There is also the exception of running Dangerzone locally, where even
on Linux, we should get the Tesseract data from the Dangerzone share/
folder.
2024-10-08 16:01:08 +03:00
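
The lookup order described in the commit above could look roughly like the sketch below. All names are hypothetical, and the Linux candidate paths are assumptions about where distributions commonly install Tesseract data, not paths taken from this commit.

```python
import platform
from pathlib import Path
from typing import Iterable, Optional

# Assumed locations; real distributions may differ.
LINUX_CANDIDATES = (
    Path("/usr/share/tessdata"),
    Path("/usr/share/tesseract-ocr/5/tessdata"),
)


def find_tessdata_dir(
    share_dir: Path,
    running_from_source: bool = False,
    system: Optional[str] = None,
    linux_candidates: Iterable[Path] = LINUX_CANDIDATES,
) -> Path:
    """Return the directory that should contain the Tesseract data."""
    system = system or platform.system()
    # Local (from source) runs and macOS/Windows builds use the bundled data
    # under the application's share/ folder.
    if running_from_source or system in ("Darwin", "Windows"):
        return Path(share_dir) / "tessdata"
    # On Linux, rely on what the package manager installed.
    for candidate in linux_candidates:
        if Path(candidate).is_dir():
            return Path(candidate)
    raise FileNotFoundError("Could not locate the Tesseract data directory")
```

Passing `system` and `linux_candidates` explicitly keeps the function easy to exercise on any platform.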
Alex Pyrgiotis
9d2b2b2a47
Add script for downloading Tesseract data
Add a Python script that can run on all supported platforms, and can
download and extract the Tesseract language data from GitHub, while
also:

1. Checking that the expected hash matches.
2. Informing the user if the language data have already been downloaded.
3. Extracting only the subset of language data that Dangerzone needs.
2024-10-08 16:01:08 +03:00
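
The three checks listed in the commit above can be illustrated with a short sketch. It assumes the language data ship as a .tar.gz archive containing `<lang>.traineddata` files; the function names, the archive layout, and the URL and hash parameters are placeholders, not details taken from the actual script.

```python
import hashlib
import io
import tarfile
import urllib.request
from pathlib import Path
from typing import List, Sequence


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Check the downloaded bytes against the expected SHA-256 digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()


def extract_languages(archive: bytes, langs: Sequence[str], dest: Path) -> List[str]:
    """Extract only the requested <lang>.traineddata files into dest."""
    wanted = {f"{lang}.traineddata" for lang in langs}
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar:
            # Use only the base name, so archive paths cannot escape dest.
            name = Path(member.name).name
            if member.isfile() and name in wanted:
                (dest / name).write_bytes(tar.extractfile(member).read())
                extracted.append(name)
    return sorted(extracted)


def fetch_tessdata(url: str, expected_sha256: str, langs: Sequence[str], dest: Path) -> List[str]:
    """Download, verify and extract; skip everything if data already exist."""
    if all((dest / f"{lang}.traineddata").exists() for lang in langs):
        print("Tesseract data already downloaded, nothing to do")
        return []
    with urllib.request.urlopen(url) as response:
        data = response.read()
    if not sha256_matches(data, expected_sha256):
        raise ValueError("Downloaded archive does not match the expected hash")
    return extract_languages(data, langs, dest)
```

Verifying the hash before extraction means a corrupted or tampered archive is rejected before any file is written.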
Alex Pyrgiotis
6fd0f925a8
FIXUP: Fix a lint 2024-10-08 13:34:33 +03:00
Alex Pyrgiotis
30b4f24d77
FIXUP: Use the proper pip argument 2024-10-08 13:34:33 +03:00
Alex Pyrgiotis
e027d853c2
FIXUP: Implement review comments 2024-10-08 13:34:33 +03:00
Alex Pyrgiotis
07921566ba
FIXUP: Make Dockerfile work with latest wheels 2024-10-08 13:34:33 +03:00
Alex Pyrgiotis
eef4e8b548
debian: Vendor PyMuPDF when building Debian package
Install PyMuPDF under ./dangerzone/vendor, right before we build the
.deb package. We vendor PyMuPDF just for Debian, since the provided
versions don't have OCR support enabled.

Currently, we don't use PyMuPDF on the host, but this will change once
we fully implement the on-host conversion feature.

Refs #625
2024-10-08 13:34:32 +03:00
Alex Pyrgiotis
ed55124a8b
Add an import preference for vendored packages
Prefer importing packages from ./dangerzone/vendor, if there is one
there, instead of using the system ones.
2024-10-08 13:34:32 +03:00
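
One common way to express such an import preference is to put the vendor directory at the front of `sys.path` before the affected packages are imported. The sketch below assumes that approach; the function name and parameters are hypothetical, and the commit itself may implement the preference differently.

```python
import sys
from pathlib import Path


def prefer_vendored(package_root: Path, vendor_dirname: str = "vendor") -> bool:
    """Make <package_root>/<vendor_dirname> the first import location.

    Returns False (and changes nothing) if the directory does not exist,
    so the system-wide packages are used as before.
    """
    vendor_dir = package_root / vendor_dirname
    if not vendor_dir.is_dir():
        return False
    entry = str(vendor_dir)
    # Avoid duplicate entries if this is called more than once.
    if entry in sys.path:
        sys.path.remove(entry)
    sys.path.insert(0, entry)
    return True
```

This has to run before the first import of the vendored package, since modules that are already loaded are served from `sys.modules`, not looked up again.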
Alex Pyrgiotis
f61097e9b3
install: Add script for vendoring PyMuPDF
Add a script that installs PyMuPDF under ./dangerzone/vendor. This will
be useful in subsequent commits, for vendoring PyMuPDF when building
Debian packages.
2024-10-08 13:34:32 +03:00
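
A vendoring step like the one described above is often built on pip's `--target` option, which installs a package into a given directory. The sketch below assumes that approach; the function names and the default destination are illustrative, and the real script may pass different arguments.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_vendor_command(dest: Path, requirement: str = "PyMuPDF") -> List[str]:
    """Build the pip invocation that installs `requirement` under `dest`."""
    return [
        sys.executable, "-m", "pip", "install",
        "--target", str(dest),
        requirement,
    ]


def vendor_package(dest: Path = Path("dangerzone/vendor")) -> None:
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.run(build_vendor_command(dest), check=True)
```

Running pip through `sys.executable -m pip` ensures the install uses the same interpreter as the build environment.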
Alex Pyrgiotis
c22f945614
dev_scripts: Install pip in dev environments
Install pip in dev environments, so that we can use it to vendor
PyMuPDF in subsequent commits.
2024-10-08 13:34:32 +03:00
Alex Pyrgiotis
892dfaf1bc
Bump our Poetry dependencies 2024-10-08 13:34:32 +03:00