This is useful to reduce the computation time when creating PDF visual
diffs. Here is a comparison of the same operation using python arrays
and numpy arrays + lookups:
Python arrays:
```
diff took 5.094218431997433 seconds
diff took 3.1553626069980965 seconds
diff took 3.3721952960004273 seconds
diff took 3.2134646750018874 seconds
diff took 3.3410625500000606 seconds
diff took 3.2893160990024626 seconds
```
Numpy:
```
diff took 0.13705662599750212 seconds
diff took 0.05698924000171246 seconds
diff took 0.15319590600120137 seconds
diff took 0.06126453700198908 seconds
diff took 0.12916332699751365 seconds
diff took 0.05839455900058965 seconds
Unpin the PyMuPDF package that we vendor in our Debian packages. We
originally pinned it to version 1.24.11, because it was the last version
that supported Ubuntu Focal, but we can now unpin it, since we have
dropped Ubuntu Focal support.
Fixes#1018
Remove the Shiboken6 pin for our Linux and macOS platforms, since a new
upstream package has been released, that has wheels for every platform.
Also, remove the `sed` command from our dangerzone.spec, whose purpose
was to nullify this pin for our Fedora packages.
Fixes#1061
Remove our suggestions for not using the container cache, which stemmed
from the fact that our Dangerzone image was not reproducible. Now that
we have switched to Debian Stable and the Dockerfile is all we need to
reproducibly build the exact same container image, we can just use the
cache to speed up builds.
Add the doit automation tool in our `pyproject.toml` and `poetry.lock`
file as a package-related dependency, since we don't want to ship it to
our end users.
It seem that these tests are flaky, and as a result our CI pipeline is
failing from time to time. This will rerun it automatically when there
is an error.
See https://github.com/freedomofpress/dangerzone/issues/968 for more
information
The PyMuPDF package was previously mainly used within the Dangerzone
container, as well as on Qubes. With on-host conversion, PyMuPDF will be
used in all supported platforms by default. For this reason, we can
promote it to a main dependency.
This removes the need to build the PyMuPDF project by ourselves, but
only when on non-ARM architectures since the wheels for these are not
provided yet.
Changes the `Dockerfile` and `build-image.py` script, introducing a new
`ARCH` flag to conditionally build the wheels.
The minimum python version when installing from source is now python
3.9, as Pyside6 6.7.1 dropped support for python 3.8 (see #780 for more
information).
On Debian-derivatives distributions, the minimum Python version is now
set to 3.8. In practice, because Pyside6 is not packaged for Debian, we
use Pyside2 [0], which is why we can relax the python version requirement.
In practice, when installing from source on an environment where
python3.9 is not the default python, poetry will look for it and use it
if available
> For various reasons, this Python version might not be compatible with
> the python range supported by the project. In this case, Poetry will
> try to find one that is and use it.
>
> [Poetry docs](https://python-poetry.org/docs/managing-environments/)
On Ubuntu Focal (20.04) where Python 3.9 is not installed by default,
it is possible to install it using the `python3.9` package.
Additionally, In version 1.24.3, PyMuPDF changed its package name from `fitz`
to `pymupdf` [2], resulting in a breakage on how it is installed in our
container. This is now fixed.
[0] More information on how Pyside6 packaging affects dangerzone on #221
[1] See [the current status of Pyside6 packaging](https://repology.org/
project/python:pyside6/packages)
[2] PyMuPDF changelog: https://pymupdf.readthedocs.io/en/latest/changes.html#change-log
The problem (MuPDF C++ bindings generation breakage) was
apparently caused by a recent libclang update on pypi, and
fixed in the 1.24.0 release[1].
Fixes#750
[1]: https://github.com/pymupdf/PyMuPDF/issues/3279
PyMuPDF 1.23.9 swapped the new fitz implementation (fitz_new)
with the fitz module. In the new module there are prints in the code
that interfere with our stdout for sending JSON from the container.
Pinning the version seems to have no adverse consequences [1], since
fitz_old hasn't had significant changes and it gives breathing room for
the print-related issue to be tackled in PR [2].
Fixes temporarily #700
[1]: https://github.com/freedomofpress/dangerzone/issues/700#issuecomment-1938357651
[2]: https://github.com/pymupdf/PyMuPDF/pull/3137
Fedora 39 ships with Python 3.12 by default, which Dangerzone previously
did not support due to limitations from the PySide6 package. Now that
the PySide6 package has been updated to 6.6.1, and the limitation has
lifted, we should to reflect this in pyproject.toml.
Make Poetry include data files only in the source distribution, and not
on our wheels. This mainly makes RPM packaging a bit easier, but does
not solve the problem of how to install files to
`/usr/share/dangerzone`.
Also, include files using globs, which is the way Poetry prefers.
Fixes#678
Refs #677
Ensure that when the container image is installing pymupdf (unavailable
in the repos) with verified hashes. To do so, it has the pymupdf
dependency declared in a "container" group in `pyproject.toml`, which
then gets exported into a requirements.txt, which is then used for
hash-verification when building the container.
Because this required modifying the container image build scripts, they
were all merged to avoid duplicate code. This was an overdue change
anyways.
Adding PyMuPDF essentially make the code much simpler since it can do
everything that we'd need multiple programs for. It also includes
tesseract-OCR integration, which this commit makes use of.
By using `--skip / --extend-skip .gitignore`, we actually never read the
.gitignore file. We have to use `--skip-gitignore` instead.
This requires Git in the development environment, so we need to install
Git in our CI runners as well.