dangerzone/dangerzone/conversion
deeplow 6006beeb03
Fix OCR on Qubes: PyMuPDF required TESSDATA_PREFIX
PyMuPDF versions 1.22.5 and higher accept the tesseract data path as
an argument to `pixmap.pdfocr_tobytes()` [1], but lower versions require
setting the TESSDATA_PREFIX environment variable instead [2].

Because on Qubes the pixels-to-PDF conversion happens on the host, and
Qubes ships a lower PyMuPDF package version, we need to pass the path
via the environment variable instead.
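
The version split described above could be sketched roughly as follows.
This is an illustration only, not the code from this commit: the helper
names `parse_version` and `tessdata_strategy` are hypothetical, and the
1.22.5 cutoff is assumed to be inclusive. The sketch does not import
PyMuPDF; it only decides how the tesseract data path would be handed over.

```python
import re


def parse_version(version):
    """Turn a version string such as "1.22.5" into a comparable tuple."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def tessdata_strategy(pymupdf_version, tessdata_dir, language="eng"):
    """Return (ocr_kwargs, env_updates) for the given PyMuPDF version.

    ocr_kwargs: keyword arguments intended for pixmap.pdfocr_tobytes().
    env_updates: environment variables that would have to be set
    before the fitz module is imported.
    """
    ocr_kwargs = {"language": language}
    env_updates = {}
    # Assumption: the `tessdata` argument exists from 1.22.5 onwards.
    if parse_version(pymupdf_version) >= (1, 22, 5):
        ocr_kwargs["tessdata"] = tessdata_dir
    else:
        env_updates["TESSDATA_PREFIX"] = tessdata_dir
    return ocr_kwargs, env_updates
```

On a newer PyMuPDF the returned keyword arguments carry the path and no
environment change is needed; on an older one the path only appears in
`env_updates`.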

NOTE: the TESSDATA_PREFIX environment variable is set in dangerzone-cli
rather than closer to the calling method in `doc_to_pixels.py`, since
PyMuPDF reads this variable as soon as the fitz module is imported
[3][4].
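
The ordering constraint in the note can be illustrated with a small
guard. This is a sketch under assumptions, not the dangerzone-cli code:
`set_tessdata_prefix` is a hypothetical helper, and the `environ` and
`modules` parameters exist only so the behaviour can be checked without
touching the real process state.

```python
import os
import sys


def set_tessdata_prefix(tessdata_dir, environ=os.environ, modules=sys.modules):
    """Set TESSDATA_PREFIX, refusing to do so once fitz is already loaded.

    Per the note above, PyMuPDF reads the variable when the fitz module
    is imported, so setting it afterwards would have no effect.
    """
    if "fitz" in modules:
        raise RuntimeError(
            "fitz is already imported; TESSDATA_PREFIX must be set earlier"
        )
    environ["TESSDATA_PREFIX"] = tessdata_dir
```

An entry point would call this first and only then import anything that
pulls in fitz, which is why the variable is set at the top-level script
rather than next to the OCR call.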

[1]: https://pymupdf.readthedocs.io/en/latest/pixmap.html#Pixmap.pdfocr_tobytes
[2]: https://pymupdf.readthedocs.io/en/latest/installation.html#enabling-integrated-ocr-support
[3]: https://github.com/pymupdf/PyMuPDF/discussions/2439
[4]: https://github.com/pymupdf/PyMuPDF/blob/5d6a7db/src/__init__.py#L159

Fixes #682
2024-02-07 13:13:10 +00:00
__init__.py Restructure container code 2023-06-21 11:44:47 +03:00
common.py Remove timeouts 2024-02-06 20:11:43 +00:00
doc_to_pixels.py Remove leftover progress variable in pixels_to_pdf 2024-02-06 20:11:52 +00:00
errors.py Allow each conversion to have its own proc 2024-02-06 19:42:41 +00:00
pixels_to_pdf.py Fix OCR on Qubes: PyMuPDF required TESSDATA_PREFIX 2024-02-07 13:13:10 +00:00