Commit graph

1158 commits

Author SHA1 Message Date
Alex Pyrgiotis
ce5adb33fd
Bump poetry dependencies 2024-03-13 21:03:15 +02:00
Alex Pyrgiotis
74c467eaf7
conversion: Do not let PyMuPDF print to stdout
PyMuPDF has some hardcoded log messages that print to stdout [1]. We don't
have a way to silence them, because they don't use the Python logging
infrastructure.

What we can do here is silence a particular call that's been creating
debug messages. For a long term solution, we have sent a PR to the
PyMuPDF team, and we will follow up there [2].

Fixes #700

[1]: https://github.com/freedomofpress/dangerzone/issues/700
[2]: https://github.com/pymupdf/PyMuPDF/pull/3137
2024-03-13 21:03:15 +02:00
Alex Pyrgiotis
be8e2aa36b
Allow setting the compression level of the image
There are times where we may want to build the container image for
testing, but compression takes too much time. If we don't plan to use
this image for production builds, we can specify instead a compression
level that is so low, that the image will be compressed instantly.

In this commit, we allow the user to specify the Gzip compression level,
and even set it to 0. The default will always be 9, so that we don't
make a mistake during release.
2024-03-13 21:03:13 +02:00
Alex Pyrgiotis
a31f3370d0
Capture missing logs in second-stage conversion
For a while now, we didn't get logs for the second-stage conversion
when using containers. Extend the code to log any captured output from
the second stage conversion, only if we run Dangerzone via our dev
entrypoint.

Note that the Qubes isolation provider was always logging output from
the second stage of the conversion.
2024-03-13 20:59:50 +02:00
deeplow
0449840ec3
dz.ConvertDev: do not teleport .pyc files
On Qubes the conversion in dev mode would fail when converting from a
Fedora 38 development qube via a Fedora 39 disposable qube. The reason
was that dz.ConvertDev was receiving `.pyc` files, which were compiled
for python 3.11 but running on python 3.12.

Unfortunately PyZipFile objects cannot send source python files, even
though the documentation is a little bit unclear on this [1].

Fixes #723

[1]: https://docs.python.org/3/library/zipfile.html#pyzipfile-objects
2024-03-13 07:13:39 +00:00
deeplow
297feab63d
Ctx mgr to ensure destuction of container-pip-deps.txt
The file container-pip-dependencies.txt was being left a directory when
building the docker image. This meant that it was being packaged when it
wasn't supposed to.

To avoid this, we remove file with the help from a context manager.

The change is minimal and the biggest part of the diff are indentation
changes.

Fixes #739
2024-03-05 17:54:34 +00:00
deeplow
4f08f99e93
Add release notes template
Simplifies the release announcement drafting by providing some
templates. It would have been preferable to be a .github config file,
but GitHub does not yet support content templates for release notes.
2024-03-05 14:48:37 +00:00
deeplow
41c48106fb
RELEASE.md: add check for verifying last-minute criticals 2024-03-05 14:46:05 +00:00
Alex Pyrgiotis
f75d471ec8
Fix OCR bug in Qubes Fedora 38 templates
Provide a fix for an OCR bug that affected Fedora 38 templates of Qubes
OS. In that specific configuration, the PyMuPDF version accepts the
Tesseract data directory only from the `TESSDATA_PREFIX` environment
variable. Our mistake was that we were setting this environment variable
in a dev script, instead of setting it for all configurations.

In this commit, we set an attribute in the fitz.fitz module, so that
both dev scripts and end-user installations can work. This is hacky, but
it targets an old PyMuPDF release after all, so we don't expect things
to break in the long run.

Fixes #737
2024-03-04 16:53:04 +02:00
Alex Pyrgiotis
d35eb56b4b
ci: Test Fedora 39 build instructions 2024-02-26 23:26:24 +02:00
deeplow
a5eb0a5f9d
README.md bump version to 0.6.0 2024-02-26 21:00:00 +02:00
Alex Pyrgiotis
f8984e4b49
Revert "README.md bump version to 0.6.0"
This reverts commit 2784260812.
2024-02-21 17:10:33 +02:00
Alex Pyrgiotis
5b6911af84
Properly add new file extensions
Accept `.svg` and `.bmp` files when browsing via the Dangerzone GUI.
Support for these extensions has already been added in the converter
code that runs in the sandbox (cd99122385)
but they were erroneously left out from the filter in the Dangerzone
main window.
2024-02-20 16:02:38 +02:00
Alex Pyrgiotis
e73f10f99b
Handle gracefully unknown error codes
Do not throw exceptions for unknown error codes. If
`get_proc_exception()` gets called from within an exception context and
raises an exception itself, then this exception will not get caught, and
it will get lost.

Prefer instead to return an exception class that we have for this
purpose, and show to the user the unknown error code of the converesion
process.
2024-02-20 16:00:35 +02:00
Alex Pyrgiotis
aeb8c33b6e
Update expected output for a QA scenario
Inform testers that the container code no longer returns "UNTRUSTED >"
strings in its output. Every string is trusted now, and the output will
be similar for container and Qubes isolation providers alike.
2024-02-20 16:00:35 +02:00
Alex Pyrgiotis
d376e1da00
tests: Adapt Qubes tests
Adapt Qubes tests to the addition of the conversion process in
doc_to_pixels() call.
2024-02-20 15:58:42 +02:00
Alex Pyrgiotis
bc55a64864
Appease lint checker 2024-02-20 15:55:46 +02:00
Alex Pyrgiotis
96cf5d0b4b
ci: Improve commit message lint
Improve the commit message check, by logging only the commit title, and
doing away with the extra spaces.
2024-02-20 15:55:45 +02:00
Alex Pyrgiotis
634523dac9
Get underlying error when conversion fails
When we get an early EOF from the converter process, we should
immediately get the exit code of that process, to find out the actual
underlying error. Currently, the exception we raise masks the underlying
error.

Raise a ConverterProcException, that in turns makes our error handling
code read the exit code of the spawned process, and converts it to a
helpful error message.

Fixes #714
2024-02-20 15:55:45 +02:00
Alex Pyrgiotis
6ee1d14c9a
Start conversion process earlier
Start the conversion process earlier, so that we have a reference to the
Popen object in case of an exception.
2024-02-20 15:55:45 +02:00
deeplow
e4a5dbce46
Don't show 50% duplicated progress info
50% would show twice in the conversion progress due to an overlap in
conversion progress values. The doc_to_pixels would be from 0-50% and
the pixels_to_pdf from 50%-100%.

This commit makes the first part go from 0 to 49% instead.

Fixes #715
2024-02-20 13:47:15 +00:00
deeplow
eb19926f9c
Update screenshots (hamburger menu + capitalization) 2024-02-20 13:45:38 +00:00
deeplow
2784260812
README.md bump version to 0.6.0 2024-02-20 13:45:38 +00:00
Alex Pyrgiotis
531a5bc96f
qa: Add extra actions in the Windows QA script 2024-02-19 17:13:57 +02:00
Alex Pyrgiotis
fd241e5964
qa: Consume stdin on Windows platforms
On Windows platforms, we can't consume the stdin using select(), because
it's not available for pipes [1]. We can instead consume it using some
native Windows calls.

[1]: From https://docs.python.org/3/library/select.html#select.select:

     "File objects on Windows are not acceptable, but sockets are. On
     Windows, the underlying select() function is provided by the
     WinSock library, and does not handle file descriptors that don’t
     originate from WinSock."
2024-02-19 17:13:57 +02:00
Etienne Perot
04508d9694
Check that image build was successful. 2024-02-19 15:37:50 +02:00
deeplow
e375624fdc
Bump Qubes Fedora on RELEASE.md
Fixes #712
2024-02-15 14:42:01 +00:00
deeplow
22ab6f65bf
Bump CodeQL upload action to V3 due to deprecation
The following warning was showing up in our conversion logs [1]:

| Warning: CodeQL Action v2 will be deprecated on December 5th, 2024.
| Please update all occurrences of the CodeQL Action in your workflow
| files to v3. For more information, see https://github.blog/changelog/2024-01-12-code-scanning-deprecation-of-codeql-action-v2/

[1]: https://github.com/freedomofpress/dangerzone/actions/runs/7916735564/job/21611227503?pr=718
2024-02-15 14:40:33 +00:00
deeplow
f569695bb0
CI: Prevent fixup / wip commits 2024-02-14 13:15:27 +00:00
deeplow
75f8d76c5b
Appease new version of black lint tool 2024-02-13 11:36:10 +00:00
deeplow
7168a4078a
Bump poetry dependencies 2024-02-13 11:36:09 +00:00
deeplow
d2065ea76e
FIXUP: add clang-dev contribution 2024-02-13 11:12:19 +00:00
deeplow
9ddb9734ea
Update changelog for v0.6.0 2024-02-13 11:12:19 +00:00
deeplow
832775f34e
Bump version to 0.6.0 2024-02-13 11:12:19 +00:00
deeplow
8f11156ce4
Deprecate Ubuntu Lunar Lobster (EOL)
Fixes #705
2024-02-13 11:07:11 +00:00
Alex Pyrgiotis
2703448d60
Update Jammy build instructions regarding conmon
Update the build instructions for Ubuntu Jammy regarding conmon, now
that oldstable-proposed-updates no longer offers a patched conmon
package. Propose instead to install conmon from our apt-tools-prod repo.
2024-02-13 12:33:57 +02:00
Alex Pyrgiotis
42c64569af
dev_scripts: Install conmon from our apt-tools-prod repo
Instead of installing a patched conmon version from the
oldstable-proposed-updates repo, install it from our apt-tools-prod
repo. This applies to just Ubuntu Jammy, since the rest of the platforms
don't have this problem.
2024-02-13 11:55:32 +02:00
Alex Pyrgiotis
0d7b6e8533
dev_scripts: Do not backport conmon in Bullseye
Now that the conmon package with version 2.0.25+ds1-1.1+deb11u1 has been
released [1] for Debian Bullseye, there is no need to install it from
the oldstable-proposed-updates repo any more.

[1]: https://tracker.debian.org/pkg/conmon
2024-02-13 11:26:15 +02:00
deeplow
3fb797cdd1
Temporarily pin PyMuPDF==1.23.8 in container
PyMuPDF 1.23.9 swapped the new fitz implementation (fitz_new)
with the fitz module. In the new module there are prints in the code
that interfere with our stdout for sending JSON from the container.
Pinning the version seems to have no adverse consequences [1], since
fitz_old hasn't had significant changes and it gives breathing room for
the print-related issue to be tackled in PR [2].

Fixes temporarily #700

[1]: https://github.com/freedomofpress/dangerzone/issues/700#issuecomment-1938357651
[2]: https://github.com/pymupdf/PyMuPDF/pull/3137
2024-02-12 11:37:46 +00:00
deeplow
879fca6f9f
Remove uneeded TESSDATA_PREFIX setting in container
The container image does not need the TESSDATA_PREFIX env variable since
its PyMuPDF version is new enough to support `tessdata` as an argument
when calling the PyMuPDF tesseract method.
2024-02-07 13:14:08 +00:00
deeplow
6006beeb03
Fix OCR on Qubes: PyMuPDF required TESSDATA_PREFIX
PyMuPDF versions lower than 1.22.5 pass the tesseract data path as
an argument to `pixmap.pdfocr_tobytes()` [1], but lower versions require
setting instead the TESSDATA_PREFIX environment variable [2].

Because on Qubes the pixels to pdf conversion happens on the host and
Qubes has a lower PyMuPDF package version, we need to pass instead via
environment variable.

NOTE: the TESSDATA_PREFIX env. variable was set in dangerzone-cli
instead of closer to the calling method in `doc_to_pixels.py` since
PyMuPDF reads this variable as soon as the fitz module is imported
[3][4].

[1]: https://pymupdf.readthedocs.io/en/latest/pixmap.html#Pixmap.pdfocr_tobytes
[2]: https://pymupdf.readthedocs.io/en/latest/installation.html#enabling-integrated-ocr-support
[3]: https://github.com/pymupdf/PyMuPDF/discussions/2439
[4]: https://github.com/pymupdf/PyMuPDF/blob/5d6a7db/src/__init__.py#L159

Fixes #682
2024-02-07 13:13:10 +00:00
Alex Pyrgiotis
d1afe4c30a
Fix Podman crashes due to old conmon version
Switching from mounting files to writing to stdout has introduced some
Podman crashes in specific environments (Ubuntu Jammy / Debian Bullseye)
due to a conmon bug that affects version 2.0.25.

Fixing it for various permutations of the environments we support
requires the following:

1. CI tests: Install conmon from the oldstable-proposed-updates in
   our Debian Bullseye / Ubuntu Jammy dev/end-user environments.
2. Developers: Add a line in BUILD.md that suggests users to install
   conmon from the oldstable-proposed-updates repo, or some other repo
   they prefer.
3. End-user installations: We will build conmon for Ubuntu Jammy, and
   wait until the proposed updates repo gets merged in Debian Bullseye.

Fixes #685
2024-02-07 12:53:15 +00:00
deeplow
8a32d80762
Remove leftover progress variable in pixels_to_pdf
Since the progress information is now inferred on host based on the
number of pages obtained, progress-tracking variables should be removed
from the server.
2024-02-06 20:11:52 +00:00
deeplow
69c2a02d81
Remove timeouts
Remove timeouts due to several reasons:

1. Lost purpose: after implementing the containers page streaming the
   only subprocess we have left is LibreOffice. So don't have such a
   big risk of commands hanging (the original reason for timeouts).

2. Little benefit: predicting execution time is generically unsolvable
   computer science problem. Ultimately we were guessing an arbitrary
   time based on the number of pages and the document size. As a guess
   we made it pretty lax (30s per page or MB). A document hanging for
   this long will probably lead to user frustration in any case and the
   user may be compelled to abort the conversion.

3. Technical Challenges with non-blocking timeout: there have been
several technical challenges in keeping timeouts that we've made effort
to accommodate. A significant one was having to do non-blocking read to
ensure we could timeout when reading conversion stream (and then used
here)

Fixes #687
2024-02-06 20:11:43 +00:00
deeplow
4d3f2b32c7
Revert "Add Stopwatch implementation"
This reverts commit 344d6f7bfa.
Stopwatch is no longer needed now that we're removing timeouts.
2024-02-06 19:42:42 +00:00
deeplow
f31374e33c
Revert "Add non-blocking read utility"
This reverts commit fea193e935.

This is part of the purge of timeout-related code since we no longer
need it [1]. Non-blocking reads were introduced in the reverted commit
in order to be able to cut a stream mid-way due to a timeout. This is
no longer needed now that we're getting rid of timeouts.

[1]: https://github.com/freedomofpress/dangerzone/issues/687
2024-02-06 19:42:41 +00:00
deeplow
07dd54cd13
Fix hanging: disable container logging
The conversion was hanging arbitrarily [1] on some systems. Sometimes it
would send the full page other times stop half-way.

Originally found by @apyrgio.

Co-authored-by: @apyrgio

[1]: https://github.com/freedomofpress/dangerzone/pull/627#issuecomment-1892491968
2024-02-06 19:42:41 +00:00
deeplow
f3032a7142
Make big endian explicit in int to bytes
Fix issues in older distros that don't yet support python 3.11 where
endianness was not a default argument [1]. This is in response to CI
failures [2].

[1]: https://docs.python.org/3/library/stdtypes.html#int.to_bytes
[2]: https://app.circleci.com/pipelines/github/freedomofpress/dangerzone/2186/workflows/e340ca21-85ce-42b6-9bc3-09e66f96684a/jobs/27380y
2024-02-06 19:42:41 +00:00
deeplow
5e169a832b
Bump CI macOS python version to 3.11
Attempt to fix missing issue installing poetry [1].

[1]: https://github.com/freedomofpress/dangerzone/actions/runs/7487413482/job/20379748604?pr=627
2024-02-06 19:42:41 +00:00
deeplow
1835756b45
Allow each conversion to have its own proc
If we increased the number of parallel conversions, we'd run into an
issue where the streams were getting mixed together. This was because
the Converter.proc was a single attribute. This breaks it down into a
local variable such that this mixup doesn't happen.
2024-02-06 19:42:41 +00:00