Commit graph

1122 commits

Author SHA1 Message Date
Alex Pyrgiotis
42c64569af
dev_scripts: Install conmon from our apt-tools-prod repo
Instead of installing a patched conmon version from the
oldstable-proposed-updates repo, install it from our apt-tools-prod
repo. This applies to just Ubuntu Jammy, since the rest of the platforms
don't have this problem.
2024-02-13 11:55:32 +02:00
Alex Pyrgiotis
0d7b6e8533
dev_scripts: Do not backport conmon in Bullseye
Now that the conmon package with version 2.0.25+ds1-1.1+deb11u1 has been
released [1] for Debian Bullseye, there is no need to install it from
the oldstable-proposed-updates repo any more.

[1]: https://tracker.debian.org/pkg/conmon
2024-02-13 11:26:15 +02:00
deeplow
3fb797cdd1
Temporarily pin PyMuPDF==1.23.8 in container
PyMuPDF 1.23.9 swapped the new fitz implementation (fitz_new)
with the fitz module. In the new module there are prints in the code
that interfere with our stdout for sending JSON from the container.
Pinning the version seems to have no adverse consequences [1], since
fitz_old hasn't had significant changes and it gives breathing room for
the print-related issue to be tackled in PR [2].

Fixes temporarily #700

[1]: https://github.com/freedomofpress/dangerzone/issues/700#issuecomment-1938357651
[2]: https://github.com/pymupdf/PyMuPDF/pull/3137
2024-02-12 11:37:46 +00:00
deeplow
879fca6f9f
Remove uneeded TESSDATA_PREFIX setting in container
The container image does not need the TESSDATA_PREFIX env variable since
its PyMuPDF version is new enough to support `tessdata` as an argument
when calling the PyMuPDF tesseract method.
2024-02-07 13:14:08 +00:00
deeplow
6006beeb03
Fix OCR on Qubes: PyMuPDF required TESSDATA_PREFIX
PyMuPDF versions lower than 1.22.5 pass the tesseract data path as
an argument to `pixmap.pdfocr_tobytes()` [1], but lower versions require
setting instead the TESSDATA_PREFIX environment variable [2].

Because on Qubes the pixels to pdf conversion happens on the host and
Qubes has a lower PyMuPDF package version, we need to pass instead via
environment variable.

NOTE: the TESSDATA_PREFIX env. variable was set in dangerzone-cli
instead of closer to the calling method in `doc_to_pixels.py` since
PyMuPDF reads this variable as soon as the fitz module is imported
[3][4].

[1]: https://pymupdf.readthedocs.io/en/latest/pixmap.html#Pixmap.pdfocr_tobytes
[2]: https://pymupdf.readthedocs.io/en/latest/installation.html#enabling-integrated-ocr-support
[3]: https://github.com/pymupdf/PyMuPDF/discussions/2439
[4]: https://github.com/pymupdf/PyMuPDF/blob/5d6a7db/src/__init__.py#L159

Fixes #682
2024-02-07 13:13:10 +00:00
Alex Pyrgiotis
d1afe4c30a
Fix Podman crashes due to old conmon version
Switching from mounting files to writing to stdout has introduced some
Podman crashes in specific environments (Ubuntu Jammy / Debian Bullseye)
due to a conmon bug that affects version 2.0.25.

Fixing it for various permutations of the environments we support
requires the following:

1. CI tests: Install conmon from the oldstable-proposed-updates in
   our Debian Bullseye / Ubuntu Jammy dev/end-user environments.
2. Developers: Add a line in BUILD.md that suggests users to install
   conmon from the oldstable-proposed-updates repo, or some other repo
   they prefer.
3. End-user installations: We will build conmon for Ubuntu Jammy, and
   wait until the proposed updates repo gets merged in Debian Bullseye.

Fixes #685
2024-02-07 12:53:15 +00:00
deeplow
8a32d80762
Remove leftover progress variable in pixels_to_pdf
Since the progress information is now inferred on host based on the
number of pages obtained, progress-tracking variables should be removed
from the server.
2024-02-06 20:11:52 +00:00
deeplow
69c2a02d81
Remove timeouts
Remove timeouts due to several reasons:

1. Lost purpose: after implementing the containers page streaming the
   only subprocess we have left is LibreOffice. So don't have such a
   big risk of commands hanging (the original reason for timeouts).

2. Little benefit: predicting execution time is generically unsolvable
   computer science problem. Ultimately we were guessing an arbitrary
   time based on the number of pages and the document size. As a guess
   we made it pretty lax (30s per page or MB). A document hanging for
   this long will probably lead to user frustration in any case and the
   user may be compelled to abort the conversion.

3. Technical Challenges with non-blocking timeout: there have been
several technical challenges in keeping timeouts that we've made effort
to accommodate. A significant one was having to do non-blocking read to
ensure we could timeout when reading conversion stream (and then used
here)

Fixes #687
2024-02-06 20:11:43 +00:00
deeplow
4d3f2b32c7
Revert "Add Stopwatch implementation"
This reverts commit 344d6f7bfa.
Stopwatch is no longer needed now that we're removing timeouts.
2024-02-06 19:42:42 +00:00
deeplow
f31374e33c
Revert "Add non-blocking read utility"
This reverts commit fea193e935.

This is part of the purge of timeout-related code since we no longer
need it [1]. Non-blocking reads were introduced in the reverted commit
in order to be able to cut a stream mid-way due to a timeout. This is
no longer needed now that we're getting rid of timeouts.

[1]: https://github.com/freedomofpress/dangerzone/issues/687
2024-02-06 19:42:41 +00:00
deeplow
07dd54cd13
Fix hanging: disable container logging
The conversion was hanging arbitrarily [1] on some systems. Sometimes it
would send the full page other times stop half-way.

Originally found by @apyrgio.

Co-authored-by: @apyrgio

[1]: https://github.com/freedomofpress/dangerzone/pull/627#issuecomment-1892491968
2024-02-06 19:42:41 +00:00
deeplow
f3032a7142
Make big endian explicit in int to bytes
Fix issues in older distros that don't yet support python 3.11 where
endianness was not a default argument [1]. This is in response to CI
failures [2].

[1]: https://docs.python.org/3/library/stdtypes.html#int.to_bytes
[2]: https://app.circleci.com/pipelines/github/freedomofpress/dangerzone/2186/workflows/e340ca21-85ce-42b6-9bc3-09e66f96684a/jobs/27380y
2024-02-06 19:42:41 +00:00
deeplow
5e169a832b
Bump CI macOS python version to 3.11
Attempt to fix missing issue installing poetry [1].

[1]: https://github.com/freedomofpress/dangerzone/actions/runs/7487413482/job/20379748604?pr=627
2024-02-06 19:42:41 +00:00
deeplow
1835756b45
Allow each conversion to have its own proc
If we increased the number of parallel conversions, we'd run into an
issue where the streams were getting mixed together. This was because
the Converter.proc was a single attribute. This breaks it down into a
local variable such that this mixup doesn't happen.
2024-02-06 19:42:41 +00:00
deeplow
943bab2def
Move Qubes-specific tests also to containers
Now that Qubes and Containers essentially share the same code, we can
have both run the same tests.
2024-02-06 19:42:41 +00:00
deeplow
61e7a3c107
Fix isolation provider tests
Conversions methods had changed and that was part of the reason why
the tests were failing. Furthermore, due to the `provider.proc`, which
stores the associated qrexec / container process, "server" exceptions
raise a IterruptedConversion error (now ConverterProcException), which
then requires interpretation of the process exit code to obtain the
"real" exception.
2024-02-06 19:42:41 +00:00
deeplow
0a54f6461a
Speed up container image building (pull + build)
Avoids downloading the container image 4 times in the multi-stage build
by first pulling the alpine image once and then building without any
pulls.

Implemented following a suggestion of @apyrgio.
2024-02-06 19:42:41 +00:00
deeplow
550786adfe
Remove untrusted progress parsing (stderr instead)
Now that only the second container can send JSON-encoded progress
information, we can the untrusted JSON parsing. The parse_progress was
also renamed to `parse_progress_trusted` to ensure future developers
don't mistake this as a safe method.

The old methods for sending untrusted JSON were repurposed to send the
progress instead to stderr for troubleshooting in development mode.

Fixes #456
2024-02-06 19:42:40 +00:00
deeplow
c991e530d0
Fix IsolationProvider.percentage variable reuse
If one converted more than one document, since the state of
IsolationProvider.percentage would be stored in the IsolationProvider
instance, it would get reused for the second document. The fix is to
keep it as a local variable, but we can explore having progress stored
on the document itself, for example. Or having one IsolationProvider per
conversion.
2024-02-06 19:42:40 +00:00
deeplow
0a099540c8
Stream pages in containers: merge isolation providers
Merge Qubes and Containers isolation providers core code into the class
parent IsolationProviders abstract class.

This is done by streaming pages in containers for exclusively in first
conversion process. The commit is rather large due to the multiple
interdependencies of the code, making it difficult to split into various
commits.

The main conversion method (_convert) now in the superclass simply calls
two methods:
  - doc_to_pixels()
  - pixels_to_pdf()

Critically, doc_to_pixels is implemented in the superclass, diverging
only in a specialized method called "start_doc_to_pixels_proc()". This
method obtains the process responsible that communicates with the
isolation provider (container / disp VM) via `podman/docker` and qrexec
on Containers and Qubes respectively.

Known regressions:
  - progress reports stopped working on containers

Fixes #443
2024-02-06 19:42:33 +00:00
deeplow
331b6514e8
Containers: remove debug messages (via files)
Remove container_log messages ahead of debug info being sent over
standard streams.
2024-02-06 18:54:39 +00:00
deeplow
dca46d0a6b
Homogenize qubes and containers inner convert method
Simple rename of the __convert() method in the Qubes conversion to make
the code structurally similar.
2024-02-06 18:54:31 +00:00
Alex Pyrgiotis
93bf0af348
ci: Reclaim some of the used space
Reclaim some storage space in the middle of the CI job that builds and
installs Dangerzone in Fedora. The reason is that previously, we
encountered an issues with CI runners running out of space.
2024-02-05 15:35:12 +02:00
deeplow
7f0346686d
Add Dangerzone logo to Fedora build
Fixes #645
2024-02-01 13:53:49 +00:00
deeplow
cd99122385
Adds file formats: epub svg bmp pnm bpm ppm
Partially fix for #660. Missing some files due to limitations [1]:
- PSD - only available from PyMuPDF>=1.23.0 (qubes-fedora is lower)
- TXT - only available from PyMuPDF>=1.23.7 (qubes-fedora is lower)
- JXR - PyMuPDF was refusing to due to missing codec [1]
- JPX - Generated test file was rejected by PyMuPDF [2]
- FB2 - Most often cannot be detected by mime type alone [3]
- CBZ - (idem)
- XPS - (idem)
- MOBI - (idem)
- PAM - General version of other file format already included, so I
  decided not to include this extension [0]

New test files were generated locally:
 - epub - generated with calibre's convert-ebook from another
   sample file
 - svg - generated with inkscape from a mix of a default template
   (hexagons) and a logo's PNG file
 - bmp, pnm, bpm, ppm - generated with ImageMagick's 'convert' from
   tests/test_docs/sample-png.png

[0]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1914681487
[1]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1916803201
[2]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1916870347
[3]: https://github.com/freedomofpress/dangerzone/issues/688
2024-01-31 19:58:48 +00:00
deeplow
4e720aa6e2
Replace 'None' conversion type with "PyMuPDF"
Replaced for clarity over the fact that this conversion is in fact
handled by PyMuPDF.
2024-01-31 19:58:36 +00:00
Alex Pyrgiotis
3e10fd1df4
Explain what happens when PySide6 gets updated
Explain what happens when we bump our `poetry.lock`, and a new
Pyside6 version. Also, have a step-by-step guide on how the maintainer
should create a new PySide6 RPM and update FPF's repo, so that
Dangerzone can be released.
2024-01-31 17:11:31 +02:00
Alex Pyrgiotis
46d5827772
Elaborate on how to add/remove Linux platforms
Explain what's the process behind adding/removing Linux platforms, prior
to a release.
2024-01-31 17:11:30 +02:00
Alex Pyrgiotis
3bc3c6c120
ci: Build and install Dangerzone RPMs
Add some Fedora CI jobs that build RPMs, install them in an end-user
environment, and make a simple conversion and GUI import check. These
are basically smoke tests for Fedora, similar to the ones we have for
Debian.
2024-01-31 17:11:30 +02:00
Alex Pyrgiotis
d54ef875a6
Add official support for Fedora 39
Now that we can create a Dangerzone RPM that depends on PySide6, we can
officially support Fedora 39 as a platform. Add this platform in our CI
tests, as well as our install/release notes.

Fixes #606
2024-01-31 17:11:30 +02:00
Alex Pyrgiotis
b0da1dde5f
dev_scripts: Build end-user Fedora env with PySide6
Extend the env.py script to build an end-user, Fedora 39+ environment
with PySide6 installed, as a regular RPM package. Previously, this was
only possible for development environments with PySide6 downloaded from
PyPI.

As a way to simplify builds, the env.py script offers the option to
download the RPM package itself from FPF's RPM repo [1], if the package
has been uploaded.

[1]: https://packages.freedom.press/yum-tools-prod
2024-01-31 17:11:30 +02:00
Alex Pyrgiotis
84037d4ffb
dev_scripts: Return exit code for failures
The env.py dev script does not return an exit code for failures, so we
add the necessary 'return' statements to do so.
2024-01-31 17:07:32 +02:00
Alex Pyrgiotis
3684b7ff61
Build Dangerzone RPM with PySide6 dependency
Update our RPM spec file to include PySide6 as a dependency, for Fedora
39 onward.
2024-01-31 17:07:32 +02:00
Alex Pyrgiotis
d7ee162852
Add support for Python 3.12
Fedora 39 ships with Python 3.12 by default, which Dangerzone previously
did not support due to limitations from the PySide6 package. Now that
the PySide6 package has been updated to 6.6.1, and the limitation has
lifted, we should to reflect this in pyproject.toml.
2024-01-31 17:07:32 +02:00
Alex Pyrgiotis
741c8311ee
Bump python dependencies via poetry lock 2024-01-31 17:07:32 +02:00
Alex Pyrgiotis
72ddbfd55a
dev_scripts: Install a subset of Podman deps
Install a subset of Podman dependencies, so that we don't also install
Systemd. Doing so can introduce some subtle issues of its own, which is
why we prefer cherry-picking the Podman packages we really need.

Fixes #689
2024-01-30 14:24:45 +02:00
Alex Pyrgiotis
d854657883
Include data files only in source distribution
Make Poetry include data files only in the source distribution, and not
on our wheels. This mainly makes RPM packaging a bit easier, but does
not solve the problem of how to install files to
`/usr/share/dangerzone`.

Also, include files using globs, which is the way Poetry prefers.

Fixes #678
Refs #677
2024-01-23 16:19:45 +02:00
Alex Pyrgiotis
067e787a3d
install: Remove .gitignore for rpm-build
Remove the .gitignore file for rpm-build, because it leads to making
Poetry ignore the Dangerzone module, when building the Python wheel.

Refs #678
2024-01-23 16:19:44 +02:00
deeplow
629278ae4a
Add capitalization to the changelog 2024-01-23 09:10:47 +00:00
sudwhiwdh
3d426ed36b
Linux desktop entry capitalisation 2024-01-22 11:49:42 +00:00
sudwhiwdh
b4ef47e101
GUI header capitalisation 2024-01-22 11:38:54 +00:00
Prateek Jain
699b116d4d
Add clang-dev to Dockerfile 2024-01-15 16:54:00 +00:00
Alex Pyrgiotis
a6755080ad
Ignore CVE-2023-7104 from our security scans
Our security scans for the released container image have flagged
CVE-2023-7104. Our assessment is that this CVE doesn't affect
Dangerzone, mainly because our understanding is that attackers cannot
embed SQLite dbs within LibreOffice spreadsheets.
2024-01-09 20:28:01 +02:00
Alex Pyrgiotis
2f318f1633
Remove stale ignored CVEs
Remove some CVEs from our ignore list of Grype, which affected previous
Dangerzone images.
2024-01-09 20:18:11 +02:00
deeplow
f27296cd45
Replace MIT license with AGPLv3
License change required due to the inclusion of the AGPL-licensed
PyMuPDF. This library greatly benefited Dangerzone in many aspects
detailed in [1].

Fixes #658

[1]: https://github.com/freedomofpress/dangerzone/issues/658
2024-01-04 09:57:49 +00:00
Alex Pyrgiotis
7e21d5e8c4
ci: Use Docker for building images, instead of Podman 2024-01-03 15:57:49 +00:00
Alex Pyrgiotis
f254575cb4
install: Make build image script more flexible
Add the following functionality to the build image script:

1. Let the user choose the container runtime of their choice. In some
   systems, both Docker and Podman may be available, so we need to let
   the user choose which runtime they want.
2. Let users choose if they want to save the image. For non-production
   builds, we may want to simply build the container image, without
   the time penalty of compression.
2024-01-03 15:57:41 +00:00
deeplow
f1d90c6fa9
Compress per page when not using OCR
Make the compression happen per page when OCR is not enabled [1].

[1]: https://github.com/freedomofpress/dangerzone/pull/622#discussion_r1410986342
2024-01-03 12:58:36 +00:00
deeplow
e2531279c0
FIXUP Revert "Disable image compression when saving PDF"
This reverts commit f074db0beaa50389634203657f9b46307164a353.
2024-01-03 12:58:36 +00:00
deeplow
f676891482
Remove Dockerfile dependencies replaced by PyMuPDF
PyMuPDF replaced the need for almost all dependencies, which this commit
now removes.

We are also removing tesseract-ocr as a dependency since
(to our surprise) PyMuPDF ships directly with tesseract binaries [1].
However, now that tesseract-ocr is not available directly as a binary
tool, the `test_ocr.py` needed to be changed.

Fixes #658

[1]: https://github.com/freedomofpress/dangerzone/issues/658#issuecomment-1861033149
2024-01-03 12:58:36 +00:00