Commit graph

550 commits

Author SHA1 Message Date
Alex Pyrgiotis
e11aaec3ac
Always use sys.exit when exiting the application
The `exit()` [1] function is not necessarily present in every Python
environment, as it's added by the `site` module. Also, this function is
"[...] useful for the interactive interpreter shell and should not be
used in programs"

For this reason, we replace all such occurrences with `sys.exit()` [2],
which is the canonical function to exit Python programs.

[1]: https://docs.python.org/3/library/constants.html#exit
[2]: https://docs.python.org/3/library/sys.html#sys.exit
2024-05-09 15:57:42 +03:00
Alex Pyrgiotis
d6202cd028
Invoke external command on Windows properly
On Windows, if we don't use the `startupinfo=` argument of
subprocess.Popen, then a terminal window will flash while running the
command.

Use `startupinfo=` when killing a container, as we do for every other
command.
2024-05-09 15:57:42 +03:00
Alex Pyrgiotis
1c70ee6771
Fix archiving the same doc twice on Windows
On Windows, if we somehow attempt to archive the same document twice
(e.g, because it got archived once, and then we copy it back), we will
get an error, because Windows does not overwrite the target path, if it
already exists.

Fix this issue by always removing the previously archived version, when
performing the next archival action, and update our tests.
2024-05-09 15:57:42 +03:00
Naglis Jonaitis
8cdb2d5720
Set the desktop filename and app name of the Qt application
Currently, the app ID of the Dangerzone GUI application when running
under Wayland is `python3`, which is not very useful if one wants to
automate some action related to the Dangerzone application window (e.g.
to always start Dangerzone window in floating mode under Sway WM).

Setting the desktop filename property also sets the app ID of the
application under Wayland. According to Qt documentation[1], the
property value should be the name of the application's .desktop file but
without the extension.

Qt documentation also states:

> This property gives a precise indication of what desktop entry
> represents the application and it is needed by the windowing system to
> retrieve such information without resorting to imprecise heuristics.

Therefore I also think that setting this property is needed to display
the correct application name and icon (taken from the .desktop entry)
when running under certain windowing systems (like Wayland)
(see also #402).

Note that this property is not enough, as we've encountered systems
where setting just the desktop file name does not alter the detected
application name by the window manager. For this reason, we also use
set the application name [2] to `dangerzone`, to remove any ambiguity.

[1]: https://doc.qt.io/qt-6/qguiapplication.html#desktopFileName-prop
[2]: https://doc.qt.io/qt-6/qcoreapplication.html#applicationName-prop

Fixes #402
2024-04-25 17:23:02 +03:00
Naglis Jonaitis
d632908a44
Fix printing of filenames with surrogate escapes
On Unix systems a filename can be a sequence of bytes that is not valid
UTF-8. Python uses[1] surrogate escapes to allow to decode such
filenames to Unicode (bytes that cannot be decoded are replaced by a
surrogate; upon encoding the surrogate is converted to the original
byte).

From `click` docs[2]:

> Invalid bytes or surrogate escapes will raise an error when written
> to a stream with `errors="strict"`. This will typically happen with
> `stdout` when the locale is something like `en_GB.UTF-8`.

To fix that, we use `utils.replace_control_chars()` before printing the
filenames to `stdout` so that surrogate escapes are replaced by �.

Fixes #768
2024-04-25 14:11:25 +03:00
Naglis Jonaitis
52ced04507
Relax the restrictions of util.replace_control_chars
The `util.replace_control_chars()` function was overly strict, and
would replace every non-ASCII character with "_". This included both
control characters, as well as normal characters in a non-English
alphabet.

Relax these restrictions by checking each character and deciding if it's
a Unicode control character, using the `unicodedata` Python package.
With this change, emojis and non-English letters are now allowed.
2024-04-25 14:11:16 +03:00
Alex Pyrgiotis
f57d2f7191
isolation_provider: Always terminate spawned process
Previously, we always assumed that the spawned process would quit
within 3 seconds. This was an arbitrary call, and did not work in
practice.

We can improve our standing here by doing the following:

1. Make `Popen.wait()` calls take a generous amount of time (since they
   are usually on the sad path), and handle any timeout errors that they
   throw. This way, a slow conversion process cleanup does not take too
   much of our users time, nor is it reported as an error.
2. Always make sure that once the conversion of doc to pixels is over,
   the corresponding process will finish within a reasonable amount of
   time as well.

Fixes #749
2024-04-24 14:39:15 +03:00
Alex Pyrgiotis
cd4cbdb00a
isolation_provider: Get exit code without timing out
Get the exit code of the spawned process for the doc-to-pixels phase,
without timing out. More specifically, if the spawned process has not
finished within a generous amount of time (hardcode to 15 seconds),
return UnexpectedConversionError, with a custom message.

This way, the happy path is not affected, and we still make our best to
learn the underlying cause of the I/O error.
2024-04-24 14:36:14 +03:00
Alex Pyrgiotis
171a7eca52
isolation_provider: Terminate doc-to-pixels proc
Extend the IsolationProvider class with a
`terminate_doc_to_pixels_proc()` method, which must be implemented by
the Qubes/Container providers and gracefully terminate a process started
for the doc to pixels phase.

Refs #563
2024-04-24 14:36:14 +03:00
Alex Pyrgiotis
a63f4b85eb
isolation_provider: Set a unique name for spawned containers
Set a unique name for spawned containers, based on the ID of the
provided document. This ID is not globally unique, as it has few bits of
entropy.  However, since we only want to avoid collisions within a
single Dangerzone invocation, and since we can't support multiple
containers running in parallel, this ID will suffice.
2024-04-24 14:33:33 +03:00
Alex Pyrgiotis
6850d31edc
isolation_provider: Pass doc when creating doc-to-pixels proc
Pass the Document instance that will be converted to the
`IsolationProvider.start_doc_to_pixels_proc()` method. Concrete classes
can then associate this name with the started process, so that they can
later on kill it.
2024-04-24 14:33:33 +03:00
deeplow
dfcb10c494
Move settings.json into constant
Move settings.json into a constant so that they can later be referred to
by the testing module.
2024-04-01 18:18:41 +03:00
deeplow
ad16a0e471
Fix Settings().set() when setting new setting
Settings().set() would fail if we were trying to set a setting that did
not exist before. The reason is because before setting it would try to
get the previous value, but though direct key access, which would lead
to an exception.
2024-04-01 18:18:41 +03:00
Alex Pyrgiotis
74c467eaf7
conversion: Do not let PyMuPDF print to stdout
PyMuPDF has some hardcoded log messages that print to stdout [1]. We don't
have a way to silence them, because they don't use the Python logging
infrastructure.

What we can do here is silence a particular call that's been creating
debug messages. For a long term solution, we have sent a PR to the
PyMuPDF team, and we will follow up there [2].

Fixes #700

[1]: https://github.com/freedomofpress/dangerzone/issues/700
[2]: https://github.com/pymupdf/PyMuPDF/pull/3137
2024-03-13 21:03:15 +02:00
Alex Pyrgiotis
a31f3370d0
Capture missing logs in second-stage conversion
For a while now, we didn't get logs for the second-stage conversion
when using containers. Extend the code to log any captured output from
the second stage conversion, only if we run Dangerzone via our dev
entrypoint.

Note that the Qubes isolation provider was always logging output from
the second stage of the conversion.
2024-03-13 20:59:50 +02:00
deeplow
0449840ec3
dz.ConvertDev: do not teleport .pyc files
On Qubes the conversion in dev mode would fail when converting from a
Fedora 38 development qube via a Fedora 39 disposable qube. The reason
was that dz.ConvertDev was receiving `.pyc` files, which were compiled
for python 3.11 but running on python 3.12.

Unfortunately PyZipFile objects cannot send source python files, even
though the documentation is a little bit unclear on this [1].

Fixes #723

[1]: https://docs.python.org/3/library/zipfile.html#pyzipfile-objects
2024-03-13 07:13:39 +00:00
Alex Pyrgiotis
f75d471ec8
Fix OCR bug in Qubes Fedora 38 templates
Provide a fix for an OCR bug that affected Fedora 38 templates of Qubes
OS. In that specific configuration, the PyMuPDF version accepts the
Tesseract data directory only from the `TESSDATA_PREFIX` environment
variable. Our mistake was that we were setting this environment variable
in a dev script, instead of setting it for all configurations.

In this commit, we set an attribute in the fitz.fitz module, so that
both dev scripts and end-user installations can work. This is hacky, but
it targets an old PyMuPDF release after all, so we don't expect things
to break in the long run.

Fixes #737
2024-03-04 16:53:04 +02:00
Alex Pyrgiotis
5b6911af84
Properly add new file extensions
Accept `.svg` and `.bmp` files when browsing via the Dangerzone GUI.
Support for these extensions has already been added in the converter
code that runs in the sandbox (cd99122385)
but they were erroneously left out from the filter in the Dangerzone
main window.
2024-02-20 16:02:38 +02:00
Alex Pyrgiotis
e73f10f99b
Handle gracefully unknown error codes
Do not throw exceptions for unknown error codes. If
`get_proc_exception()` gets called from within an exception context and
raises an exception itself, then this exception will not get caught, and
it will get lost.

Prefer instead to return an exception class that we have for this
purpose, and show to the user the unknown error code of the converesion
process.
2024-02-20 16:00:35 +02:00
Alex Pyrgiotis
634523dac9
Get underlying error when conversion fails
When we get an early EOF from the converter process, we should
immediately get the exit code of that process, to find out the actual
underlying error. Currently, the exception we raise masks the underlying
error.

Raise a ConverterProcException, that in turns makes our error handling
code read the exit code of the spawned process, and converts it to a
helpful error message.

Fixes #714
2024-02-20 15:55:45 +02:00
Alex Pyrgiotis
6ee1d14c9a
Start conversion process earlier
Start the conversion process earlier, so that we have a reference to the
Popen object in case of an exception.
2024-02-20 15:55:45 +02:00
deeplow
e4a5dbce46
Don't show 50% duplicated progress info
50% would show twice in the conversion progress due to an overlap in
conversion progress values. The doc_to_pixels would be from 0-50% and
the pixels_to_pdf from 50%-100%.

This commit makes the first part go from 0 to 49% instead.

Fixes #715
2024-02-20 13:47:15 +00:00
deeplow
75f8d76c5b
Appease new version of black lint tool 2024-02-13 11:36:10 +00:00
deeplow
879fca6f9f
Remove uneeded TESSDATA_PREFIX setting in container
The container image does not need the TESSDATA_PREFIX env variable since
its PyMuPDF version is new enough to support `tessdata` as an argument
when calling the PyMuPDF tesseract method.
2024-02-07 13:14:08 +00:00
deeplow
6006beeb03
Fix OCR on Qubes: PyMuPDF required TESSDATA_PREFIX
PyMuPDF versions lower than 1.22.5 pass the tesseract data path as
an argument to `pixmap.pdfocr_tobytes()` [1], but lower versions require
setting instead the TESSDATA_PREFIX environment variable [2].

Because on Qubes the pixels to pdf conversion happens on the host and
Qubes has a lower PyMuPDF package version, we need to pass instead via
environment variable.

NOTE: the TESSDATA_PREFIX env. variable was set in dangerzone-cli
instead of closer to the calling method in `doc_to_pixels.py` since
PyMuPDF reads this variable as soon as the fitz module is imported
[3][4].

[1]: https://pymupdf.readthedocs.io/en/latest/pixmap.html#Pixmap.pdfocr_tobytes
[2]: https://pymupdf.readthedocs.io/en/latest/installation.html#enabling-integrated-ocr-support
[3]: https://github.com/pymupdf/PyMuPDF/discussions/2439
[4]: https://github.com/pymupdf/PyMuPDF/blob/5d6a7db/src/__init__.py#L159

Fixes #682
2024-02-07 13:13:10 +00:00
deeplow
8a32d80762
Remove leftover progress variable in pixels_to_pdf
Since the progress information is now inferred on host based on the
number of pages obtained, progress-tracking variables should be removed
from the server.
2024-02-06 20:11:52 +00:00
deeplow
69c2a02d81
Remove timeouts
Remove timeouts due to several reasons:

1. Lost purpose: after implementing the containers page streaming the
   only subprocess we have left is LibreOffice. So don't have such a
   big risk of commands hanging (the original reason for timeouts).

2. Little benefit: predicting execution time is generically unsolvable
   computer science problem. Ultimately we were guessing an arbitrary
   time based on the number of pages and the document size. As a guess
   we made it pretty lax (30s per page or MB). A document hanging for
   this long will probably lead to user frustration in any case and the
   user may be compelled to abort the conversion.

3. Technical Challenges with non-blocking timeout: there have been
several technical challenges in keeping timeouts that we've made effort
to accommodate. A significant one was having to do non-blocking read to
ensure we could timeout when reading conversion stream (and then used
here)

Fixes #687
2024-02-06 20:11:43 +00:00
deeplow
4d3f2b32c7
Revert "Add Stopwatch implementation"
This reverts commit 344d6f7bfa.
Stopwatch is no longer needed now that we're removing timeouts.
2024-02-06 19:42:42 +00:00
deeplow
f31374e33c
Revert "Add non-blocking read utility"
This reverts commit fea193e935.

This is part of the purge of timeout-related code since we no longer
need it [1]. Non-blocking reads were introduced in the reverted commit
in order to be able to cut a stream mid-way due to a timeout. This is
no longer needed now that we're getting rid of timeouts.

[1]: https://github.com/freedomofpress/dangerzone/issues/687
2024-02-06 19:42:41 +00:00
deeplow
07dd54cd13
Fix hanging: disable container logging
The conversion was hanging arbitrarily [1] on some systems. Sometimes it
would send the full page other times stop half-way.

Originally found by @apyrgio.

Co-authored-by: @apyrgio

[1]: https://github.com/freedomofpress/dangerzone/pull/627#issuecomment-1892491968
2024-02-06 19:42:41 +00:00
deeplow
f3032a7142
Make big endian explicit in int to bytes
Fix issues in older distros that don't yet support python 3.11 where
endianness was not a default argument [1]. This is in response to CI
failures [2].

[1]: https://docs.python.org/3/library/stdtypes.html#int.to_bytes
[2]: https://app.circleci.com/pipelines/github/freedomofpress/dangerzone/2186/workflows/e340ca21-85ce-42b6-9bc3-09e66f96684a/jobs/27380y
2024-02-06 19:42:41 +00:00
deeplow
1835756b45
Allow each conversion to have its own proc
If we increased the number of parallel conversions, we'd run into an
issue where the streams were getting mixed together. This was because
the Converter.proc was a single attribute. This breaks it down into a
local variable such that this mixup doesn't happen.
2024-02-06 19:42:41 +00:00
deeplow
61e7a3c107
Fix isolation provider tests
Conversions methods had changed and that was part of the reason why
the tests were failing. Furthermore, due to the `provider.proc`, which
stores the associated qrexec / container process, "server" exceptions
raise a IterruptedConversion error (now ConverterProcException), which
then requires interpretation of the process exit code to obtain the
"real" exception.
2024-02-06 19:42:41 +00:00
deeplow
550786adfe
Remove untrusted progress parsing (stderr instead)
Now that only the second container can send JSON-encoded progress
information, we can the untrusted JSON parsing. The parse_progress was
also renamed to `parse_progress_trusted` to ensure future developers
don't mistake this as a safe method.

The old methods for sending untrusted JSON were repurposed to send the
progress instead to stderr for troubleshooting in development mode.

Fixes #456
2024-02-06 19:42:40 +00:00
deeplow
c991e530d0
Fix IsolationProvider.percentage variable reuse
If one converted more than one document, since the state of
IsolationProvider.percentage would be stored in the IsolationProvider
instance, it would get reused for the second document. The fix is to
keep it as a local variable, but we can explore having progress stored
on the document itself, for example. Or having one IsolationProvider per
conversion.
2024-02-06 19:42:40 +00:00
deeplow
0a099540c8
Stream pages in containers: merge isolation providers
Merge Qubes and Containers isolation providers core code into the class
parent IsolationProviders abstract class.

This is done by streaming pages in containers for exclusively in first
conversion process. The commit is rather large due to the multiple
interdependencies of the code, making it difficult to split into various
commits.

The main conversion method (_convert) now in the superclass simply calls
two methods:
  - doc_to_pixels()
  - pixels_to_pdf()

Critically, doc_to_pixels is implemented in the superclass, diverging
only in a specialized method called "start_doc_to_pixels_proc()". This
method obtains the process responsible that communicates with the
isolation provider (container / disp VM) via `podman/docker` and qrexec
on Containers and Qubes respectively.

Known regressions:
  - progress reports stopped working on containers

Fixes #443
2024-02-06 19:42:33 +00:00
deeplow
331b6514e8
Containers: remove debug messages (via files)
Remove container_log messages ahead of debug info being sent over
standard streams.
2024-02-06 18:54:39 +00:00
deeplow
dca46d0a6b
Homogenize qubes and containers inner convert method
Simple rename of the __convert() method in the Qubes conversion to make
the code structurally similar.
2024-02-06 18:54:31 +00:00
deeplow
cd99122385
Adds file formats: epub svg bmp pnm bpm ppm
Partially fix for #660. Missing some files due to limitations [1]:
- PSD - only available from PyMuPDF>=1.23.0 (qubes-fedora is lower)
- TXT - only available from PyMuPDF>=1.23.7 (qubes-fedora is lower)
- JXR - PyMuPDF was refusing to due to missing codec [1]
- JPX - Generated test file was rejected by PyMuPDF [2]
- FB2 - Most often cannot be detected by mime type alone [3]
- CBZ - (idem)
- XPS - (idem)
- MOBI - (idem)
- PAM - General version of other file format already included, so I
  decided not to include this extension [0]

New test files were generated locally:
 - epub - generated with calibre's convert-ebook from another
   sample file
 - svg - generated with inkscape from a mix of a default template
   (hexagons) and a logo's PNG file
 - bmp, pnm, bpm, ppm - generated with ImageMagick's 'convert' from
   tests/test_docs/sample-png.png

[0]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1914681487
[1]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1916803201
[2]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1916870347
[3]: https://github.com/freedomofpress/dangerzone/issues/688
2024-01-31 19:58:48 +00:00
deeplow
4e720aa6e2
Replace 'None' conversion type with "PyMuPDF"
Replaced for clarity over the fact that this conversion is in fact
handled by PyMuPDF.
2024-01-31 19:58:36 +00:00
sudwhiwdh
b4ef47e101
GUI header capitalisation 2024-01-22 11:38:54 +00:00
deeplow
f1d90c6fa9
Compress per page when not using OCR
Make the compression happen per page when OCR is not enabled [1].

[1]: https://github.com/freedomofpress/dangerzone/pull/622#discussion_r1410986342
2024-01-03 12:58:36 +00:00
deeplow
e2531279c0
FIXUP Revert "Disable image compression when saving PDF"
This reverts commit f074db0beaa50389634203657f9b46307164a353.
2024-01-03 12:58:36 +00:00
deeplow
ee35e28aa6
Disable image compression when saving PDF
Some tests [1] lead to the conclusion that ocr_compression does the same
to the file (performance and size-wise) to the file as deflating images
when saving the file. However, both methods active do add a bit of extra
time. For this reason we're disabling the image deflation (default
option).

[1]: https://github.com/freedomofpress/dangerzone/pull/622#discussion_r1434042296
2024-01-03 12:58:36 +00:00
deeplow
6f61e44502
Solve import errors by lazy-loading fitz module
Qubes does on-host pixels-to-pdf whereas the containers version doesn't.
This leads to an issue where on the containers version it tries to load
fitz, which isn't installed there, just because it's trying to check if
it should run the Qubes version.

The error it was showing was something like this:

    ImportError while loading conftest '/home/user/dangerzone/tests/conftest.py'.
        tests/__init__.py:8: in <module>
            from dangerzone.document import SAFE_EXTENSION
        dangerzone/__init__.py:16: in <module>
            from .gui import gui_main as main
        dangerzone/gui/__init__.py:28: in <module>
            from ..isolation_provider.qubes import Qubes, is_qubes_native_conversion
        dangerzone/isolation_provider/qubes.py:15: in <module>
            from ..conversion.pixels_to_pdf import PixelsToPDF
        dangerzone/conversion/pixels_to_pdf.py:16: in <module>
            import fitz
        E   ModuleNotFoundError: No module named 'fitz'

For context see discussion in [1].

[1]: https://github.com/freedomofpress/dangerzone/pull/622#issuecomment-1839164885
2024-01-03 12:58:36 +00:00
deeplow
80db7bb02e
Remove pre-pymupdf exceptions and detect pymupdf ones 2024-01-03 12:58:35 +00:00
deeplow
b75417bbec
Remove all server-side timeouts from doc to pixels
Now we're using client-side timeouts so the server side-ones are not
needed. Implemented following the suggestion from @apyrgio [1].

[1]: https://github.com/freedomofpress/dangerzone/pull/622#discussion_r1413906514
2024-01-03 12:58:35 +00:00
deeplow
576cbd3382
Fix DPI mismatch between doc2pixels and pixels2pdf
The original document was larger in dimensions than the original one due
to a mismatch in DPI settings. When converting documents to pixels we
were setting the DPI to 150 pixels per inch. Then when converting back
into a PDF we were using 70 DPI. This difference would result in an
overall larger document in dimensions (though not necessarily in file
size).

Fixes #626
2024-01-03 12:58:34 +00:00
deeplow
e5dbe25abb
Replace 'convert' with PyMuPDF for images
PyMuPDF can also convert images of the types we already support so we
don't need ImageMagick's 'convert'.
2024-01-03 12:58:34 +00:00
deeplow
77d5ea5940
Add PyMuPDF in pixels_to_pdf replacing old logic
Adding PyMuPDF essentially make the code much simpler since it can do
everything that we'd need multiple programs for. It also includes
tesseract-OCR integration, which this commit makes use of.
2024-01-03 12:56:33 +00:00
deeplow
ba17016643
Doc_to_pixels: remove unneeded timeout
Timeout can no longer be used since we're not calling a subprocess. We
could still implement it, but it's more worthy to reply in
yet-to-implement client-side timeouts (in containers).
2024-01-03 12:40:45 +00:00
deeplow
317deadbe4
Replace pdfinfo logic (get # pages) with PyMuPDF 2024-01-03 12:40:45 +00:00
deeplow
327ab8791f
Replace pdftoppm logic with PyMuPDF (native python)
Use PyMuPDF (AGPL-licensed) within the container conversion to replace
the pdf conversion to RGB. This massively simplifies the code since
PyMuPDF is a native python library.
2024-01-03 12:40:45 +00:00
Alex Pyrgiotis
5bf7549b55
Fix typo 2023-12-29 18:30:48 +02:00
Moon Sungjoon
63aea4cb45
Enable HWP conversion on MacOS (Apple silicon CPU)
This PR reverts the patch that disables HWP / HWPX conversion on MacOS M1.
It does not fix conversion on Qubes OS (#494).

Previously, HWP / HWPX conversion didn't work on MacOS (Apple silicon CPU) (#498)
because libreoffice wasn't built with Java support on Alpine Linux for ARM (aarch64).

Gratefully, the Alpine team has enabled Java support on the aarch64
system [1], so we can enable it again for ARM architectures.
And this patch is included in Alpine 3.19

This commit was included in #541 and reverted on #562 due to a stability issue.

Fixes #498

[1]: 74d443f479
2023-12-13 12:57:22 +02:00
Garrett Robinson
53115b3ffa
Use more descriptive button labels in update check prompt 2023-10-31 12:52:34 +00:00
Alex Pyrgiotis
ba5adb33c0
Fix a bug in "Change Selection"
Fix a bug in the "Change Selection" action, whereby changing your
selection and picking files from another directory results in:

    "Dangerzone does not support adding documents from multiple
    locations. The newly added documents were ignored."

To fix this, change the output directory when we change selection as
well.
2023-10-13 22:45:11 +03:00
Alex Pyrgiotis
edfba0c783
Qubes: Fix progress in first stage of Qubes conversion 2023-10-13 22:44:37 +03:00
deeplow
186ddd6b1e
Allow user to override update checking on Linux
The original intention of leaving the update checkbox in the hamburger
menu was to let non-supported Linux distros (e.g. compiled from source)
to check for updates. However, on Linux it ended up being disabled
forcefully by default on startup.

This takes into account an overriden update checkbox.

Fixes #596
2023-10-13 17:01:53 +01:00
Alex Pyrgiotis
3daf0e2cb7
Do not show file previews in case of exceptions
If a Qubes conversion encounters an exception that is not a subclass of
ConversionException, it will still show a preview of a file that does
not exist.

Send an error progress report in that case, so that the GUI code can
detect that an error occurred and not open a file preview

Fixes #581
2023-10-05 11:11:42 +03:00
Alex Pyrgiotis
bdf3f8babc
qubes: Clean up temporary files
Create a temporary dir before the conversion begins, and store every
file necessary for the conversion there. We are mostly concerned about
the second stage of the conversion, which runs in the host. The first
stage runs in a disposable qube and cleanup is implicit.

Fixes #575
Fixes #436
2023-10-04 14:05:23 +03:00
Alex Pyrgiotis
f37d89f042
conversion: Allow using a temp dir other than /tmp
Extend the PixelsToPDF converter by adding an additional `tempdir`
argument. This argument can be used to make the conversion use a
different temporary directory other than `/tmp`.

For containers, this extra arguments makes no difference, as it won't be
used. For Qubes, this argument will allow storing files in a temporary
dir that will be cleaned up once the conversion completes. Previously,
these files would linger in the user's `/tmp`.

Refs #575
2023-10-04 14:00:53 +03:00
Alex Pyrgiotis
4f66353639
Add dark mode logic in our dialogs
Make our dialogs set the OSColorMode CSS property, so that we can
properly style them.

Refs #528
2023-10-02 16:34:56 +03:00
Alex Pyrgiotis
6232062146
Add missing newline char 2023-10-02 15:41:29 +03:00
Alex Pyrgiotis
b7b76174ab
qubes: Log captured output for the second stage
Log the captured command output during the second stage, only in dev
environments. This follows what we have already done for the first
stage.
2023-10-02 15:41:29 +03:00
Alex Pyrgiotis
16603875d6
qubes: Display all errors in second stage
If a command encounters an error or times out during the second stage of
the conversion in Qubes, handle it the same way as we would have handled
it in the first stage:

1. Get its error message.
2. Throw an UnexpectedConversionError exception, with the original
   message.

Note that, because the second stage takes place locally, users will see
the original content of the error.

Refs #567
Closes #430
2023-10-02 15:41:17 +03:00
Alex Pyrgiotis
2016965c84
Revert "Enable HWP conversion on MacOS M1"
This reverts commit 214ce9720d. The
rationale is that we want to wait until the LibreOffice package that
allows HWP conversion in Alpine Linux lands in `alpine:latest`.

For more info, read
https://github.com/freedomofpress/dangerzone/issues/498#issuecomment-1739894100
2023-10-02 14:22:47 +03:00
deeplow
7daeccdfea
Prevent PDF from overwriting num_pages in Qubes
This should only affect the alpha version of Qubes OS (in containers
it only allows the attacker to control the timeout). In short, an
attacker could have PDF metadata that would show before "Pages:" in
the `pdfinfo` command output and this would essentially override the
number of pages measured in the server. This could enable the attacker
to shorten the number of pages of a document for example.

Fixes #565
2023-10-02 12:18:12 +01:00
deeplow
dabdf6c286
FIXUP: rename to QubesQrexecFailed instead 2023-10-02 12:06:18 +01:00
deeplow
eb488b16c5
FIXUP: rename QubesNotEnoughRAMError to QubesConversionStartFailed 2023-10-02 11:51:55 +01:00
deeplow
9cfac7ac2a
Generalize "out of RAM" error to reflect other issues
When qrexec-client-vm fails, it could be a symptom of various issues:
  - the system being out of RAM
  - dz-dvm not existing

The exit code is the same in all cases (126), which makes it
particularly tricky to solve in the client application. For this reason
the approach is now to tell the user to see the qubes error notification
on the top right of their screen.
2023-10-02 11:06:17 +01:00
Alex Pyrgiotis
ccf4132ea0
conversion: Add sanity check for page count
Add a sanity check at the end of the conversion from doc to pixels, to
ensure that the resulting document will have the same number of pages as
the original one.

Refs #560
2023-09-28 22:50:54 +03:00
Alex Pyrgiotis
b4e5cf5be7
qubes: Stream page data in real time
Stream page data back to the caller, immediately after we read them from
pdftoppm. This way, we have more accurate progress reports and timeouts.

Fixes #557
2023-09-28 22:50:54 +03:00
Alex Pyrgiotis
4bb959f220
conversion: Add anchor points for streaming page data/metadata
Introduce 4 new methods that can be overloaded by the Qubes isolation
provider to stream page data/metadata back to the caller. For the time
being, these methods do what they did before, i.e., write this info in
files within the pixels directory.
2023-09-28 22:50:53 +03:00
Alex Pyrgiotis
6012cd1491
Improve EOF detection when reading command output
Do not read a line from the command output and then check if
we are at EOF, because it's possible that the writer immediately exited
after writing the last line of output. Instead, switch the order of
actions.

This is a very serious bug that can lead to Dangerzone excluding the
last page of the document. It should have bit us right from the start
(see aeeed411a0), but it seems that the
small period of time it takes the kernel to close the file descriptors
was hiding this bug.

Fixes #560
2023-09-28 22:50:53 +03:00
Garrett Robinson
46f978e6f0
Detect OS color mode and set as property for stylesheets
Sets the detected OS color mode (dark/light) as a property on the
QApplication so it can be referenced in stylesheets to select style
rules suited to the OS color mode.
2023-09-28 17:20:34 +03:00
deeplow
0a6b33ebed
Qubes: detect qube failing to start (missing RAM)
In Qubes OS it's often the case that the user doesn't have enough
RAM to start the conversion. In this case it raises BrokenPipeException
and exits with code 126.

It didn't seem possible to distinguish this kind of failure to one
where the user has misconfigured qrexec policies.

NOTE: this approach is not ideal UX-wise. After the first doc failing
the next one will also try and fail. Upon first failure we should
inform the user that they need to close some programs or qubes.
2023-09-28 11:08:50 +01:00
deeplow
63f03d5bcd
Add limit and test to max width and height of docs 2023-09-28 11:08:47 +01:00
deeplow
54b8ffbf96
Add page limit of 10000
Theoretically the max pages would be 65536 (2byte unsigned int.
However this limit is much higher than practical documents have
and larger ones can lead to unforseen problems, for example RAM
limitations.

We thus opted to use a lower limit of 10K. The limit must be
detected client-side, given that the server is distrusted. However
we also check it in the server, just as a fail-early mechanism.
2023-09-28 11:01:14 +01:00
Alex Pyrgiotis
18b73d94b0
qubes: Find out reason of interrupted conversions
If a conversion has been interrupted (usually due to an EOF), figure out
why this happened by checking the exit code of the spawned process.
2023-09-26 17:35:26 +03:00
Alex Pyrgiotis
30196ff35b
errors: Add error for interrupted conversions
Add an error for interrupted conversions, in order to better
differentiate this scenario from other ValueErrors that may be raised
throughout the code's lifetime.
2023-09-26 17:35:26 +03:00
Alex Pyrgiotis
0273522fb1
qubes: Store the process for the spawned qube
Store, in an instance attribute, the process that we have started for
the spawned disposable qube. In subsequent commits, we will use it from
other places as well, aside from the `_convert` method.

Note that this commit does not alter the conversion logic, and only does
the following:
1. Renames `p.` to `self.proc.`
2. Adds an `__init__` method to the Qubes isolation provider, and
   initializes the `self.proc` attribute to `None`.
3. Adds an assert that `self.proc` is not `None` after it's spawned, to
   placate Mypy.
2023-09-26 17:35:25 +03:00
deeplow
e08b6defc3
Round conversion progress from float to int
Fixes #553
2023-09-26 15:20:41 +01:00
deeplow
8d37ff15e0
Remove duplicated Qubes message: "Safe PDF Created"
Fixes #555.  This is a leftover from when we didn't have progress
reports from the second stage conversion (AKA. pixels to PDF) in #429.
2023-09-26 12:16:48 +01:00
Alex Pyrgiotis
e64d1da61f
qubes: Pass OCR parameters properly
Pass OCR parameters to conversion functions as arguments, instead of
setting environment variables.

Fixes #455
2023-09-20 18:04:40 +03:00
Alex Pyrgiotis
8a0c0a4673
Make parameter actually optional 2023-09-20 17:58:39 +03:00
Alex Pyrgiotis
20157bef58
Fix typo 2023-09-20 17:45:44 +03:00
Alex Pyrgiotis
99dd5f5139
qubes: Add client-side timeouts
Extend the client-side capabilities of the Qubes isolation provider, by
adding client-side timeout logic.

This implementation brings the same logic that we used server-side to
the client, by taking into account the original file size and the number
of pages that the server returns.

Since the code does not have the exact same insight as the server has,
the calculated timeouts are in two places:

1. The timeout for getting the number of pages. This timeout takes into
   account:
   * the disposable qube startup time, and
   * the time it takes to convert a file type to PDF
2. The total timeout for converting the PDF into pixels, in the same way
   that we do it on the server-side.

Besides these changes, we also ensure that partial reads (e.g., due to
EOF) are detected (see exact=... argument)

Some things that are not resolved in this commit are:
* We have both client-side and server-side timeouts for the first phase
  of the conversion. Once containers can stream data back to the
  application (see #443), these server-side timeouts can be removed.
* We do not show a proper error message when a timeout occurs. This will
  be part of the error handling PR (see #430)

Fixes #446
Refs #443
Refs #430
2023-09-20 17:32:42 +03:00
Alex Pyrgiotis
55a4491ced
Consolidate import statements 2023-09-20 17:14:24 +03:00
Alex Pyrgiotis
c547ffc3b4
conversion: Factor out calculate_timeout
Factor out the logic behind the calculate_timeout() method, used in
Dangerzone conversions, so that isolation providers can call it
directly.
2023-09-20 17:14:24 +03:00
Alex Pyrgiotis
fea193e935
Add non-blocking read utility
Add a function that can read data from non-blocking fds, which we will
used later on to read from standard streams with a timeout.
2023-09-20 17:14:24 +03:00
Alex Pyrgiotis
344d6f7bfa
Add Stopwatch implementation
Add a simple stopwatch implementation to track the elapsed time since an
event, or the remaining time until a timeout.
2023-09-20 17:14:23 +03:00
deeplow
94f569cdf5
Add error code for unexpected errors in conversion 2023-09-19 15:52:47 +01:00
deeplow
8e4f04a52e
Shift to conversion exit codes by 128
Distinguish from podman or other errors in called binaries by shifting
the error codes by 128.
2023-09-19 15:34:00 +01:00
deeplow
b4c3e07d36
Remove attacker-controlled error messages
Creates exceptions in the server code to be shared with the client via an
identifying exit code. These exceptions are then reconstructed in the
client.

Refs #456 but does not completely fix it. Unexpected exceptions and
progress descriptions are still passed in Containers.
2023-09-19 15:33:20 +01:00
Moon Sungjoon
214ce9720d
Enable HWP conversion on MacOS M1
This PR reverts the patch that disables HWP / HWPX conversion on MacOS
M1. It does not fix conversion on Qubes OS (#494)

Previously, HWP / HWPX conversion didn't work on MacOS M1 systems (#498)
because libreoffice wasn't built with Java support on Alpine Linux for
ARM (aarch64).

Gratefully, the Alpine team has enabled Java support on the aarch64
system [1], so we can enable it again for ARM architectures.

Fixes #498

[1]: 74d443f479
2023-09-06 13:10:18 +03:00
deeplow
8ae88eb10a
Ensure updates checkbox updated after updates accepted
Ensure the status of the toggle updates checkbox is updated, after the user is
prompted to enable updates.
2023-08-23 16:46:45 +01:00
deeplow
8221a56c7d
Revert "Propagate "update check" prompt to UI checkbox"
This reverts commit 3915a86642502b673aa0e47931823acbe66f1043.
2023-08-23 16:46:44 +01:00
deeplow
1695cc7a6c
Propagate "update check" prompt to UI checkbox
The "check for updates" button wasn't showing up immediately as checked
as soon as the user is prompted for checking updates. This fixes that.

Fixes #513
2023-08-23 16:46:33 +01:00
deeplow
9ec9cc5f87
Replace armor guards that indicate isolated output 2023-08-22 16:11:41 +01:00