Find all references to the `container.tar.gz` file, and replace them
with references to `container.tar`. Moreover, remove the `--no-save`
argument of `build-image.py` since we now always save the image.
Finally, fix some stale references to Poetry, which are not necessary
anymore.
Add the following two methods in the isolation provider:
1. `.is_available()`: Mainly used for the Container isolation provider,
it specifies whether the container runtime is up and running. May be
used in the future by other similar providers.
2. `.should_wait_install()`: Whether the isolation provider takes a
while to be installed. Should be `True` only for the Container
isolation provider, for the time being.
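A minimal sketch of how the two methods could look. Only the method names come from this message; the class layout, the `runtime_running` flag and the default return values are assumptions made for illustration.

```python
class IsolationProvider:
    """Hypothetical base class with permissive defaults."""

    def is_available(self) -> bool:
        # Providers without an external runtime are assumed to be usable.
        return True

    def should_wait_install(self) -> bool:
        # Most providers are assumed to need no lengthy installation step.
        return False


class Container(IsolationProvider):
    def __init__(self, runtime_running: bool) -> None:
        # Stand-in for a real check against the container runtime.
        self.runtime_running = runtime_running

    def is_available(self) -> bool:
        return self.runtime_running

    def should_wait_install(self) -> bool:
        return True
```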
Do not close stderr as part of the Qubes termination logic, since we
need to read the debug logs. This shouldn't affect typical termination
scenarios, since we expect our disposable qube to be either busy reading
from stdin, or writing to stdout. If this is not the case, then
forcefully killing the `qrexec-client-vm` process should unblock the
qube.
Start the conversion process in a new session, so that we can later on
kill the process group, without killing the controlling script (i.e.,
the Dangerzone UI). This should not affect the conversion process in any
other way.
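The idea can be sketched with the standard library as follows (POSIX only). The helper names and the sleeping child process are placeholders for illustration, not the actual conversion code.

```python
import os
import signal
import subprocess
import sys


def start_in_new_session(args: list[str]) -> subprocess.Popen:
    """Start a child process as the leader of a new session (POSIX only)."""
    return subprocess.Popen(args, start_new_session=True)


def kill_process_group(proc: subprocess.Popen) -> int:
    """Kill every process in the child's group and return its exit status."""
    os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
    return proc.wait()


# Stand-in for the conversion process.
proc = start_in_new_session([sys.executable, "-c", "import time; time.sleep(60)"])
# The child leads its own group, so the group kill cannot reach this script.
child_is_isolated = os.getpgid(proc.pid) != os.getpgrp()
returncode = kill_process_group(proc)
```

Because the child is a session leader, its process group ID differs from that of the calling script, which is why `os.killpg()` leaves the caller untouched.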
Make a few minor changes about when to use `==` and when to use `is`.
Basically, use `is` for booleans, and `==` for other values. Also
include a few other coding style changes that are enforced by `ruff`.
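As a small illustration of the distinction (the helper names are made up for this example): `is` checks identity, so it only matches the `True` singleton, while `==` compares values.

```python
def is_enabled(value: object) -> bool:
    """Return True only for the `True` singleton, not for truthy values."""
    return value is True


def same_name(left: str, right: str) -> bool:
    """Compare two strings by value."""
    return left == right
```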
Remove code that's not being used: exceptions bound with `as e` where
the exception itself is never used, unused targets of `with`
statements, and a few places with duplicated code.
Extend the IsolationProvider class with a
`terminate_doc_to_pixels_proc()` method, which must be implemented by
the Qubes/Container providers and gracefully terminate a process started
for the doc to pixels phase.
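One possible shape for a graceful termination, shown as a standalone helper. The function name, the timeout value and the fallback to `kill()` are assumptions, not necessarily what the providers do.

```python
import subprocess
import sys


def terminate_gracefully(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask the process to stop, and kill it only if it does not comply."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


# Stand-in for a process started for the doc to pixels phase.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_status = terminate_gracefully(proc)
```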
Refs #563
Pass the Document instance that will be converted to the
`IsolationProvider.start_doc_to_pixels_proc()` method. Concrete classes
can then associate this document with the started process, so that they
can later on kill it.
On Qubes the conversion in dev mode would fail when converting from a
Fedora 38 development qube via a Fedora 39 disposable qube. The reason
was that dz.ConvertDev was receiving `.pyc` files, which were compiled
for python 3.11 but running on python 3.12.
Unfortunately PyZipFile objects cannot send source python files, even
though the documentation is a little bit unclear on this [1].
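The difference can be reproduced with a throwaway module. This is a sketch for illustration: the `hello.py` module and the helper name are made up, and the plain `ZipFile` shown here is only one way to ship sources.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def archive_names(module_path: Path) -> tuple[list[str], list[str]]:
    """Return the member names produced by PyZipFile and by plain ZipFile."""
    with zipfile.PyZipFile(io.BytesIO(), "w") as zf:
        # writepy() stores bytecode compiled by the running interpreter.
        zf.writepy(str(module_path))
        compiled = zf.namelist()
    with zipfile.ZipFile(io.BytesIO(), "w") as zf:
        # write() stores the source file as-is.
        zf.write(module_path, arcname=module_path.name)
        sources = zf.namelist()
    return compiled, sources


with tempfile.TemporaryDirectory() as tmp:
    module = Path(tmp) / "hello.py"
    module.write_text("print('hello')\n")
    compiled_names, source_names = archive_names(module)
```

Here `compiled_names` holds a `.pyc` member, tied to the Python version that built the archive, while `source_names` holds the `.py` file.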
Fixes #723
[1]: https://docs.python.org/3/library/zipfile.html#pyzipfile-objects
Remove timeouts due to several reasons:
1. Lost purpose: after implementing the containers page streaming the
only subprocess we have left is LibreOffice. So we don't have such a
big risk of commands hanging (the original reason for timeouts).
2. Little benefit: predicting execution time is a generally unsolvable
computer science problem. Ultimately we were guessing an arbitrary
time based on the number of pages and the document size. As a guess
we made it pretty lax (30s per page or MB). A document hanging for
this long will probably lead to user frustration in any case and the
user may be compelled to abort the conversion.
3. Technical challenges with non-blocking timeouts: there have been
several technical challenges in keeping timeouts that we've made an
effort to accommodate. A significant one was having to do non-blocking
reads to ensure we could time out when reading the conversion stream
(and then used here)
Fixes #687
If we increased the number of parallel conversions, we'd run into an
issue where the streams were getting mixed together. This was because
the Converter.proc was a single attribute. This commit turns it into a
local variable so that this mix-up doesn't happen.
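A simplified sketch of the fixed pattern, with parallel conversions running in a thread pool. The `convert()` function and the echoing child process are stand-ins, not the real converter.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def convert(doc_name: str) -> str:
    """Run one stand-in conversion and return what its process printed."""
    # `proc` is local to this call, so parallel conversions never share it.
    proc = subprocess.Popen(
        [sys.executable, "-c", f"print({doc_name!r})"],
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = proc.communicate()
    return out.strip()


docs = ["a.pdf", "b.pdf", "c.pdf", "d.pdf"]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(convert, docs))
```

With a shared attribute, a second call could overwrite the process handle while the first call was still reading from it. A local variable rules that out.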
Now that only the second container can send JSON-encoded progress
information, we can remove the untrusted JSON parsing. The
`parse_progress` method was also renamed to `parse_progress_trusted` to
ensure future developers
don't mistake this as a safe method.
The old methods for sending untrusted JSON were repurposed to send the
progress instead to stderr for troubleshooting in development mode.
Fixes #456
Merge the core code of the Qubes and Containers isolation providers into
the parent IsolationProvider abstract class.
This is done by streaming pages in containers, exclusively in the first
conversion process. The commit is rather large due to the multiple
interdependencies of the code, making it difficult to split into various
commits.
The main conversion method (_convert), now in the superclass, simply calls
two methods:
- doc_to_pixels()
- pixels_to_pdf()
Critically, doc_to_pixels is implemented in the superclass, diverging
only in a specialized method called "start_doc_to_pixels_proc()". This
method obtains the process responsible for communicating with the
isolation provider (container / disp VM) via `podman/docker` and qrexec
on Containers and Qubes respectively.
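The resulting structure can be sketched as follows. The method names come from this message, while the signatures, the `EchoProvider` class and the placeholder PDF step are assumptions for illustration.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class IsolationProvider(ABC):
    def _convert(self, document: str) -> bytes:
        pixels = self.doc_to_pixels(document)
        return self.pixels_to_pdf(pixels)

    def doc_to_pixels(self, document: str) -> bytes:
        # Shared logic: only the way the process gets started differs.
        proc = self.start_doc_to_pixels_proc()
        out, _ = proc.communicate(input=document.encode())
        return out

    def pixels_to_pdf(self, pixels: bytes) -> bytes:
        return b"pdf:" + pixels  # placeholder for the real second stage

    @abstractmethod
    def start_doc_to_pixels_proc(self) -> subprocess.Popen:
        """Start the provider-specific process (podman/docker or qrexec)."""


class EchoProvider(IsolationProvider):
    """Hypothetical provider whose process upper-cases its input."""

    def start_doc_to_pixels_proc(self) -> subprocess.Popen:
        code = "import sys; sys.stdout.write(sys.stdin.read().upper())"
        return subprocess.Popen(
            [sys.executable, "-c", code],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )


result = EchoProvider()._convert("doc")
```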
Known regressions:
- progress reports stopped working on containers
Fixes #443
Create a temporary dir before the conversion begins, and store every
file necessary for the conversion there. We are mostly concerned about
the second stage of the conversion, which runs in the host. The first
stage runs in a disposable qube and cleanup is implicit.
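A minimal sketch of the approach using `tempfile.TemporaryDirectory`. The helper name, the directory prefix and the file name are illustrative assumptions.

```python
import tempfile
from pathlib import Path


def convert_with_scratch_dir(pixel_data: bytes) -> tuple[bytes, Path]:
    """Keep every intermediate file in a directory that is removed on exit."""
    with tempfile.TemporaryDirectory(prefix="dangerzone-") as tmp:
        tmp_dir = Path(tmp)
        page = tmp_dir / "page-1.rgb"
        page.write_bytes(pixel_data)
        result = page.read_bytes()  # stand-in for the second stage
    # The directory and its contents are gone at this point.
    return result, tmp_dir


result, scratch_dir = convert_with_scratch_dir(b"\x00\x00\x00")
```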
Fixes #575
Fixes #436
If a command encounters an error or times out during the second stage of
the conversion in Qubes, handle it the same way as we would have handled
it in the first stage:
1. Get its error message.
2. Throw an UnexpectedConversionError exception, with the original
message.
Note that, because the second stage takes place locally, users will see
the original content of the error.
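The handling described above could look roughly like this. The exception name comes from this message; the `run_command()` helper and its arguments are assumptions.

```python
import subprocess
import sys


class UnexpectedConversionError(Exception):
    pass


def run_command(args: list[str], timeout: float) -> None:
    """Run a command and convert failures into UnexpectedConversionError."""
    try:
        subprocess.run(
            args, check=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.CalledProcessError as e:
        # Keep the original message, since the command ran locally.
        raise UnexpectedConversionError(e.stderr.strip()) from e
    except subprocess.TimeoutExpired as e:
        raise UnexpectedConversionError(f"Timed out after {timeout}s") from e


try:
    run_command([sys.executable, "-c", "import sys; sys.exit('boom')"], timeout=30)
    error_message = None
except UnexpectedConversionError as e:
    error_message = str(e)
```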
Refs #567
Closes #430
In Qubes OS it's often the case that the user doesn't have enough
RAM to start the conversion. In this case it raises BrokenPipeException
and exits with code 126.
It didn't seem possible to distinguish this kind of failure from one
where the user has misconfigured qrexec policies.
NOTE: this approach is not ideal UX-wise. After the first doc fails,
the next one will also try and fail. Upon first failure we should
inform the user that they need to close some programs or qubes.
Theoretically the max pages would be 65536 (2-byte unsigned int).
However, this limit is much higher than what practical documents have,
and larger ones can lead to unforeseen problems, for example RAM
limitations.
We thus opted to use a lower limit of 10K. The limit must be
detected client-side, given that the server is distrusted. However
we also check it in the server, just as a fail-early mechanism.
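A sketch of the client-side check. The 10K limit and the 2-byte unsigned int come from this message; the big-endian byte order, the names and the exception class are assumptions.

```python
import io
import struct
from typing import BinaryIO

MAX_PAGES = 10_000


class MaxPagesExceeded(Exception):
    pass


def read_page_count(stream: BinaryIO) -> int:
    """Read the page count sent by the server and enforce the limit."""
    raw = stream.read(2)
    if len(raw) != 2:
        raise EOFError("Stream ended before the page count was received")
    (num_pages,) = struct.unpack(">H", raw)  # byte order is an assumption
    if num_pages > MAX_PAGES:
        raise MaxPagesExceeded(f"{num_pages} pages exceed the {MAX_PAGES} limit")
    return num_pages


page_count = read_page_count(io.BytesIO(struct.pack(">H", 3)))
```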
Add an error for interrupted conversions, in order to better
differentiate this scenario from other ValueErrors that may be raised
throughout the code's lifetime.
Store, in an instance attribute, the process that we have started for
the spawned disposable qube. In subsequent commits, we will use it from
other places as well, aside from the `_convert` method.
Note that this commit does not alter the conversion logic, and only does
the following:
1. Renames `p.` to `self.proc.`
2. Adds an `__init__` method to the Qubes isolation provider, and
initializes the `self.proc` attribute to `None`.
3. Adds an assert that `self.proc` is not `None` after it's spawned, to
placate Mypy.
Extend the client-side capabilities of the Qubes isolation provider, by
adding client-side timeout logic.
This implementation brings the same logic that we used server-side to
the client, by taking into account the original file size and the number
of pages that the server returns.
Since the code does not have the exact same insight as the server has,
the calculated timeouts are in two places:
1. The timeout for getting the number of pages. This timeout takes into
account:
* the disposable qube startup time, and
* the time it takes to convert a file type to PDF
2. The total timeout for converting the PDF into pixels, in the same way
that we do it on the server-side.
Besides these changes, we also ensure that partial reads (e.g., due to
EOF) are detected (see exact=... argument)
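The partial read detection can be sketched as follows. The `exact` argument is mentioned above, while the function name, the exception class and the use of a buffered stream are assumptions.

```python
import io
from typing import BinaryIO


class PartialReadError(Exception):
    pass


def read_bytes(stream: BinaryIO, size: int, exact: bool = True) -> bytes:
    """Read `size` bytes, treating a shorter result as an error by default."""
    # A buffered stream returns fewer bytes than requested only at EOF.
    buf = stream.read(size)
    if exact and len(buf) != size:
        raise PartialReadError(f"Expected {size} bytes, got {len(buf)}")
    return buf


full = read_bytes(io.BytesIO(b"abcd"), 4)
partial = read_bytes(io.BytesIO(b"ab"), 4, exact=False)
```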
Some things that are not resolved in this commit are:
* We have both client-side and server-side timeouts for the first phase
of the conversion. Once containers can stream data back to the
application (see #443), these server-side timeouts can be removed.
* We do not show a proper error message when a timeout occurs. This will
be part of the error handling PR (see #430)
Fixes #446
Refs #443
Refs #430
Creates exceptions in the server code to be shared with the client via an
identifying exit code. These exceptions are then reconstructed in the
client.
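A sketch of the mechanism. The class names, exit codes and messages below are made up for illustration and do not reflect the real values.

```python
class ConversionException(Exception):
    error_code = 1
    error_message = "Unspecified error"

    def __init__(self) -> None:
        super().__init__(self.error_message)


class DocFormatUnsupported(ConversionException):
    error_code = 10  # hypothetical exit code
    error_message = "The document format is not supported"


class MaxPagesExceeded(ConversionException):
    error_code = 11  # hypothetical exit code
    error_message = "The document has too many pages"


def exception_from_exit_code(exit_code: int) -> ConversionException:
    """Rebuild, on the client, the exception that the server exited with."""
    for cls in ConversionException.__subclasses__():
        if cls.error_code == exit_code:
            return cls()
    return ConversionException()
```

The server only has to exit with `error_code`, so no untrusted text needs to cross the boundary to identify the failure.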
Refs #456 but does not completely fix it. Unexpected exceptions and
progress descriptions are still passed in Containers.
Certain characters may be abused, particularly ANSI escape codes.
Solution inspired by Qubes OS's hardening of their RPC mechanism [1]:
> Terminal control characters are a security issue, which in worst case
> amount to arbitrary command execution. In the simplest case this
> requires two often found codes: terminal title setting (which puts
> arbitrary string in the window title) and title repo reporting (which
> puts that string on the shell's standard input. [sic]
>
> -- qvm-run.rst [2]
[1]: e005836286
[2]: c70da44702/doc/manpages/qvm-run.rst (L126)
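One way to neutralize such characters is to keep an allowlist of printable ASCII and replace everything else. This is a sketch: the function name, the allowlist and the `_` placeholder are assumptions.

```python
import string

# Printable ASCII without tabs, newlines and other control characters.
SAFE_CHARS = set(string.ascii_letters + string.digits + string.punctuation + " ")


def replace_control_chars(untrusted: str) -> str:
    """Replace every character outside the allowlist with an underscore."""
    return "".join(c if c in SAFE_CHARS else "_" for c in untrusted)


# An escape sequence that would otherwise set the terminal title.
sanitized = replace_control_chars("\x1b]0;evil\x07ok")
```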
Store the conversion log in a file (captured-output.txt) in the
container and, when in development mode, display its contents on the
terminal.
Use qrexec stdout to send conversion data (pixels) and stderr to send
conversion progress at the end of the conversion. This happens
regardless of whether the conversion is in developer mode.
It's the client that decides if it reads the debug data from stderr or
not. In this case, it only reads it if developer mode is enabled.
Reverse the logic in Qubes to run in containers by default and only
perform the conversion with VMs when explicitly set by the env var
QUBES_CONVERSION=1. This will avoid surprises when someone installs
Dangerzone on Qubes expecting it to work out of the box just like on any
other Linux distribution.
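The opt-in check can be sketched as follows. The function name is hypothetical, and treating only the exact value `1` as enabled is an assumption.

```python
import os
from typing import Mapping


def use_qubes_conversion(environ: Mapping[str, str] = os.environ) -> bool:
    """Convert with VMs only when QUBES_CONVERSION=1 is set explicitly."""
    return environ.get("QUBES_CONVERSION") == "1"
```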
Fixes #451
stdout_callback is used to flow progress information from the conversion
to some front-end. It was always used in tandem with printing to the
terminal (which is kind of a front-end). So it made sense to always put
them together.
Add an isolation provider for Qubes, that performs the document
conversion as follows:
Document to pixels phase
------------------------
1. Starts a disposable qube by calling either the dz.Convert or the
dz.ConvertDev RPC call, depending on the execution context.
2. Sends the file to the disposable qube through its stdin.
* If we call the conversion from the development environment, also
pass the conversion module as a Python zipfile, before the
suspicious document.
3. Reads the number of pages, their dimensions, and the page data.
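Step 3 can be sketched as a small stream parser. The wire format used here (2-byte big-endian integers for the page count, width and height, followed by raw RGB data) is an assumption made for illustration, as are the function names.

```python
import io
import struct
from typing import BinaryIO


def read_int(stream: BinaryIO) -> int:
    """Read one 2-byte unsigned integer (big-endian is an assumption)."""
    raw = stream.read(2)
    if len(raw) != 2:
        raise EOFError("Stream ended while reading an integer")
    return struct.unpack(">H", raw)[0]


def read_pages(stream: BinaryIO) -> list[tuple[int, int, bytes]]:
    """Read the page count, then each page's dimensions and pixel data."""
    pages = []
    for _ in range(read_int(stream)):
        width = read_int(stream)
        height = read_int(stream)
        size = width * height * 3  # assumes 3 bytes (RGB) per pixel
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("Stream ended while reading page data")
        pages.append((width, height, data))
    return pages


# One page of 2x1 pixels.
stream = io.BytesIO(struct.pack(">HHH", 1, 2, 1) + b"\xff" * 6)
pages = read_pages(stream)
```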
Pixels to PDF phase
-------------------
1. Writes the page data under /tmp/dangerzone, so that the
`pixels_to_pdf` module can read them.
2. Passes OCR parameters as envvars.
3. Calls the `pixels_to_pdf` main function, as if it was running within a
container. Waits until the PDF gets created.
4. Moves the resulting PDF to the proper directory.
Fixes #414