Stream page data back to the caller immediately after we read it from
pdftoppm. This way, we have more accurate progress reports and timeouts.
Fixes #557
Introduce 4 new methods that can be overloaded by the Qubes isolation
provider to stream page data/metadata back to the caller. For the time
being, these methods do what they did before, i.e., write this info in
files within the pixels directory.
Do not read a line from the command output and then check if
we are at EOF, because it's possible that the writer exited immediately
after writing the last line of output. Instead, check for EOF first, and
only then read the next line.
This is a very serious bug that can lead to Dangerzone excluding the
last page of the document. It should have bitten us right from the start
(see aeeed411a0), but it seems that the small period of time it takes
the kernel to close the file descriptors was hiding this bug.
Fixes #560
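The ordering problem can be reproduced with a short sketch. It assumes the
output is consumed through an `asyncio.StreamReader`; the function names
are hypothetical and the stream is fed by hand to mimic a writer that has
already exited.

```python
import asyncio
from typing import List


async def read_lines_buggy(reader: asyncio.StreamReader) -> List[bytes]:
    """Read first, then check for EOF: the last line can be dropped."""
    lines = []
    while True:
        line = await reader.readline()
        # If the writer is already gone, EOF is set as soon as the buffer
        # is drained, so the line we just read is thrown away.
        if reader.at_eof():
            break
        lines.append(line)
    return lines


async def read_lines_fixed(reader: asyncio.StreamReader) -> List[bytes]:
    """Check for EOF first, then read: every line is kept."""
    lines = []
    while not reader.at_eof():
        line = await reader.readline()
        if line:  # readline() returns b"" if EOF arrives while waiting
            lines.append(line)
    return lines


def finished_stream() -> asyncio.StreamReader:
    """Build a stream whose writer wrote two lines and exited at once."""
    reader = asyncio.StreamReader()
    reader.feed_data(b"page 1\npage 2\n")
    reader.feed_eof()
    return reader


async def demo():
    buggy = await read_lines_buggy(finished_stream())
    fixed = await read_lines_fixed(finished_stream())
    return buggy, fixed
```

With the hand-fed stream, the first variant loses "page 2" while the second
one returns both lines.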
Sets the detected OS color mode (dark/light) as a property on the
QApplication so it can be referenced in stylesheets to select style
rules suited to the OS color mode.
In Qubes OS it's often the case that the user doesn't have enough
RAM to start the conversion. In this case it raises BrokenPipeException
and exits with code 126.
It didn't seem possible to distinguish this kind of failure from one
where the user has misconfigured qrexec policies.
NOTE: this approach is not ideal UX-wise. After the first document fails,
the next one will also try and fail. Upon the first failure, we should
inform the user that they need to close some programs or qubes.
Theoretically, the maximum number of pages would be 65536 (2-byte
unsigned int). However, this limit is much higher than what practical
documents have, and larger documents can lead to unforeseen problems,
for example RAM limitations.
We thus opted to use a lower limit of 10K. The limit must be
enforced client-side, given that the server is distrusted. However,
we also check it in the server, just as a fail-early mechanism.
Add an error for interrupted conversions, in order to better
differentiate this scenario from other ValueErrors that may be raised
throughout the code's lifetime.
Store, in an instance attribute, the process that we have started for
the spawned disposable qube. In subsequent commits, we will use it from
other places as well, aside from the `_convert` method.
Note that this commit does not alter the conversion logic, and only does
the following:
1. Renames `p.` to `self.proc.`
2. Adds an `__init__` method to the Qubes isolation provider, and
initializes the `self.proc` attribute to `None`.
3. Adds an assert that `self.proc` is not `None` after it's spawned, to
placate Mypy.
Extend the client-side capabilities of the Qubes isolation provider, by
adding client-side timeout logic.
This implementation brings the same logic that we used server-side to
the client, by taking into account the original file size and the number
of pages that the server returns.
Since the client-side code does not have the exact same insight as the
server, the timeouts are calculated in two places:
1. The timeout for getting the number of pages. This timeout takes into
account:
* the disposable qube startup time, and
* the time it takes to convert a file type to PDF
2. The total timeout for converting the PDF into pixels, in the same way
that we do it on the server-side.
Besides these changes, we also ensure that partial reads (e.g., due to
EOF) are detected (see the `exact=...` argument).
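The idea behind an `exact` flag can be sketched as below. Only the argument
name comes from this message; the function name, signature, and error type
are assumptions.

```python
import io
from typing import BinaryIO


def read_bytes(stream: BinaryIO, size: int, exact: bool = True) -> bytes:
    """Read `size` bytes, optionally failing on a short (partial) read."""
    buf = stream.read(size)
    if exact and len(buf) != size:
        # The peer closed the stream before sending everything we expect.
        raise ValueError(f"expected {size} bytes, got {len(buf)}")
    return buf


# Usage with an in-memory stream standing in for the qube's stdout.
stream = io.BytesIO(b"\x00\x02ab")
header = read_bytes(stream, 2)
```

A short read with `exact=True` then surfaces as an error instead of being
silently treated as valid data.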
Some things that are not resolved in this commit are:
* We have both client-side and server-side timeouts for the first phase
of the conversion. Once containers can stream data back to the
application (see #443), these server-side timeouts can be removed.
* We do not show a proper error message when a timeout occurs. This will
be part of the error handling PR (see #430).
Fixes #446
Refs #443
Refs #430
Creates exceptions in the server code to be shared with the client via an
identifying exit code. These exceptions are then reconstructed in the
client.
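One way to picture the exit-code mapping is the sketch below. Every class
name and numeric code in it is hypothetical and chosen only for
illustration; the real values live in the conversion code.

```python
class ConversionException(Exception):
    """Base class: each subclass carries its own exit code."""

    error_code = 128  # hypothetical generic code
    error_message = "Unspecified conversion error"

    def __init__(self, message: str = "") -> None:
        super().__init__(message or self.error_message)


class UnsupportedFormat(ConversionException):
    error_code = 129  # hypothetical
    error_message = "The document format is not supported"


class ConversionTimedOut(ConversionException):
    error_code = 130  # hypothetical
    error_message = "The conversion took too long"


def exception_from_error_code(code: int) -> ConversionException:
    """Rebuild, on the client, the exception that the server exited with."""
    for cls in ConversionException.__subclasses__():
        if cls.error_code == code:
            return cls()
    return ConversionException(f"Unknown error code {code}")
```

The server side would exit with `exc.error_code`, and the client would call
`exception_from_error_code()` on the process return code.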
Refs #456 but does not completely fix it. Unexpected exceptions and
progress descriptions are still passed in Containers.
This PR reverts the patch that disables HWP / HWPX conversion on MacOS
M1. It does not fix conversion on Qubes OS (#494)
Previously, HWP / HWPX conversion didn't work on MacOS M1 systems (#498)
because libreoffice wasn't built with Java support on Alpine Linux for
ARM (aarch64).
Thankfully, the Alpine team has enabled Java support on the aarch64
system [1], so we can enable it again for ARM architectures.
Fixes #498
[1]: 74d443f479
The "check for updates" button wasn't showing up as checked immediately
after the user was prompted about checking for updates. This fixes that.
Fixes #513
The reporting script now parses JunitXML instead of a series of
".container_log" files. The script is in the changed submodule.
Additionally it makes failed tests actually fail so that this is
recorded in the JunitXML report.
Certain characters may be abused, particularly ANSI escape codes.
Solution inspired by Qubes OS's hardening of their RPC mechanism [1]:
> Terminal control characters are a security issue, which in worst case
> amount to arbitrary command execution. In the simplest case this
> requires two often found codes: terminal title setting (which puts
> arbitrary string in the window title) and title repo reporting (which
> puts that string on the shell's standard input. [sic]
>
> -- qvm-run.rst [2]
[1]: e005836286
[2]: c70da44702/doc/manpages/qvm-run.rst (L126)
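A minimal sketch of this kind of hardening is shown below. It assumes an
allowlist of printable ASCII and a `_` placeholder; both choices, as well
as the function name, are illustrative and not necessarily what the code
does.

```python
def replace_control_chars(untrusted: str, placeholder: str = "_") -> str:
    """Replace anything outside printable ASCII with a placeholder."""
    sanitized = []
    for char in untrusted:
        # Printable ASCII spans from space (0x20) to tilde (0x7e). ESC
        # (0x1b) and BEL (0x07), used by title-setting sequences, fall
        # outside of this range.
        if " " <= char <= "~":
            sanitized.append(char)
        else:
            sanitized.append(placeholder)
    return "".join(sanitized)
```

For instance, a title-setting sequence such as `"\x1b]0;evil\x07"` loses its
ESC and BEL bytes and is printed as harmless text.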
Store the conversion log to a file (captured-output.txt) in the
container and when in development mode, have its output displayed on the
terminal output.
Use qrexec stdout to send conversion data (pixels) and stderr to send
conversion progress at the end of the conversion. This happens
regardless of whether the conversion is in developer mode or not.
It's the client that decides if it reads the debug data from stderr or
not. In this case, it only reads it if developer mode is enabled.
The markdown dependency uses importlib to monkeypatch 'html.parser'
[1]. Due to this approach 'html.parser' is never explicitly stated
as a dependency. This works fine in most cases, since it's part of
the python standard lib. But on Windows the build tool (CxFreeze)
ships in the .exe only the modules needed. And because html.parser
is never mentioned, it fails with an error (see issue #501).
Fixes #501
[1]: https://github.com/Python-Markdown/markdown/blob/master/markdown/htmlparser.py#L29
The HWP / HWPX conversion feature does not work on the following
platforms:
* MacOS with Apple Silicon CPU
* Native Qubes OS
For this reason, we need to:
1. Disable it on the GUI side, by not allowing the user to select these
files.
2. Throw an error on the isolation provider side, in case the user
directly attempts to convert the file (either through CLI or via
"Open With").
Refs #494
Refs #498
Sometimes, LibreOffice returns with status code 0, but in reality, it
fails. It doesn't create a file, and Dangerzone does not detect this.
What happens next is that it fails in the next command, and throws an
unrelated error.
Detect that LibreOffice fails, by checking if the output file exists,
after the PDF conversion.
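The check can be sketched as follows. The function and exception names are
hypothetical, and the demo uses the Python interpreter as a stand-in for a
LibreOffice process that exits with 0 without writing anything.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Tuple


class OutputMissing(RuntimeError):
    """Hypothetical error: the command succeeded but wrote no file."""


def convert_and_verify(cmd: List[str], expected_output: Path) -> Path:
    """Run a conversion command and verify that its output file exists."""
    subprocess.run(cmd, check=True)  # raises on a non-zero exit status
    if not expected_output.is_file():
        # Exit status 0 alone is not proof that the conversion worked.
        raise OutputMissing(f"{expected_output} was not created")
    return expected_output


def demo() -> Tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "doc.pdf"
        silent_failure = [sys.executable, "-c", "pass"]
        try:
            convert_and_verify(silent_failure, out)
            detected = False
        except OutputMissing:
            detected = True
        script = "import sys; open(sys.argv[1], 'wb').close()"
        writer = [sys.executable, "-c", script, str(out)]
        created = convert_and_verify(writer, out) == out
    return detected, created
```

Failing right after the conversion step keeps the error message tied to the
step that actually went wrong.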
Use the MIME types actually used by the `file` command, which was
recently changed for the detection of the HWPX format [1].
application/hwp+zip -> application/x-hwp+zip
However, the HWPX format includes a 'mimetype' file, which contains the
MIME type string "application/hwp+zip". We therefore keep that value as
well, because a file may still be detected as "application/hwp+zip".
[1]: ceef7ead3a
The HWPX MIME type is recognized as 'application/zip' by the current
version of the file command (file-5.44). It will be recognized as
'application/hwp+zip' once a new version of file is released.
As a temporary fix, when the MIME type of a file is 'application/zip',
check the file type again (without the MIME option), and then check
whether it is 'Zip data (MIME type "application/hwp+zip"?)'.
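The fallback logic can be sketched as a pure function, so that it does not
depend on libmagic being installed. The two strings come from the message
above; the function name and its arguments are hypothetical.

```python
HWPX_DESCRIPTION = 'Zip data (MIME type "application/hwp+zip"?)'


def resolve_mime_type(mime_type: str, description: str) -> str:
    """Refine a generic zip MIME type using the textual description.

    `mime_type` is what libmagic reports with the MIME option, and
    `description` is what it reports without it.
    """
    if mime_type == "application/zip" and description == HWPX_DESCRIPTION:
        return "application/hwp+zip"
    return mime_type
```

Any other zip file keeps its 'application/zip' type and is handled as
before.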
Only load the LibreOffice extension for opening hwp/hwpx when it is
actually needed. Adding an extension to LibreOffice may allow it to
run arbitrary code. This makes trust more scalable, since we trust
LibreOffice extensions only for the file types which they target.
Reasoning
---------
If a `.oxt` extension is malicious, it has arbitrary code execution in
the container. While this is not an existential threat in itself, we
should not expose every Dangerzone user to it. This is achieved by
dynamically loading the extension at runtime only when needed.
This ensures that a compromised extension will, in its least malicious
form, be able to modify the visual content of any Hancom Office file but
not *every file*. In the more malicious version, if the code execution
manages to do a container escape, this will only affect users that have
converted a Hancom Office file.
Improve the `parse_progress()` method of the container isolation
provider in the following ways:
1. Make sure that the fields of the progress report have the expected
type.
2. In case of a JSON parsing error, sanitize the invalid string so that
it doesn't contain escape sequences and the user does not consider it
trusted.
Update the common `print_progress()` method in the base
`IsolationProvider` class, with two extra features:
1. Always sanitize the provided text argument.
2. Mark the sanitized text argument as untrusted.
This is default behavior from now on, since this function is commonly
used to parse progress reports from the conversion sandbox.
Sanitize filenames in various places in the code, before we write them
to the user's terminal. Filenames, especially in Linux, can contain
virtually any character except for '\0' and '/', so it's important to
sanitize them.
Make the `error_label` widget always render messages as plain text,
instead of auto discovering if the text is rich. We need this because
the error message may contain input from the sandbox, which we consider
untrusted.
Move the "Ok" button in the prompt that asks users if they want to
enable update checks to the right, to further reinforce that this is
the default action.
Fully test the update check logic, by introducing several Qt tests.
Also, improve the `UpdaterThread.get_latest_info()` method, that gets
the latest version and changelog from GitHub, with several checks.
These checks are also tested in our newly added tests.
We want to differentiate between the user clicking on "Cancel" and
clicking on "X", since in the second case, we want to remind them again
on the next run.
Reverse the logic in Qubes to run in containers by default and only
perform the conversion with VMs when explicitly set by the env var
QUBES_CONVERSION=1. This will avoid surprises when someone installs
Dangerzone on Qubes expecting it to work out of the box just like any
other Linux.
Fixes #451
Implements the GUI logic necessary to change the selected document. When
"Change Selection" is clicked, it opens a File Dialog on the directory
of the previously selected files (if any).
Fixes #428
Change the signal type in `UpdaterThread.check_for_updates()` from
`dict` to `UpdateReport`. The `dict` parameter is stale and should have
never been used.
Add a hamburger button in the main window of Dangerzone, that will be
the entry point for update information. Whenever a new update is
released, users will see a green notification bubble. If an update error
happens, they will see a red notification bubble.
In the hamburger menu, users have the option to enable or disable update
checks. Depending on the update check status, users will see in a pop-up
dialog more info about the new update or the error.
Closes #189
Add a dialog that we will show for update-related tasks. This dialog has
a different layout than the Alert class: it has a message, followed by
a widget that the user chooses (can be a text box or collapsible
element), and then one last message.
Add a Qt widget called "CollapsibleBox", in order to build sections that
you can hide/show with a single click. There is no native widget for
this functionality, so we borrow some code from a StackOverflow user:
https://stackoverflow.com/a/52617714
Factor out some parts of the Alert class into a more generic dialog
class. This class will be used for a new type of dialog that we will
introduce in a subsequent commit.
Note that this commit does not alter the functionality of the Alert
class.
Add a new Python module called "updater", which contains the logic for
prompting the user to enable updates, and checking our GitHub releases
for new updates.
This class has some light dependency on Qt functionality, since it needs
to:
* Show a prompt to the user,
* Run update checks asynchronously in a Qt thread,
* Provide the main window with the result of the update check
Refs #189
Get the default settings of Dangerzone for the current version, without
having to instantiate the Settings class. Note that instantiating the
Settings class also writes the settings to the underlying
`settings.json` file, and there are cases where we don't want this
behavior.
Add the following two features in the Settings class:
1. Add a way to save the settings, if the contents of a key have
changed.
2. Add a way to get all the updater settings, by fetching the
keys that start with `"updater_"`.
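Both features can be illustrated with an in-memory sketch. The class below
is not the real Settings class: method names, the `autosave` parameter, and
the `saves` counter (a stand-in for writing `settings.json`) are all
assumptions.

```python
from typing import Any, Dict


class Settings:
    """In-memory stand-in for the settings store."""

    def __init__(self, values: Dict[str, Any]) -> None:
        self._values = dict(values)
        self.saves = 0  # counts how many times we would write to disk

    def save(self) -> None:
        self.saves += 1  # the real class would write settings.json here

    def set(self, key: str, value: Any, autosave: bool = False) -> None:
        """Set a key, saving only if its contents actually changed."""
        old = self._values.get(key)
        self._values[key] = value
        if autosave and old != value:
            self.save()

    def get_updater_settings(self) -> Dict[str, Any]:
        """Return only the keys that start with "updater_"."""
        return {
            key: value
            for key, value in self._values.items()
            if key.startswith("updater_")
        }
```

Setting a key twice to the same value triggers a single save, which avoids
needless writes to the settings file.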
stdout_callback is used to flow progress information from the conversion
to some front-end. It was always used in tandem with printing to the
terminal (which is kind of a front-end). So it made sense to put them
always together.
Add an isolation provider for Qubes, that performs the document
conversion as follows:
Document to pixels phase
------------------------
1. Starts a disposable qube by calling either the dz.Convert or the
dz.ConvertDev RPC call, depending on the execution context.
2. Sends the file to the disposable qube through its stdin.
* If we call the conversion from the development environment, also
pass the conversion module as a Python zipfile, before the
suspicious document.
3. Reads the number of pages, their dimensions, and the page data.
Pixels to PDF phase
-------------------
1. Writes the page data under /tmp/dangerzone, so that the
`pixels_to_pdf` module can read them.
2. Passes OCR parameters as envvars.
3. Calls the `pixels_to_pdf` main function, as if it was running within a
container. Waits until the PDF gets created.
4. Moves the resulting PDF to the proper directory.
Fixes #414
The "document to pixels" code assumes that the client has called it with
some mount points in which it can write files. This is true for the
container isolation provider, but not for Qubes, which can communicate
with the client only via stdin/stdout.
Add a Qubes wrapper for this code that reads the suspicious document
from stdin and writes the pages to stdout. The on-wire format is the
same as the one that TrustedPDF uses.
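As a rough illustration of such a stream, the sketch below writes a page
count followed by each page's dimensions and pixel data. The field widths
(2-byte big-endian integers) and the 3-bytes-per-pixel RGB layout are
assumptions made for this example and are not confirmed by this message.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

Page = Tuple[int, int, bytes]  # (width, height, raw RGB pixels)


def write_document(out: BinaryIO, pages: List[Page]) -> None:
    """Write the page count, then each page's dimensions and pixels."""
    out.write(struct.pack(">H", len(pages)))
    for width, height, pixels in pages:
        if len(pixels) != width * height * 3:
            raise ValueError("pixel data does not match the dimensions")
        out.write(struct.pack(">HH", width, height))
        out.write(pixels)


def encode_document(pages: List[Page]) -> bytes:
    """Convenience wrapper that returns the stream as bytes."""
    buf = io.BytesIO()
    write_document(buf, pages)
    return buf.getvalue()
```

In the wrapper, `out` would be the process's binary stdout (for example
`sys.stdout.buffer`).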
It seems that there are at least two Python libraries with libmagic
support:
* PyPI: python-magic (https://pypi.org/project/python-magic/)
On Fedora it's `python3-magic`
* PyPI: filemagic (https://pypi.org/project/filemagic/)
On Fedora it's `python3-file-magic`
The first package corresponds to the `py3-magic` package on Alpine
Linux, and it's the one we install in the container. The second package
uses a different API, and it's the only one we can use on Qubes.
To make matters worse, we:
* Can't install the first package on Fedora, because it installs the
second under the hood:
https://bugzilla.redhat.com/show_bug.cgi?id=1899279
* Can't install the second package on Alpine Linux (untested), due to
musl being used instead of glibc:
https://stackoverflow.com/a/53936722
Ultimately, we need to support both, by trying the first API, and on
failure using the other API.
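A sketch of the fallback is shown below. `magic.from_file(path, mime=True)`
is the python-magic call and `magic.detect_from_filename(path).mime_type`
is the file-magic one; the wrapper name and the injected module argument
are assumptions, added so that the example runs without libmagic.

```python
from types import SimpleNamespace


def detect_mime_type(path: str, magic_module) -> str:
    """Try the python-magic API first, then fall back to file-magic."""
    try:
        from_file = magic_module.from_file  # python-magic
    except AttributeError:
        # file-magic has no from_file(); use its own API instead.
        return magic_module.detect_from_filename(path).mime_type
    return from_file(path, mime=True)


# Stand-ins for the two libraries, used only to exercise both branches.
fake_python_magic = SimpleNamespace(
    from_file=lambda path, mime=False: "application/pdf"
)
fake_file_magic = SimpleNamespace(
    detect_from_filename=lambda path: SimpleNamespace(mime_type="text/plain")
)
```

In real code, the argument would simply be the imported `magic` module,
whichever of the two packages provides it.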
The files in `container/` no longer make sense to have that name since
the "document to pixels" part will run in Qubes OS in its own virtual
machine.
To adapt to this, this PR does the following:
- Moves all the files in `container` to `dangerzone/conversion`
- Splits the old `container/dangerzone.py` into its two components
`dangerzone/conversion/{doc_to_pixels,pixels_to_pdf}.py` with a
`common.py` file for shared functions
- Moves the Dockerfile to the project root and adapts it to the new
container code location
- Updates the CircleCI config to properly cache Docker images.
- Updates our install scripts to properly build Docker images.
- Adds the new conversion module to the container image, so that it can
be imported as a package.
- Adapts the container isolation provider to use the new way of calling
the code.
NOTE: We have made zero changes to the conversion code in this commit,
except for necessary imports in order to factor out some common parts.
Any changes necessary for Qubes integration follow in the subsequent
commits.
Due to a bump in our Python dependencies, we now install Mypy 1.1.1
instead of 0.982. This change triggered the following errors:
* Incompatible default for argument <a> (default has type
None, argument has type <t>):
Mypy further explains here that PEP 484 prohibits implicit Optional,
so we need to make these types explicit Optional.
* Unused "type: ignore" comment, use narrower [method-assign] instead of
[assignment]:
Mypy has specialized some of its lints, meaning that we should switch
to the newer variants.
Also, it detected several other small inconsistencies. We fix all of
these errors in this commit.
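The first class of errors can be illustrated with a small, hypothetical
function; it is not taken from the Dangerzone code base.

```python
from typing import Optional

# Before: newer Mypy versions reject the implicit Optional.
#
#     def describe_timeout(timeout: float = None) -> str: ...
#
# After: the Optional is spelled out, as PEP 484 requires.


def describe_timeout(timeout: Optional[float] = None) -> str:
    """Return a human-readable description of a timeout value."""
    if timeout is None:
        return "no timeout"
    return f"{timeout:.1f}s"
```

The runtime behavior is unchanged; only the annotation becomes explicit.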
When clicking on the "Choose..." button nothing would happen visually
and it would show the error:
Traceback (most recent call last):
File "/home/user/dangerzone/dangerzone/gui/main_window.py", line 614, in select_output_directory
dialog.setFileMode(QtWidgets.QFileDialog.DirectoryOnly)
According to the PySide docs, QFileDialog.DirectoryOnly has been
deprecated in Qt4.6 [1]. This was probably not an issue on PySide2
because it must have used an earlier Qt version.
Fixes #360
[1]: https://doc.qt.io/qtforpython-5/PySide2/QtWidgets/QFileDialog.html#PySide2.QtWidgets.PySide2.QtWidgets.QFileDialog.FileMode
Provide a fallback for QRegularExpressionValidator specifically for
Ubuntu Focal, because it's not present in PySide2 5.14. Instead,
fall back to QRegExpValidator if it doesn't exist.
Fixes #339
Copy input files into a temporary dir before mounting them, thereby
changing their permissions, without affecting the original files. This
way, we can avoid cases where a file is accessible to the user only due
to a supplemental user group, which does not work for containers.
Fixes #157
Fixes #260
Fixes #335
Take SELinux labels into account when mounting a file to the Dangerzone
container. Use the `:Z` flag (which is a no-op in non-SELinux systems)
to clear the existing SELinux label for a file, and apply one that
matches the container's.
Refs #335
Do not leave stale temporary directories when conversion fails
unexpectedly. Instead, wrap the conversion operation in a context
manager that wipes the temporary dir afterwards.
Fixes #317
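The cleanup pattern can be sketched with a small context manager. The
function name and the directory prefix are hypothetical; the demo raises on
purpose to mimic a conversion that fails unexpectedly.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def conversion_tmpdir(prefix: str = "dangerzone-") -> Iterator[Path]:
    """Yield a temporary directory and wipe it, even on failure."""
    tmpdir = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield tmpdir
    finally:
        # Runs on success and on any exception raised in the with-block.
        shutil.rmtree(tmpdir, ignore_errors=True)


def demo() -> bool:
    """Return True if the directory survives a failed conversion."""
    seen = Path()
    try:
        with conversion_tmpdir() as tmpdir:
            seen = tmpdir
            (tmpdir / "page-1.rgb").write_bytes(b"\x00")
            raise RuntimeError("conversion failed unexpectedly")
    except RuntimeError:
        pass
    return seen.exists()
```

The standard library's `tempfile.TemporaryDirectory` provides the same
guarantee and may be used instead of a hand-written manager.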
Do not store temporary directories in Dangerzone's config directory.
There are two reasons for that:
1. They are ephemeral, and they need a temporary place to be stored,
preferably RAM-backed.
2. We need to set them while running our CI tests.
Allow users to disable timeouts via the CLI, with the
`--disable-timeouts` argument. By default, the timeouts are always
enabled.
This option applies both to the CLI version of Dangerzone, and the GUI
one. For the latter, the user must start the GUI from their CLI (i.e.,
`dangerzone --disable-timeouts ...`)
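The flag's semantics can be sketched with `argparse` from the standard
library. This is used only to keep the example self-contained; the project's
actual CLI library and internal names may differ, and `effective_timeout()`
is a hypothetical helper.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="dangerzone")
    # store_true means that timeouts stay enabled unless the flag is given.
    parser.add_argument(
        "--disable-timeouts",
        action="store_true",
        help="Do not abort conversions that take too long",
    )
    parser.add_argument("filenames", nargs="*")
    return parser


def effective_timeout(
    args: argparse.Namespace, computed: float
) -> Optional[float]:
    """Return None (wait forever) when timeouts are disabled."""
    return None if args.disable_timeouts else computed
```

Since the GUI is started from the same entry point, passing the flag on the
command line covers both front-ends.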