Extend the client-side capabilities of the Qubes isolation provider, by
adding client-side timeout logic.
This implementation brings the same logic that we used server-side to
the client, by taking into account the original file size and the number
of pages that the server returns.
Since the code does not have the exact same insight as the server has,
the calculated timeouts are in two places:
1. The timeout for getting the number of pages. This timeout takes into
account:
* the disposable qube startup time, and
* the time it takes to convert a file type to PDF
2. The total timeout for converting the PDF into pixels, in the same way
that we do it on the server-side.
Besides these changes, we also ensure that partial reads (e.g., due to
EOF) are detected (see exact=... argument)
Some things that are not resolved in this commit are:
* We have both client-side and server-side timeouts for the first phase
of the conversion. Once containers can stream data back to the
application (see #443), these server-side timeouts can be removed.
* We do not show a proper error message when a timeout occurs. This will
be part of the error handling PR (see #430)
Fixes#446
Refs #443
Refs #430
Creates exceptions in the server code to be shared with the client via an
identifying exit code. These exceptions are then reconstructed in the
client.
Refs #456 but does not completely fix it. Unexpected exceptions and
progress descriptions are still passed in Containers.
This PR reverts the patch that disables HWP / HWPX conversion on MacOS
M1. It does not fix conversion on Qubes OS (#494)
Previously, HWP / HWPX conversion didn't work on MacOS M1 systems (#498)
because libreoffice wasn't built with Java support on Alpine Linux for
ARM (aarch64).
Gratefully, the Alpine team has enabled Java support on the aarch64
system [1], so we can enable it again for ARM architectures.
Fixes#498
[1]: 74d443f479
The "check for updates" button wasn't showing up immediately as checked
as soon as the user is prompted for checking updates. This fixes that.
Fixes#513
Reporting script now parses JunitXML instead of a series of
".container_log" files. The script in in changed submodule.
Additionally it makes failed tests actually fail so that this is
recorded in the JunitXML report.
Certain characters may be abused. Particularly ANSI escape codes.
Solution inspired by Qubes OS's hardening of ther RPC mechanism [1]:
> Terminal control characters are a security issue, which in worst case
> amount to arbitrary command execution. In the simplest case this
> requires two often found codes: terminal title setting (which puts
> arbitrary string in the window title) and title repo reporting (which
> puts that string on the shell's standard input. [sic]
>
> -- qvm-run.rst [2]
[1]: e005836286
[2]: c70da44702/doc/manpages/qvm-run.rst (L126)
Store the conversion log to a file (captured-output.txt) in the
container and when in development mode, have its output displayed on the
terminal output.
Use qrexec stdout to send conversion data (pixels) and stderr to send
conversion progress at the end of the conversion. This happens
regardless of whether or not the conversion is in developer mode or not.
It's the client that decides if it reads the debug data from stderr or
not. In this case, it only reads it if developer mode is enabled.
The markdown dependency uses importlib to monkeypatch 'html.parser'
[1]. Due to this approach 'html.parser' is never explicitly stated
as a dependency. This works fine in most cases, since it's part of
the python standard lib. But on Windows the build tool (CxFreeze)
ships in the .exe only the modules needed. And because html.parser
is never mentioned, it fails with an error (see issue #501).
Fixes#501
[1]: https://github.com/Python-Markdown/markdown/blob/master/markdown/htmlparser.py#L29
The HWP / HWPX conversion feature does not work on the following
platforms:
* MacOS with Apple Silicon CPU
* Native Qubes OS
For this reason, we need to:
1. Disable it on the GUI side, by not allowing the user to select these
files.
2. Throw an error on the isolation provider side, in case the user
directly attempts to convert the file (either through CLI or via
"Open With").
Refs #494
Refs #498
Sometimes, LibreOffice returns with status code 0, but in reality, it
fails. It doesn't create a file, and Dangerzone does not detect this.
What happens next is that it fails in the next command, and throws an
unrelated error.
Detect that LibreOffice fails, by checking if the output file exists,
after the PDF conversion.
Use the MIME types actually used by the `file` command, which was
recently changed for the detection of the HWPX format [1].
application/hwp+zip -> application/x-hwp+zip
But the HWPX format includes a 'mimetype' file, which contains the
MIME type string "application/hwp+zip", so that was left so because
it may be possible to detect it as "application/hwp+zip".
[1]: ceef7ead3a
HWPX MIME type is recognized as 'application/zip' with current version of file command (file-5.44).
It will be recognized as 'application/hwp+zip' when new version of file is released.
For a temporary fix, when MIME type of file is 'application/zip',
check the file type again (without the MIME option).
And then check if it's 'Zip data (MIME type "application/hwp+zip"?)' or not.
Only load the LibreOffice extension for opening hwp/hwpx when it is
actually needed. Adding an extension to libreoffice may allow for it to
run arbitrary code. This makes it trust more scalable by trusting
LibreOffice extensions only for the filetypes which they target.
Reasoning
---------
Assuming a malicious `.oxt` extension this means that the extension has
arbitrary code execution in the container. While this is not an
existential threat in itself, we should not expose every Dangerzone user
to it. This is achieved by dynamically loading the extension at runtime
only when needed.
This ensures that a compromised extension will in its least malicious
form be able to modify the visual content of any hancom office files but
not *every file*. In the more malicious version, if the code execution
manages to do a container escape, this will only affect users that have
converted a Hancom office file.
Improve the `parse_progress()` method of the container isolation
provider in the following ways:
1. Make sure that the fields of the progress report have the expected
type.
2. In case of a JSON parsing error, sanitize the invalid string so that
it doesn't contain escape sequences, or the user considers it as
trusted.
Update the common `print_progress()` method in the base
`IsolationProvider` class, with two extra features:
1. Always sanitize the provided text argument.
2. Mark the sanitized text argument as untrusted.
This is default behavior from now on, since this function is commonly
used to parse progress reports from the conversion sandbox.
Sanitize filenames in various places in the code, before we write them
to the user's terminal. Filenames, especially in Linux, can contain
virtually any character except for '\0' and '/', so it's important to
sanitize them.
Make the `error_label` widget always render messages as plain text,
instead of auto discovering if the text is rich. We need this because
the error message may contain input from the sandbox, which we consider
untrusted.
Move the "Ok" button in the prompt that asks users if they want to
enable update checks to the right, to further reinforce that this is
the default action.
Fully test the update check logic, by introducing several Qt tests.
Also, improve the `UpdaterThread.get_letest_info()` method, that gets
the latest version and changelog from GitHub, with several checks.
These checks are also tested in our newly added tests.
We want to differentiate between the user clicking on "Cancel" and
clicking on "X", since in the second case, we want to remind them again
on the next run.
Reverse the logic in Qubes to run in containers by default and only
perform the conversion with VMs when explicitly set by the env var
QUBES_CONVERSION=1. This will avoid surprises when someone installs
Dangerzone on Qubes expecting it to work out of the box just like any
other Linux.
Fixes#451
Implements the GUI logic necessary to change the selected document. When
"Change Selection" is clicked, it opens a File Dialog on the directory
of the previously selected files (if any)
Fixes#428
Change the signal type in `UpdaterThread.check_for_updates()` from
`dict` to `UpdateReport`. The `dict` parameter is stale and should have
never been used.
Add a hamburger button in the main window of Dangerzone, that will be
the entry point for update information. Whenever a new update is
released, users will see a green notification bubble. If an update error
happens, they will see a red notification bubble.
In the hamburger menu, users have the option to enable or disable update
checks. Depending on the update check status, users will see in a pop-up
dialog more info about the new update or the error.
Closes#189
Add a dialog that we will show for update-related tasks. This dialog has
a different layout than the Alert class: it has a message, followed by
a widget that the user chooses (can be a text box or collapsible
element), and then one last message.
Add a Qt widget called "CollapsibleBox", in order to build sections that
you can hide/show with a single click. There is no native widget for
this functionality, so we borrow some code from a StackOverflow user:
https://stackoverflow.com/a/52617714
Factor out some parts of the Alert class into a more generic dialog
class. This class will be used for a new type of dialog that we will
introduce in a subsequent commit.
Note that this commit does not alter the functionality of the Alert
class.
Add a new Python module called "updater", which contains the logic for
prompting the user to enable updates, and checking our GitHub releases
for new updates.
This class has some light dependency to Qt functionality, since it needs
to:
* Show a prompt to the user,
* Run update checks asynchronously in a Qt thread,
* Provide the main window with the result of the update check
Refs #189
Get the default settings of Dangezone for the current version, without
having to instantiate the Settings class. Note that instantiating the
Settings class also writes the settings to the underlying
`settings.json` file, and there are cases where we don't want this
behavior.
Add the following two features in the Settings class:
1. Add a way to save the settings, if the contents of a key have
changed.
2. Add a way to get all the updater settings, by getting fetching the
keys that start with `"updater_"`.
stdout_callback is used to flow progress information from the conversion
to some front-end. It was always used in tandem with printing to the
terminal (which is kind of a front-end). So it made sense to put them
always together.
Add an isolation provider for Qubes, that performs the document
conversion as follows:
Document to pixels phase
------------------------
1. Starts a disposable qube by calling either the dz.Convert or the
dz.ConvertDev RPC call, depending on the execution context.
2. Sends the file to disposable qube through its stdin.
* If we call the conversion from the development environment, also
pass the conversion module as a Python zipfile, before the
suspicious document.
3. Reads the number of pages, their dimensions, and the page data.
Pixels to PDF phase
-------------------
1. Writes the page data under /tmp/dangerzone, so that the
`pixels_to_pdf` module can read them.
2. Pass OCR parameters as envvars.
3. Call the `pixels_to_pdf` main function, as if it was running within a
container. Wait until the PDF gets created.
4. Move the resulting PDF to the proper directory.
Fixes#414
The "document to pixels" code assumes that the client has called it with
some mount points in which it can write files. This is true for the
container isolation provider, but not for Qubes, who can communicate
with the client only via stdin/stdout.
Add a Qubes wrapper for this code that reads the suspicious document
from stdin and writes the pages to stdout. The on-wire format is the
same as the one that TrustedPDF uses.
It seems that there are at least two Python libraries with libmagic
support:
* PyPI: python-magic (https://pypi.org/project/python-magic/)
On Fedora it's `python3-magic`
* PyPI: filemagic (https://pypi.org/project/filemagic/)
On Fedora it's `python3-file-magic`
The first package corresponds to the `py3-magic` package on Alpine
Linux, and it's the one we install in the container. The second package
uses a different API, and it's the only one we can use on Qubes.
To make matters worse, we:
* Can't install the first package on Fedora, because it installs the
second under the hood:
https://bugzilla.redhat.com/show_bug.cgi?id=1899279
* Can't install the second package on Alpine Linux (untested), due to
Musl being used instead of libC:
https://stackoverflow.com/a/53936722
Ultimately, we need to support both, by trying the first API, and on
failure using the other API.
The files in `container/` no longer make sense to have that name since
the "document to pixels" part will run in Qubes OS in its own virtual
machine.
To adapt to this, this PR does the following:
- Moves all the files in `container` to `dangerzone/conversion`
- Splits the old `container/dangerzone.py` into its two components
`dangerzone/conversion/{doc_to_pixels,pixels_to_pdf}.py` with a
`common.py` file for shared functions
- Moves the Dockerfile to the project root and adapts it to the new
container code location
- Updates the CircleCI config to properly cache Docker images.
- Updates our install scripts to properly build Docker images.
- Adds the new conversion module to the container image, so that it can
be imported as a package.
- Adapts the container isolation provider to use the new way of calling
the code.
NOTE: We have made zero changes to the conversion code in this commit,
except for necessary imports in order to factor out some common parts.
Any changes necessary for Qubes integration follow in the subsequent
commits.
Due to a bump in our Python dependencies, we now install Mypy 1.1.1
instead of 0.982. This change triggered the following errors:
* Incompatible default for argument <a> (default has type
None, argument has type <t>):
Mypy further explains here that PEP 484 prohibits implicit Optional,
so we need to make these types explicit Optional.
* Unused "type: ignore" comment, use narrower [method-assign] instead of
[assignment]:
Mypy has specialized some of its lints, meaning that we should switch
to the newer variants.
Also, it detected several other small inconsistencies. We fix all of
these errors in this commit.
When clicking on the "Choose..." button nothing would happen visually
and it would show the error:
Traceback (most recent call last):
File "/home/user/dangerzone/dangerzone/gui/main_window.py", line 614, in select_output_directory
dialog.setFileMode(QtWidgets.QFileDialog.DirectoryOnly)
According to the PySide docs, QFileDialog.DirectoryOnly has been
deprecated in Qt4.6 [1]. This was not an issue probably on PySide2
because it must have used an earlier Qt version.
Fixes#360
[1]: https://doc.qt.io/qtforpython-5/PySide2/QtWidgets/QFileDialog.html#PySide2.QtWidgets.PySide2.QtWidgets.QFileDialog.FileMode
Provide a fallback for QRegularExpressionValidator specifically for
Ubuntu Focal, because it's not present in PySide2 5.14. Instead,
fallback to QRegExpValidator if it doesn't exist.
Fixes#339
Copy input files in a temporary dir before mounting them, thereby
changing their permissions, without affecting the original files. This
way, we can avoid cases where a file is accessible to the user only due
to a supplemental user group, which does not work for containers.
Fixes#157Fixes#260Fixes#335
Take SELinux labels into account when mounting a file to the Dangerzone
container. Use the `:Z` flag (which is a no-op in non-SELinux systems)
to clear the existing SELinux label for a file, and apply one that
matches the container's.
Refs #335
Do not leave stale temporary directories when conversion fails
unexpectedly. Instead, wrap the conversion operation in a context
manager that wipes the temporary dir afterwards.
Fixes#317
Do not store temporary directories in the Dangerzone's config directory.
There are two reasons for that:
1. They are ephemeral, and they need a temporary place to be stored,
preferably RAM-backed.
2. We need to set them while running our CI tests.
Allow users to disable timeouts via the CLI, with the
`--disable-timeouts` argument. By default, the timeouts are always
enabled.
This option applies both to the CLI version of Dangerzone, and the GUI
one. For the latter, the user must start the GUI from their CLI (i.e.,
`dangerzone --disable-timeouts ...`)
We're not yet adding them to Linux, since PySide6 is not yet available
in Linux distros' packages, whereas with Linux and macOS our packaging
process includes the shipped binaries.
Fixes#211
Replace PySide2-stubs with types-PySide2, both of which are projects
that provide PySide2 typing hints, for the following reasons:
1. types-PySide2 is more complete and allows us to ditch some 'type:
ignore' comments for Mypy.
2. PySide2-stubs also brings PySide2 as a dependency, which cannot be
installed in MacOS M1 machines.
Refs #177
Exceptions raised during the document conversion process would be
silently hidden. This was because ThreadPoolExecuter in logic.py created
various contexts and hid any exceptions raised.
Fixes#309
When enabled, the conversion part does nothing but print some simulated
output. This can be useful for testing non-conversion code (e.g. GUI).
Activated with the hidden flag --unsafe-dummy-conversion.
All isolation providers will some similar steps when convert() is
called. For this reason, all the common parts are captured in convert()
and then each isolation provider implements its own specific conversion
process in _convert() (which is called from the convert() method).
Encapsulate container logic into an implementation of
AbstractIsolationProvider. This flexibility will allow for other types
of isolation managers, such as a Dummy one.
In non-development mode, the CLI shows the user information via the INFO
log level. The message is shown directly without [INFO] as a prefix.
Otherwise it would quickly get annoying to the user seeing [INFO] on
every line of a CLI application.
However, if an error happens it's important for the user to recognize
it's an error or a warning. This commit prints the log level in these
cases.
Now that safe PDFs can open on Windows right after conversion
(implemented in commit 5b2fefd), we need to save/load the "Open safe
documents after converting" setting.
Allowing this would lead to several UI edge-cases related to where the
files would be saved. Avoiding this is the easiest solution at the
moment.
In the future we should consider other options.
It was possible that users would add duplicate documents via 'open with
Dangerzone'. This would lead to unexpected situations and preventing it
both in the CLI and the GUI solves those issues.
Handle the case where a user has already added some documents (either
through 'open with' or via Dangerzone 'select documents' button) and
then they want to add some more via the 'open_with' dialog.
It updates the settings to reflect the newly added documents and blocks
the user from adding them if a conversion is already in progress.
Makes the ContentWidget a choke-point, where we can allow or prevent
adding more documents and where we can ensure that newly selected
documents are added immediately to the DangerzoneGui class.
Logically, the application flow should not change in any way.
Closing windows on macOS would not actually close Dangerzone. Now that
it is a single-window program, it makes sense for it to close
immediately.
Fixes#271
The save group box would get partially trimmed when running in macOS
this appears to be due to differences in rendering fonts and widget
sizes.
Refs #270
If a user has provided an output filename for a document, then we should
no longer accept suffixes. The reason is that we can't do something
meaningful with it, as we can't alter the provided output filename.
The proper behavior is to reject this action with an exception. Note
that this acts more of a safeguard, since (currently) there is no path
where a user may add a suffix to a document that already has an output
filename.
In preparation for adding a limit on how many convert threads exist, we
are simplifying its logic. Getting ocr_lang doesn't seem to belong to
the thread.
Aligns document labels following the design specified in issue #117.
It did not specify how it would change with window resize, so it
currently expands the progress bar / error message width and keeps the
document name fixed in size.
Mypy complains about a line being unreachable. This is probably a false
positive. It must assume the code is not using a framework and thus it
can't when a PySide 'connect()' is being called.
Since everything now happens in a single window, there is no need
to have a way to keep track other windows. They simply won't exist.
But on windows and Linux it will still be possible to start
multiple windows by starting various Dangerzone processes. On MacOS
this doesn't seem to be as easy from the launcher, but it should
not be critical as multiple documents can be converted at the same
time in the one window.
To help debugging and visualizing what was happening, we set all
widgets to be visible at the same time. Now that is no longer needed,
we can hide them.
This keeps the original program flow:
1. select the documents
2. set the settings
3. see the conversion progress
This diverges from the proposed design in issue #117 for simplification
and consistency (with past program flow) purposes.
The user is supposed to only be able to select the safe PDF extension.
In a multi-file scenario, the extension will be the same for all files.
We follow here the design document [1]. To achieve this, we needed a
QLabel right next to a QLineEdit, to give the user the illusion that
it is the same graphical object.
[1]: https://github.com/firstlookmedia/dangerzone/files/6657536/DangerZone_NA02a.pdf
Avoid conversion issues when saving the output file when it is set
wrongly. Inform the user with a red box saying "must end in .pdf"
and prevent the user from clicking "convert" before that is fixed.
Combines the validation logic with the already-existing 'update_ui()'
Set the default directory for saving the file as the one from
the first document. This one will show just the directory name.
If the user changes it by choosing another directory, then show the new
directory name and its full path.
- shows settings again
- removes documents arg from settings widget - this is now stored
under DangerzoneGui instance.
- removes widget 'dangerous_doc_label' - the doc label is already
shown next to each document.
- 'Save as' button now serves the purpose of selecting where all
output files should be saved. Before, it was for selecting where
the file would be saved.
- 'save_lineedit' widget which was read-only and showed the path
where the file would be saved, it now called 'safe_suffix' and is
writable. It is where the user can type the safe file extension
(e.g. '-safe.pdf'). Validation is not yet implemented.
- when 'start_button' is clicked it now changes the output_filename
of all the documents to set their output directory to the one the
user has selected (if 'save_checkbox' enabled) and to set their
new 'safe_suffix'
- change to plural text for selection of multiple documents
These will be needed in for the GUI's settings. This also adds test
cases for these documents. The methods are the following:
- set_output_dir()
For changing the output directory of the safe file
- suffix setter and getter - for changing the suffix of the file
Allows the user to:
a) specify filenames via the terminal (for the GUI)
b) select multiple documents via the GUI
The conversion process can't yet be started since the settings are
broken and disabled (expect mypy complaints).
Allow opening multiple documents at the same time from the terminal
by calling
$ dangerzone document1.pdf document2.pdf
It will open each document in its own window, making use of the
already existing 'multi-document multi-window' parallel conversion
implementation.
plistlib:
- originaly added in commit 3be1d63330
- no longer needed
grp, getpass:
- originally added in commit ae7c919d8e
- used for finding the 'docker' executable. No longer needed.
Checking if files were writeable created files in the process. In the
case where someone adds a list of N files to dangerzone but exits before
converting, they would be left with N 0-byte files for the -safe
version. Now they don't.
Fixes#214
All filename-related exceptions were of class DocumentFilenameException.
This made it difficult to disambiguate them. Specializing them makes it
it easier for tests to detect which exception in particular we want to
verify.
The document's state update is better update in the convert() function.
This is because this function is always called for the conversion
progress regardless of the frontend.
The container output logging logic was in both the CLI and the GUI.
This change moves the core parsing logic to container.py.
Since the code was largely the same, now cli does need to specify
a stdout_callback since all the necessary logging already happens.
The GUI now only adds an stdout_callback to detect if there was an
error during the conversion process.
Initial parallel document conversion: creates a pool of N threads
defined by the setting 'parallel_conversions'. Each thread calls
convert() on a document.
Wildcard arguments like `*` can lead to security vulnerabilities
if files are maliciously named as would-be parameters. In the following
scenario if a file in the current directory was named '--help', running
the following command would show the help.
$ dangerzone-cli *
By checking if parameters also happen to be files, we mitigate this
risk and have a chance to warn the user.
Rename the `gui.common` module and `gui.common.GuiCommon` class
to `gui.logic` and `gui.logic.DangerzoneGui` respectively. We keep as is
the original names of the variables that hold instances of this class,
since they will change in subsequent commits.
This change is part of the initial refactor to make the DangerzoneGui
class handle the GUI logic of the Dangerzone project.
Rename the `global_common` module and `global_common.GlobalCommon` class
to `logic` and `logic.Dangerzone` respectively. Also rename variables
that hold instances of this class.
This change is part of the initial refactor to make the Dangerzone class
handle the core logic of the Dangerzone project.
Let the Document class suggest the default filename for the safe PDF,
based on the provided input filename, appended with the extension
`-safe.pdf`.
Previously, this logic was copy-pasted throughout the code, which made
it difficult to maintain.
Make the Document class always resolve relative input/output file paths,
which are usually passed as arguments by users.
Previously, resolving relative filepaths was a job left to the
instantiators of the Document class. This was error-prone since this
conversion must happen in all the places where we instantiated the
Document class.
Implement Click's callback interface and create validators for the
input/output filenames, using the logic from the Document class. This
way, we can catch user errors as early as possible.
Factor out the filename validation logic and move it into the Document
class. Previously, the filename validation logic was scattered across
the CLI and GUI code.
Also, introduce a new errors.py module whose purpose is to handle
document-related errors, by providing:
* A special exception for them (DocumentFilenameExcpetion)
* A decorator that handles DocumentFilenameException, logs it and the
underlying cause, and exits the program gracefully.
Avoid setting document's filename via document.filename and instead
do it via object instantiation where possible.
Incidentally this has to change some window logic. When
select_document() is called it no longer checks if there is already an
open window with no document selected yet. The user can open as many
windows with unselected documents as they want.
Rename the `common` module and `common.Common` class to `document` and
`document.Document` respectively. Also, rename the variables that hold
instances of this class.
This change reflects the fact that the class is responsible for tracking
the state of the document. When we add bulk document conversion,
allowing us to keep track of a document's state will be key. This name
change is a step towards that.
Container-related methods recently moved to container.py no longer
need to have 'container' in their name as they are within the
container scope already.
Additonally it made it awkward to call from another module:
from .. import container
container.get_container_runtime()
The logic for detecting if we were are running on docker or podman
and identifying its respective binary were scattered across the
codebase. This centralizes it all in container.py
- display_banner() was only displayed in CLI mode so it makes sense
for it to be in the CLI.
- get_version(), was mvoed to util since it is a static function
that is needed in multiple parts of the application.
static methods that are used application-wide should belong to
the utilities python file.
inspired by @gmarmstrong's PR #166 on refactoring global_common
methods to be static and have a dzutil.py
Input_filename and output_filename could be None or Str. This lead
to typing issues where the static analysis type hint tool could not
check that the type colisions would not happen in runtime.
So the logic was replaced by throwing a runtime exception if either
of these valiables is ever used without first having been set.