Commit graph

955 commits

Author SHA1 Message Date
Alex Pyrgiotis
215fa8b558
install: Add conflict if Dangerzone is installed
Add a "Conflicts:" entry in the RPM spec, in case another version of
Dangerzone is already installed.
2023-09-25 12:49:58 +03:00
Alex Pyrgiotis
81b4a8deb5
Minor fixes in Fedora installation section 2023-09-25 12:49:58 +03:00
Alex Pyrgiotis
cbca9110ca
Switch to tessdata-fast Tesseract model
Switch to the tessdata-fast Tesseract model, instead of the tessdata
one. The tessdata-fast Tesseract model is much smaller, and a bit faster
than the other one. Also, it's the model that Debian/Fedora ship by
default.

Closes #545
2023-09-25 12:48:05 +03:00
Alex Pyrgiotis
e64d1da61f
qubes: Pass OCR parameters properly
Pass OCR parameters to conversion functions as arguments, instead of
setting environment variables.

Fixes #455
2023-09-20 18:04:40 +03:00
Alex Pyrgiotis
8a0c0a4673
Make parameter actually optional 2023-09-20 17:58:39 +03:00
Alex Pyrgiotis
20157bef58
Fix typo 2023-09-20 17:45:44 +03:00
Alex Pyrgiotis
99dd5f5139
qubes: Add client-side timeouts
Extend the client-side capabilities of the Qubes isolation provider, by
adding client-side timeout logic.

This implementation brings the same logic that we used server-side to
the client, by taking into account the original file size and the number
of pages that the server returns.

Since the client does not have the exact same insight as the server has,
the timeouts are calculated in two places:

1. The timeout for getting the number of pages. This timeout takes into
   account:
   * the disposable qube startup time, and
   * the time it takes to convert a file type to PDF
2. The total timeout for converting the PDF into pixels, in the same way
   that we do it on the server-side.

Besides these changes, we also ensure that partial reads (e.g., due to
EOF) are detected (see exact=... argument)

Some things that are not resolved in this commit are:
* We have both client-side and server-side timeouts for the first phase
  of the conversion. Once containers can stream data back to the
  application (see #443), these server-side timeouts can be removed.
* We do not show a proper error message when a timeout occurs. This will
  be part of the error handling PR (see #430)

Fixes #446
Refs #443
Refs #430
2023-09-20 17:32:42 +03:00
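
The partial-read detection mentioned in the commit above could look roughly like the sketch below. Only the `exact` argument name comes from the commit message; the function name `read_bytes` and the choice of `ValueError` are assumptions made for illustration, not the project's actual code.

```python
import io
from typing import BinaryIO


def read_bytes(stream: BinaryIO, size: int, exact: bool = True) -> bytes:
    """Read `size` bytes from `stream` (hypothetical helper).

    With exact=True, a short read (e.g., because the peer hit EOF early)
    raises instead of silently returning fewer bytes.
    """
    buf = stream.read(size)
    if exact and len(buf) != size:
        raise ValueError(f"Expected {size} bytes but received {len(buf)}")
    return buf


# Usage: a stream that ends early is detected as a partial read.
stream = io.BytesIO(b"\x00\x01")
try:
    read_bytes(stream, 4)
except ValueError as e:
    print(f"Partial read detected: {e}")
```

With `exact=False`, the same call would return whatever bytes were available, which is how a caller can opt out of the check.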
Alex Pyrgiotis
55a4491ced
Consolidate import statements 2023-09-20 17:14:24 +03:00
Alex Pyrgiotis
c547ffc3b4
conversion: Factor out calculate_timeout
Factor out the logic behind the calculate_timeout() method, used in
Dangerzone conversions, so that isolation providers can call it
directly.
2023-09-20 17:14:24 +03:00
Alex Pyrgiotis
fea193e935
Add non-blocking read utility
Add a function that can read data from non-blocking fds, which we will
use later on to read from standard streams with a timeout.
2023-09-20 17:14:24 +03:00
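
A read from a non-blocking fd with a timeout, as described in the commit above, might be sketched as follows. This is a minimal illustration for Unix systems built on `select.select()` and `os.read()`; the function name `nonblocking_read` and its signature are assumptions, not the utility that the commit actually adds.

```python
import os
import select
import time


def nonblocking_read(fd: int, size: int, timeout: float) -> bytes:
    """Read up to `size` bytes from a non-blocking fd (hypothetical helper).

    Returns early on EOF and raises TimeoutError if `timeout` seconds pass
    before `size` bytes have arrived.
    """
    deadline = time.monotonic() + timeout
    buf = b""
    while len(buf) < size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"Timed out after reading {len(buf)} of {size} bytes")
        # Wait until the fd becomes readable, or until the deadline passes.
        readable, _, _ = select.select([fd], [], [], remaining)
        if not readable:
            continue  # The next iteration re-checks the deadline.
        try:
            chunk = os.read(fd, size - len(buf))
        except BlockingIOError:
            continue  # Spurious wakeup: no data was available after all.
        if not chunk:
            break  # EOF: the writer closed its end.
        buf += chunk
    return buf


# Usage: read from a pipe whose read end is non-blocking.
r, w = os.pipe()
os.set_blocking(r, False)
os.write(w, b"hello")
print(nonblocking_read(r, 5, timeout=1.0))
```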
Alex Pyrgiotis
344d6f7bfa
Add Stopwatch implementation
Add a simple stopwatch implementation to track the elapsed time since an
event, or the remaining time until a timeout.
2023-09-20 17:14:23 +03:00
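
For illustration, a stopwatch of the kind described above could be as small as the sketch below. Only the class name `Stopwatch` is taken from the commit title; the `start()` method and the `elapsed` and `remaining` properties are assumptions about what such an interface might look like.

```python
import time
from typing import Optional


class Stopwatch:
    """Track the time elapsed since start(), or remaining until a timeout."""

    def __init__(self, timeout: Optional[float] = None) -> None:
        self.timeout = timeout
        self._start: Optional[float] = None

    def start(self) -> "Stopwatch":
        # Use a monotonic clock, so system clock changes do not skew results.
        self._start = time.monotonic()
        return self

    @property
    def elapsed(self) -> float:
        if self._start is None:
            raise RuntimeError("The stopwatch has not been started")
        return time.monotonic() - self._start

    @property
    def remaining(self) -> float:
        if self.timeout is None:
            raise RuntimeError("The stopwatch was created without a timeout")
        # Never report a negative remaining time.
        return max(self.timeout - self.elapsed, 0.0)


# Usage: pass the remaining time to each blocking call in a sequence.
sw = Stopwatch(timeout=5.0).start()
print(f"elapsed={sw.elapsed:.3f}s remaining={sw.remaining:.3f}s")
```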
Alex Pyrgiotis
fbe13bb114
Refer to Qubes in the project's description 2023-09-20 16:48:53 +03:00
Alex Pyrgiotis
a3bb740b19
Remove some stale Qubes refs in setup.py 2023-09-20 16:48:53 +03:00
Alex Pyrgiotis
01d63e4eda
install: Build Dangerzone RPMs using our SPEC file
Replace the deprecated `bdist_rpm` method of creating RPMs for
Dangerzone. Instead, update our `install/linux/build-rpm.py` script, to
build Dangerzone RPMs using our SPEC file under
`install/linux/dangerzone.spec`. The script now essentially creates a
source distribution (sdist) using `poetry build`, and then uses
`rpmbuild` to create binary and source RPMs.

Fixes #298
2023-09-20 16:48:53 +03:00
Alex Pyrgiotis
6cc2a953ff
install: Add directory for building Dangerzone RPMs
Add an `rpm-build` directory under `install/linux`, which will be used
for building Dangerzone RPMs. For the time being, it only has a
.gitignore file there, but in the future, invoking
`install/linux/build-rpm.py` will populate it.
2023-09-20 16:48:53 +03:00
Alex Pyrgiotis
f5abe0abd0
Update RPM dependencies
Update the dependencies required to build RPM packages. More
specifically, remove the older python3-setuptools dependency, and depend
instead on python3-devel and python3-poetry-core.

Note that this commit may break our CI, but it will be resolved in
subsequent commits.
2023-09-20 16:48:53 +03:00
Alex Pyrgiotis
33197f26b7
install: Introduce a SPEC file for creating RPMs
Introduce a SPEC file that can be used to create an RPM from a Python
source distribution. Some notable features of this SPEC file follow:

1. We can use this SPEC file to create both regular RPM packages and
   ones targeted for Qubes.
2. It has a post installation script that removes stale .egg-info
   directories, which previously caused issues to our users.
3. It automatically creates a changelog from our Git logs, which differs
   from the actual CHANGELOG.md.
4. It follows the latest Fedora guidelines (as of writing this) for
   packaging Python projects.

Fixes #514
2023-09-20 16:48:52 +03:00
Alex Pyrgiotis
3dea16bcd2
Include non-Python data files into Python package
Update our pyproject.toml file to include some non-Python data files,
e.g., our container image and assets. This way, we can use `poetry
build` to create a source distribution / Python wheel from our source
repository.

Note that this list of data files is already defined in our `setup.py`
script. In that script, one can find some extra goodies:

1. We can conditionally include data files in our Python package. We use
   this to include Qubes data only in our Qubes packages.
2. We can specify where the data files will be installed in the end-user
   system.

The above are non-goals for Poetry [1], especially (2), because modern
Python wheels are not supposed to install files in arbitrary places
within the user's host, nor should the install invocation use sudo.
Instead, this is a task that's better suited for the .deb / .rpm
packages.

So, why do we bother updating our `pyproject.toml` and not use
`setup.py` instead? Because `setup.py` is deprecated [2,3], and the
latest Python packaging RFCs [4], as well as most recent Fedora
guidelines [5] use `pyproject.toml` as the source of truth, instead of
`setup.py`.

In subsequent commits, we will also use just `pyproject.toml` for RPM
packaging.

[1]: https://github.com/python-poetry/poetry/issues/890
[2]: https://peps.python.org/pep-0517/#source-trees
[3]: https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html
[4]: https://peps.python.org/pep-0517/
[5]: https://docs.fedoraproject.org/en-US/packaging-guidelines/Python/
2023-09-20 16:38:55 +03:00
Alex Pyrgiotis
5431e059bf
Update build-system entry in pyproject.toml
Update the `build-backend` attribute, in accordance with the Python
Poetry docs [1]. Also, bump the minimum required poetry-core version to
1.2.0, since this is the version that introduced the Poetry dependency
groups [2], i.e., the [tool.poetry.group] sections in pyproject.toml.

[1]: https://python-poetry.org/docs/pyproject/#poetry-and-pep-517
[2]: https://python-poetry.org/docs/managing-dependencies/#dependency-groups
2023-09-20 16:38:55 +03:00
Alex Pyrgiotis
b83d2495eb
Remove stale dangerzone-container entrypoint
The dangerzone-container entrypoint, as specified in pyproject.toml, is
stale, for the following reasons:

1. It's not mentioned in the setup.py script, so it was never included
   in our Linux distributions.
2. The code in `dangerzone.__init__.py` that decides if it will invoke
   the GUI or CLI backend, just takes `dangerzone-cli` into account for
   this decision, and does not mention dangerzone-container anywhere.
2023-09-20 16:38:55 +03:00
Alex Pyrgiotis
7bc0129f94
Let black and isort respect .gitignore
In order to let isort respect .gitignore, we need to specify this in the
tool.isort entry, in pyproject.toml.

For black, we don't need any extra tweaks. This is weird, since until a
few months ago black did not respect .gitignore. Maybe something has
changed in the meantime but if not, we should revert this change.
2023-09-20 16:38:55 +03:00
Alex Pyrgiotis
29c0181b4d
Add test_docs_large in our .gitignore 2023-09-20 16:38:54 +03:00
deeplow
94f569cdf5
Add error code for unexpected errors in conversion 2023-09-19 15:52:47 +01:00
deeplow
8e4f04a52e
Shift conversion exit codes by 128
Distinguish from podman or other errors in called binaries by shifting
the error codes by 128.
2023-09-19 15:34:00 +01:00
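
The shift described above can be sketched in a few lines. The offset of 128 comes from the commit message; the function names and the assumption that conversion error codes fit in the range 1 to 127 are illustrative only.

```python
from typing import Optional

ERROR_SHIFT = 128  # Offset named in the commit message.
MAX_EXIT_CODE = 255  # Largest exit status a POSIX process can report.


def to_exit_code(error_code: int) -> int:
    """Map a conversion error code (1..127) to a shifted process exit code."""
    if not 1 <= error_code <= MAX_EXIT_CODE - ERROR_SHIFT:
        raise ValueError(f"Error code out of range: {error_code}")
    return ERROR_SHIFT + error_code


def from_exit_code(exit_code: int) -> Optional[int]:
    """Return the conversion error code, or None for any other exit code."""
    if ERROR_SHIFT < exit_code <= MAX_EXIT_CODE:
        return exit_code - ERROR_SHIFT
    return None


# Usage: exit codes at or below 128 are attributed to something other
# than the conversion code itself.
print(to_exit_code(2), from_exit_code(130), from_exit_code(125))
```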
deeplow
b4c3e07d36
Remove attacker-controlled error messages
Creates exceptions in the server code to be shared with the client via an
identifying exit code. These exceptions are then reconstructed in the
client.

Refs #456 but does not completely fix it. Unexpected exceptions and
progress descriptions are still passed in Containers.
2023-09-19 15:33:20 +01:00
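
One way to share exceptions through an identifying exit code, as the commit above describes, is sketched below. All class names, error codes, and messages here are hypothetical. What the sketch illustrates is that only an integer crosses the trust boundary, while the message shown to the user is defined in client-side code.

```python
class ConversionException(Exception):
    """Base class; each subclass carries a fixed code and a fixed message."""

    error_code = 1  # Hypothetical value.
    error_message = "Unspecified error during conversion"

    def __init__(self) -> None:
        super().__init__(self.error_message)


class DocumentTooLarge(ConversionException):
    error_code = 2  # Hypothetical value.
    error_message = "The document is too large to be converted"


class UnsupportedFormat(ConversionException):
    error_code = 3  # Hypothetical value.
    error_message = "The document format is not supported"


def exception_from_error_code(error_code: int) -> ConversionException:
    """Rebuild an exception on the client from the server's error code."""
    for cls in ConversionException.__subclasses__():
        if cls.error_code == error_code:
            return cls()
    # Unknown codes fall back to the generic exception.
    return ConversionException()


# Usage: the client never displays text produced by the server.
print(exception_from_error_code(3))
```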
Moon Sungjoon
214ce9720d
Enable HWP conversion on MacOS M1
This PR reverts the patch that disables HWP / HWPX conversion on MacOS
M1. It does not fix conversion on Qubes OS (#494)

Previously, HWP / HWPX conversion didn't work on MacOS M1 systems (#498)
because libreoffice wasn't built with Java support on Alpine Linux for
ARM (aarch64).

Thankfully, the Alpine team has enabled Java support on the aarch64
system [1], so we can enable it again for ARM architectures.

Fixes #498

[1]: 74d443f479
2023-09-06 13:10:18 +03:00
Moon Sungjoon
acd615e0e1
Switch to the edge repo of Alpine Linux
The Alpine Linux team has enabled Java support for LibreOffice on ARM
architecture:

    74d443f479

This commit is included in 7.5.5.2-r2, so the installed LibreOffice
package should be 7.5.5.2-r2 or higher to fix this issue.

However 3.18 doesn't have the 7.5.5.2-r2 package:

    https://pkgs.alpinelinux.org/package/v3.18/community/aarch64/libreoffice

The Dangerzone image uses the alpine:latest image which is 3.18 as of
writing this.

For this reason, we switch to the edge repo of Alpine Linux, which
includes this fix.

Refs #498
Refs #540
Refs #542
2023-09-06 13:09:34 +03:00
deeplow
ed298ec5b0
BUILD.md fix typo: dz-dvm is not a template 2023-08-29 19:29:43 +01:00
deeplow
ab3293ff70
BUILD.md replace deprecated cmd qvm-copy-to-vm
qvm-copy-to-vm has not respected the provided qube name for a long
time. Instead, the target is enforced by the dom0 policy prompt. This is
probably a leftover from a command run in dom0, where this command
actually works.
2023-08-29 19:29:41 +01:00
deeplow
688bfe056b
BUILD.md: cd into dangerzone/ after cloning 2023-08-29 19:29:31 +01:00
deeplow
831c3250c2
Add overview table of qubes 2023-08-29 19:20:36 +01:00
deeplow
4f2de90f93
Add overview table of qubes 2023-08-24 14:50:53 +01:00
deeplow
c3cdca977f
Qubes alpha: bump fedora version (37 -> 38) 2023-08-24 14:42:54 +01:00
deeplow
8ae88eb10a
Ensure updates checkbox updated after updates accepted
Ensure the status of the toggle updates checkbox is updated, after the user is
prompted to enable updates.
2023-08-23 16:46:45 +01:00
deeplow
8221a56c7d
Revert "Propagate "update check" prompt to UI checkbox"
This reverts commit 3915a86642502b673aa0e47931823acbe66f1043.
2023-08-23 16:46:44 +01:00
deeplow
1695cc7a6c
Propagate "update check" prompt to UI checkbox
The "check for updates" button wasn't showing up immediately as checked
as soon as the user is prompted for checking updates. This fixes that.

Fixes #513
2023-08-23 16:46:33 +01:00
deeplow
89365b585c
Add tests documentation 2023-08-22 16:11:44 +01:00
deeplow
9ec9cc5f87
Replace armor guards that indicate isolated output 2023-08-22 16:11:41 +01:00
deeplow
a0bcd12635
Large test run: hide traceback to avoid spam
Some tests are expected to fail. To avoid having potentially thousands
of tracebacks of the failed docs at the end, we're deactivating that
reporting.
2023-08-22 16:11:39 +01:00
deeplow
fa215063ee
Add logging for second container 2023-08-22 16:11:38 +01:00
deeplow
75369cf621
Adapt code so it works for reporting script
Reporting script now parses JunitXML instead of a series of
".container_log" files. The script is in the changed submodule.

Additionally it makes failed tests actually fail so that this is
recorded in the JunitXML report.
2023-08-22 16:11:36 +01:00
deeplow
eb16285790
Replace container output command prefix ">>>"
In the junitxml this prefix would look ugly ("&gt&gt&gt") because it has
to escape any non-xml tags.
2023-08-22 16:11:35 +01:00
deeplow
48b2e7bc3c
Log command to debug log for traceback purposes
Log commands so we can trace back which errors / outputs are from each
command.
2023-08-22 16:11:34 +01:00
deeplow
b73ce5bf6a
Add large test logic and documentation
Adds a large pool of documents that can and should be used prior to a
release to understand the effects of the new release in a real-world
scenario.

Documents are stored in an external git LFS repo under
`tests/test_docs_large` and currently it's about 11K documents gathered
from multiple PDF readers and office suites' test sets.

Documentation on how to run the tests is under
`docs/developer/TESTING.md`
2023-08-22 16:11:31 +01:00
deeplow
f41cefde1d
Add "armor" around conversion log
Add GPG-styled "armor" around conversion logs

    -----CONVERSION LOG START-----
    Creator:         Writer
    Producer:        LibreOffice 6.4
    [...]
    -----CONVERSION LOG END-----
2023-08-22 16:11:28 +01:00
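
Wrapping a log in this kind of armor can be sketched as below. The two marker lines are copied from the commit message above; the function name `armor` and the handling of trailing newlines are assumptions.

```python
ARMOR_START = "-----CONVERSION LOG START-----"
ARMOR_END = "-----CONVERSION LOG END-----"


def armor(log: str) -> str:
    """Surround a conversion log with start and end marker lines."""
    body = log.rstrip("\n")  # Avoid a blank line before the end marker.
    return f"{ARMOR_START}\n{body}\n{ARMOR_END}"


# Usage
print(armor("Creator:         Writer\nProducer:        LibreOffice 6.4\n"))
```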
deeplow
9f1abe2836
Replace non-printable ascii in conversion log
Certain characters may be abused, particularly ANSI escape codes.
Solution inspired by Qubes OS's hardening of the RPC mechanism [1]:

> Terminal control characters are a security issue, which in worst case
> amount to arbitrary command execution. In the simplest case this
> requires two often found codes: terminal title setting (which puts
> arbitrary string in the window title) and title repo reporting (which
> puts that string on the shell's standard input. [sic]
>
>  -- qvm-run.rst [2]

[1]: e005836286
[2]: c70da44702/doc/manpages/qvm-run.rst (L126)
2023-08-22 16:11:27 +01:00
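
A replacement pass of this kind could look like the sketch below. The function name, the `_` replacement character, and the decision to keep newlines intact are all assumptions made for illustration; the commit message only states that non-printable ASCII is replaced.

```python
def replace_control_chars(text: str, replacement: str = "_") -> str:
    """Replace everything outside printable ASCII (0x20-0x7E), except newlines."""
    return "".join(
        c if (" " <= c <= "~" or c == "\n") else replacement for c in text
    )


# Usage: the ESC and BEL bytes of a "set terminal title" sequence are
# neutralized, so the terminal no longer interprets the sequence.
print(replace_control_chars("\x1b]0;title\x07ok"))  # prints "_]0;title_ok"
```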
deeplow
95cef8cf0a
Containers: capture conversion logs
Store the conversion log to a file (captured-output.txt) in the
container and when in development mode, have its output displayed on the
terminal output.
2023-08-22 16:11:26 +01:00
deeplow
e2accc2da1
Ignore large tests when doing "make test" 2023-08-22 16:11:24 +01:00
deeplow
d6bce4dec5
Qubes: close qrexec stdin and stdout
Ensure a server cannot keep the client hanging if more data than
necessary is sent. This applies to the container and the Qubes
implementation.
2023-08-22 16:11:23 +01:00
deeplow
874b8865e2
Qubes: strategy for capturing conversion logs
Use qrexec stdout to send conversion data (pixels) and stderr to send
conversion progress at the end of the conversion. This happens
regardless of whether the conversion is in developer mode.

It's the client that decides if it reads the debug data from stderr or
not. In this case, it only reads it if developer mode is enabled.
2023-08-22 16:11:20 +01:00