Commit graph

1078 commits

Author SHA1 Message Date
Alex Pyrgiotis
ccf4132ea0
conversion: Add sanity check for page count
Add a sanity check at the end of the conversion from doc to pixels, to
ensure that the resulting document will have the same number of pages as
the original one.

Refs #560
2023-09-28 22:50:54 +03:00
Alex Pyrgiotis
b4e5cf5be7
qubes: Stream page data in real time
Stream page data back to the caller, immediately after we read them from
pdftoppm. This way, we have more accurate progress reports and timeouts.

Fixes #557
2023-09-28 22:50:54 +03:00
Alex Pyrgiotis
4bb959f220
conversion: Add anchor points for streaming page data/metadata
Introduce 4 new methods that can be overloaded by the Qubes isolation
provider to stream page data/metadata back to the caller. For the time
being, these methods do what they did before, i.e., write this info in
files within the pixels directory.
2023-09-28 22:50:53 +03:00
Alex Pyrgiotis
6012cd1491
Improve EOF detection when reading command output
Do not read a line from the command output and then check if
we are at EOF, because it's possible that the writer immediately exited
after writing the last line of output. Instead, switch the order of
actions.

This is a very serious bug that can lead to Dangerzone excluding the
last page of the document. It should have bitten us right from the start
(see aeeed411a0), but it seems that the
small period of time it takes the kernel to close the file descriptors
was hiding this bug.

Fixes #560
2023-09-28 22:50:53 +03:00
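The reordering described in 6012cd1491 can be illustrated with a small simulation. This is a sketch under stated assumptions, not Dangerzone's actual code: `FakeStream`, `read_then_check`, and `check_then_read` are hypothetical names, and the fake stream reports EOF as soon as its last line has been consumed, mimicking a writer that exits right after its final write.

```python
from typing import List


class FakeStream:
    """Hypothetical stream that hits EOF the moment its last line is consumed."""

    def __init__(self, lines: List[str]) -> None:
        self._lines = list(lines)

    def readline(self) -> str:
        # Return the next line, or an empty string once nothing is left.
        return self._lines.pop(0) if self._lines else ""

    def at_eof(self) -> bool:
        return not self._lines


def read_then_check(stream: FakeStream) -> List[str]:
    """Buggy order: read a line, then check for EOF."""
    out = []
    while True:
        line = stream.readline()
        if stream.at_eof():
            break  # the line we just read is silently dropped
        out.append(line)
    return out


def check_then_read(stream: FakeStream) -> List[str]:
    """Fixed order: record the EOF state first, then read."""
    out = []
    while True:
        ended = stream.at_eof()
        line = stream.readline()
        if line:
            out.append(line)
        if ended:
            break  # EOF was already set before this read, nothing is lost
    return out
```

With two lines of input, the first function returns only the first line, while the second returns both. That matches the symptom in the commit message, where the last page of a document could go missing.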
Garrett Robinson
79c1d6db0f
Use extend_skip to avoid overriding isort's skip default
This preserves isort's default behavior of ignoring virtualenvs with
common names like `venv` or `.venv`, which is helpful when running
`isort` in a local development environment that uses such a
virtualenv.
2023-09-28 17:21:00 +03:00
Garrett Robinson
eab768f950
Style safe_extension_filename consistently in Dark Mode
To be consistent with Light Mode, the background of the
safe_extension_filename QLabel should match the adjacent QTextField,
but the text should be "grayed out"/disabled to indicate that it's not
supposed to be editable.
2023-09-28 17:20:54 +03:00
Garrett Robinson
40b6240097
Only set certain colors in light mode
2023-09-28 17:20:50 +03:00
Garrett Robinson
46f978e6f0
Detect OS color mode and set as property for stylesheets
Sets the detected OS color mode (dark/light) as a property on the
QApplication so it can be referenced in stylesheets to select style
rules suited to the OS color mode.
2023-09-28 17:20:34 +03:00
deeplow
23bee23d81
Disable isolation_provider tests on dummy conversion
Windows and macOS runners in CI don't support nested virtualization (and
thus Docker), so they aren't really candidates for isolation_provider tests.
2023-09-28 11:08:53 +01:00
deeplow
0a6b33ebed
Qubes: detect qube failing to start (missing RAM)
In Qubes OS it's often the case that the user doesn't have enough
RAM to start the conversion. In this case it raises BrokenPipeException
and exits with code 126.

It didn't seem possible to distinguish this kind of failure from one
where the user has misconfigured qrexec policies.

NOTE: this approach is not ideal UX-wise. After the first doc fails,
the next one will also try and fail. Upon first failure we should
inform the user that they need to close some programs or qubes.
2023-09-28 11:08:50 +01:00
deeplow
63f03d5bcd
Add limit and test to max width and height of docs
2023-09-28 11:08:47 +01:00
deeplow
6f26fc6303
Qubes: add test if MAX_PAGES is enforced in client
Because the server also checks the MAX_PAGES limit, the test in base
would hide the fact that the client is not enforcing the limit. This
ensures that's not the case.

When the pages in containers are streamed (#443), then this test should
be in base.py.
2023-09-28 11:06:36 +01:00
deeplow
54b8ffbf96
Add page limit of 10000
Theoretically, the max pages would be 65535 (2-byte unsigned int).
However, this limit is much higher than practical documents have,
and larger ones can lead to unforeseen problems, for example RAM
limitations.

We thus opted to use a lower limit of 10K. The limit must be
detected client-side, given that the server is distrusted. However
we also check it in the server, just as a fail-early mechanism.
2023-09-28 11:01:14 +01:00
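A client-side check along the lines of 54b8ffbf96 could look like the sketch below. It is only an illustration: the function name `read_page_count`, the `MaxPagesExceeded` error, and the big-endian byte order are assumptions, and only the 2-byte width and the 10000 limit come from the commit message.

```python
import io
import struct

MAX_PAGES = 10000  # limit named in the commit message


class MaxPagesExceeded(Exception):
    """Hypothetical error for documents over the page limit."""


def read_page_count(stream) -> int:
    """Read a 2-byte unsigned page count and enforce the limit client-side."""
    raw = stream.read(2)
    if len(raw) != 2:
        raise ValueError("truncated page count")
    # Assumption: the count is sent big-endian (">H" = unsigned 16-bit).
    (num_pages,) = struct.unpack(">H", raw)
    if num_pages > MAX_PAGES:
        # The server is distrusted, so the client must reject this itself.
        raise MaxPagesExceeded(f"{num_pages} pages exceeds limit of {MAX_PAGES}")
    return num_pages
```

For example, `read_page_count(io.BytesIO(struct.pack(">H", 3)))` returns 3, while a count of 10001 raises `MaxPagesExceeded` regardless of what the server claims to have checked.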
deeplow
afba362d22
Tests: split isolation provider tests per provider
Isolation provider tests were done in tests/test_base.py and had
pytest.mark.parametrize() for each isolation provider. This logic
would not work well when we had tests that diverge. We could have marked
each one as compatible with one provider or another, but in the end it
turned out to be better to have the common ones in a base class and
the divergent ones in each.

NOTE: this has a strange side-effect: inherited test classes need to
have imports for all of the fixtures even if they are not explicitly used.
2023-09-28 09:53:29 +01:00
Alex Pyrgiotis
18b73d94b0
qubes: Find out reason of interrupted conversions
If a conversion has been interrupted (usually due to an EOF), figure out
why this happened by checking the exit code of the spawned process.
2023-09-26 17:35:26 +03:00
Alex Pyrgiotis
30196ff35b
errors: Add error for interrupted conversions
Add an error for interrupted conversions, in order to better
differentiate this scenario from other ValueErrors that may be raised
throughout the code's lifetime.
2023-09-26 17:35:26 +03:00
Alex Pyrgiotis
0273522fb1
qubes: Store the process for the spawned qube
Store, in an instance attribute, the process that we have started for
the spawned disposable qube. In subsequent commits, we will use it from
other places as well, aside from the `_convert` method.

Note that this commit does not alter the conversion logic, and only does
the following:
1. Renames `p.` to `self.proc.`
2. Adds an `__init__` method to the Qubes isolation provider, and
   initializes the `self.proc` attribute to `None`.
3. Adds an assert that `self.proc` is not `None` after it's spawned, to
   placate Mypy.
2023-09-26 17:35:25 +03:00
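The pattern listed in 0273522fb1 (an attribute initialized to `None`, then an assert after spawning) can be sketched as follows. The class `DisposableRunner`, its methods, and the spawned command are hypothetical stand-ins, not the Qubes isolation provider itself.

```python
import subprocess
import sys
from typing import Optional


class DisposableRunner:
    """Hypothetical stand-in for an isolation provider."""

    def __init__(self) -> None:
        # No process exists until a conversion starts.
        self.proc: Optional[subprocess.Popen] = None

    def _spawn(self) -> None:
        # Stand-in command; a real provider would start a disposable qube.
        self.proc = subprocess.Popen(
            [sys.executable, "-c", "print('converted')"],
            stdout=subprocess.PIPE,
        )

    def _convert(self) -> bytes:
        self._spawn()
        # A type checker cannot see that _spawn() set the attribute,
        # so narrow Optional[Popen] to Popen explicitly.
        assert self.proc is not None
        out, _ = self.proc.communicate()
        return out

    def exit_code(self) -> Optional[int]:
        # Other methods can now inspect the stored process.
        return None if self.proc is None else self.proc.returncode
```

Keeping the process on the instance is what lets code outside `_convert` query it later, for instance to check the exit code after an interrupted conversion.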
deeplow
e08b6defc3
Round conversion progress from float to int
Fixes #553
2023-09-26 15:20:41 +01:00
deeplow
8d37ff15e0
Remove duplicated Qubes message: "Safe PDF Created"
Fixes #555. This is a leftover from when we didn't have progress
reports from the second stage conversion (a.k.a. pixels to PDF) in #429.
2023-09-26 12:16:48 +01:00
Alex Pyrgiotis
a67c080898
Add changelog entry for Qubes beta integration
2023-09-25 12:51:41 +03:00
Alex Pyrgiotis
af7087af65
Update our release/QA instructions for Qubes
Update the release/QA instructions for Qubes, so that they take into
account the fact that we can now publish a Qubes RPM through our
official repos.
2023-09-25 12:51:41 +03:00
Alex Pyrgiotis
c94c8c8ba5
Add installation instructions for Qubes
Add instructions for installing Dangerzone on Qubes from our official
repos. These instructions are adapted from the build instructions, but
have been greatly simplified because we don't need some of the qubes
that the development environment needs.

Closes #431
2023-09-25 12:51:40 +03:00
Alex Pyrgiotis
22a58d83df
install: Add Tesseract models as package reqs
Add Tesseract models for the 10 most spoken languages as package
requirements for Qubes. For containers, this problem is already solved
since we install all Tesseract models.

If a user is not covered by the installed models, they can install
extras on their own. We will add a note for this in subsequent commits.

Refs #431
2023-09-25 12:51:40 +03:00
Alex Pyrgiotis
215fa8b558
install: Add conflict if Dangerzone is installed
Add a "Conflicts:" entry in the RPM spec, in case another version of
Dangerzone is already installed.
2023-09-25 12:49:58 +03:00
Alex Pyrgiotis
81b4a8deb5
Minor fixes in Fedora installation section
2023-09-25 12:49:58 +03:00
Alex Pyrgiotis
cbca9110ca
Switch to tessdata-fast Tesseract model
Switch to the tessdata-fast Tesseract model, instead of the tessdata
one. The tessdata-fast Tesseract model is much smaller, and a bit faster
than the other one. Also, it's the model that Debian/Fedora ship by
default.

Closes #545
2023-09-25 12:48:05 +03:00
Alex Pyrgiotis
e64d1da61f
qubes: Pass OCR parameters properly
Pass OCR parameters to conversion functions as arguments, instead of
setting environment variables.

Fixes #455
2023-09-20 18:04:40 +03:00
Alex Pyrgiotis
8a0c0a4673
Make parameter actually optional
2023-09-20 17:58:39 +03:00
Alex Pyrgiotis
20157bef58
Fix typo
2023-09-20 17:45:44 +03:00
Alex Pyrgiotis
99dd5f5139
qubes: Add client-side timeouts
Extend the client-side capabilities of the Qubes isolation provider, by
adding client-side timeout logic.

This implementation brings the same logic that we used server-side to
the client, by taking into account the original file size and the number
of pages that the server returns.

Since the code does not have the exact same insight as the server has,
the calculated timeouts are in two places:

1. The timeout for getting the number of pages. This timeout takes into
   account:
   * the disposable qube startup time, and
   * the time it takes to convert a file type to PDF
2. The total timeout for converting the PDF into pixels, in the same way
   that we do it on the server-side.

Besides these changes, we also ensure that partial reads (e.g., due to
EOF) are detected (see exact=... argument)

Some things that are not resolved in this commit are:
* We have both client-side and server-side timeouts for the first phase
  of the conversion. Once containers can stream data back to the
  application (see #443), these server-side timeouts can be removed.
* We do not show a proper error message when a timeout occurs. This will
  be part of the error handling PR (see #430)

Fixes #446
Refs #443
Refs #430
2023-09-20 17:32:42 +03:00
Alex Pyrgiotis
55a4491ced
Consolidate import statements
2023-09-20 17:14:24 +03:00
Alex Pyrgiotis
c547ffc3b4
conversion: Factor out calculate_timeout
Factor out the logic behind the calculate_timeout() method, used in
Dangerzone conversions, so that isolation providers can call it
directly.
2023-09-20 17:14:24 +03:00
Alex Pyrgiotis
fea193e935
Add non-blocking read utility
Add a function that can read data from non-blocking fds, which we will
use later on to read from standard streams with a timeout.
2023-09-20 17:14:24 +03:00
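A read from a non-blocking fd with a timeout, as described in fea193e935, might look like the sketch below. The function name `nonblocking_read` and its signature are assumptions, and the example relies on `select.select()` working on pipes, which holds on POSIX systems.

```python
import os
import select
import time


def nonblocking_read(fd: int, size: int, timeout: float) -> bytes:
    """Read up to `size` bytes from `fd`, giving up after `timeout` seconds."""
    os.set_blocking(fd, False)
    deadline = time.monotonic() + timeout
    buf = b""
    while len(buf) < size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"read {len(buf)} of {size} bytes before timeout")
        # Wait until the fd is readable or the remaining time runs out.
        readable, _, _ = select.select([fd], [], [], remaining)
        if not readable:
            continue  # the loop re-checks the deadline and raises
        try:
            chunk = os.read(fd, size - len(buf))
        except BlockingIOError:
            continue  # spurious wakeup, try again
        if not chunk:
            break  # EOF: the writer closed its end
        buf += chunk
    return buf
```

A short read (fewer bytes than requested) signals EOF here, so a caller that needs exactly `size` bytes would compare lengths and treat a mismatch as an error.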
Alex Pyrgiotis
344d6f7bfa
Add Stopwatch implementation
Add a simple stopwatch implementation to track the elapsed time since an
event, or the remaining time until a timeout.
2023-09-20 17:14:23 +03:00
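A stopwatch of the kind 344d6f7bfa describes could be as small as the sketch below. The attribute and property names are assumptions and may differ from the real implementation.

```python
import time
from typing import Optional


class Stopwatch:
    """Track elapsed time since start(), and time left until a timeout."""

    def __init__(self, timeout: Optional[float] = None) -> None:
        self.timeout = timeout
        self._start: Optional[float] = None

    def start(self) -> None:
        # A monotonic clock is unaffected by system clock adjustments.
        self._start = time.monotonic()

    @property
    def elapsed(self) -> float:
        if self._start is None:
            raise RuntimeError("stopwatch has not been started")
        return time.monotonic() - self._start

    @property
    def remaining(self) -> float:
        if self.timeout is None:
            raise RuntimeError("no timeout was set")
        # Never report negative time once the timeout has passed.
        return max(self.timeout - self.elapsed, 0.0)
```

A caller would typically create one stopwatch per conversion phase and pass `remaining` as the timeout of each blocking operation in that phase.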
Alex Pyrgiotis
fbe13bb114
Refer to Qubes in the project's description
2023-09-20 16:48:53 +03:00
Alex Pyrgiotis
a3bb740b19
Remove some stale Qubes refs in setup.py
2023-09-20 16:48:53 +03:00
Alex Pyrgiotis
01d63e4eda
install: Build Dangerzone RPMs using our SPEC file
Replace the deprecated `bdist_rpm` method of creating RPMs for
Dangerzone. Instead, update our `install/linux/build-rpm.py` script, to
build Dangerzone RPMs using our SPEC file under
`install/linux/dangerzone.spec`. The script now essentially creates a
source distribution (sdist) using `poetry build`, and then uses
`rpmbuild` to create binary and source RPMs.

Fixes #298
2023-09-20 16:48:53 +03:00
Alex Pyrgiotis
6cc2a953ff
install: Add directory for building Dangerzone RPMs
Add an `rpm-build` directory under `install/linux`, which will be used
for building Dangerzone RPMs. For the time being, it only has a
.gitignore file there, but in the future, invoking
`install/linux/build-rpm.py` will populate it.
2023-09-20 16:48:53 +03:00
Alex Pyrgiotis
f5abe0abd0
Update RPM dependencies
Update the dependencies required to build RPM packages. More
specifically, remove the older python3-setuptools dependency, and depend
instead on python3-devel and python3-poetry-core.

Note that this commit may break our CI, but it will be resolved in
subsequent commits.
2023-09-20 16:48:53 +03:00
Alex Pyrgiotis
33197f26b7
install: Introduce a SPEC file for creating RPMs
Introduce a SPEC file that can be used to create an RPM from a Python
source distribution. Some notable features of this SPEC file follow:

1. We can use this SPEC file to create both regular RPM packages and
   ones targeted for Qubes.
2. It has a post installation script that removes stale .egg-info
   directories, which previously caused issues to our users.
3. It automatically creates a changelog from our Git logs, which differs
   from the actual CHANGELOG.md.
4. It follows the latest Fedora guidelines (as of writing this) for
   packaging Python projects.

Fixes #514
2023-09-20 16:48:52 +03:00
Alex Pyrgiotis
3dea16bcd2
Include non-Python data files into Python package
Update our pyproject.toml file to include some non-Python data files,
e.g., our container image and assets. This way, we can use `poetry
build` to create a source distribution / Python wheel from our source
repository.

Note that this list of data files is already defined in our `setup.py`
script. In that script, one can find some extra goodies:

1. We can conditionally include data files in our Python package. We use
   this to include Qubes data only in our Qubes packages.
2. We can specify where the data files will be installed in the end-user
   system.

The above are non-goals for Poetry [1], especially (2), because modern
Python wheels are not supposed to install files in arbitrary places
within the user's host, nor should the install invocation use sudo.
Instead, this is a task that's better suited for the .deb / .rpm
packages.

So, why do we bother updating our `pyproject.toml` and not use
`setup.py` instead? Because `setup.py` is deprecated [2,3], and the
latest Python packaging RFCs [4], as well as most recent Fedora
guidelines [5] use `pyproject.toml` as the source of truth, instead of
`setup.py`.

In subsequent commits, we will also use just `pyproject.toml` for RPM
packaging.

[1]: https://github.com/python-poetry/poetry/issues/890
[2]: https://peps.python.org/pep-0517/#source-trees
[3]: https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html
[4]: https://peps.python.org/pep-0517/
[5]: https://docs.fedoraproject.org/en-US/packaging-guidelines/Python/
2023-09-20 16:38:55 +03:00
Alex Pyrgiotis
5431e059bf
Update build-system entry in pyproject.toml
Update the `build-backend` attribute, in accordance with the Python
Poetry docs [1]. Also, bump the minimum required poetry-core version to
1.2.0, since this is the version that introduced the Poetry dependency
groups [2], i.e., the [tool.poetry.group] sections in pyproject.toml.

[1]: https://python-poetry.org/docs/pyproject/#poetry-and-pep-517
[2]: https://python-poetry.org/docs/managing-dependencies/#dependency-groups
2023-09-20 16:38:55 +03:00
Alex Pyrgiotis
b83d2495eb
Remove stale dangerzone-container entrypoint
The dangerzone-container entrypoint, as specified in pyproject.toml, is
stale, for the following reasons:

1. It's not mentioned in the setup.py script, so it was never included
   in our Linux distributions.
2. The code in `dangerzone.__init__.py` that decides if it will invoke
   the GUI or CLI backend, just takes `dangerzone-cli` into account for
   this decision, and does not mention dangerzone-container anywhere.
2023-09-20 16:38:55 +03:00
Alex Pyrgiotis
7bc0129f94
Let black and isort respect .gitignore
In order to let isort respect .gitignore, we need to specify this in the
tool.isort entry, in pyproject.toml.

For black, we don't need any extra tweaks. This is weird, since until a
few months ago black did not respect .gitignore. Maybe something has
changed in the meantime but if not, we should revert this change.
2023-09-20 16:38:55 +03:00
Alex Pyrgiotis
29c0181b4d
Add test_docs_large in our .gitignore
2023-09-20 16:38:54 +03:00
deeplow
94f569cdf5
Add error code for unexpected errors in conversion
2023-09-19 15:52:47 +01:00
deeplow
8e4f04a52e
Shift conversion exit codes by 128
Distinguish from podman or other errors in called binaries by shifting
the error codes by 128.
2023-09-19 15:34:00 +01:00
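The shift described in 8e4f04a52e can be sketched in a few lines. Only the offset of 128 comes from the commit message. The helper names and the one-byte range check are assumptions made for illustration.

```python
ERROR_SHIFT = 128  # offset named in the commit message


def to_exit_code(conversion_error: int) -> int:
    """Map a small conversion error number to a shifted process exit code."""
    code = conversion_error + ERROR_SHIFT
    # Assumption: exit codes must fit in one byte (0-255).
    if not ERROR_SHIFT <= code <= 255:
        raise ValueError("exit code must be between 128 and 255")
    return code


def is_conversion_error(exit_code: int) -> bool:
    """Codes below the shift are attributed to podman or other binaries."""
    return exit_code >= ERROR_SHIFT
```

Under this scheme, conversion error 1 leaves the process as exit code 129, while a code such as 126 stays below the shift and is not treated as a conversion error.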
deeplow
b4c3e07d36
Remove attacker-controlled error messages
Creates exceptions in the server code to be shared with the client via an
identifying exit code. These exceptions are then reconstructed in the
client.

Refs #456 but does not completely fix it. Unexpected exceptions and
progress descriptions are still passed in Containers.
2023-09-19 15:33:20 +01:00
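The idea in b4c3e07d36, sharing errors via an identifying exit code and rebuilding them on the client, can be sketched as follows. All class names, codes, and messages here are hypothetical. The point of the sketch is that the message text comes from the client's own definitions and never from server output.

```python
from typing import Dict, Type


class ConversionError(Exception):
    """Hypothetical base error with a fixed code and a fixed message."""

    error_code = 1
    error_message = "The conversion failed"

    def __init__(self) -> None:
        super().__init__(self.error_message)


class TooManyPages(ConversionError):
    error_code = 2
    error_message = "The document has too many pages"


class UnexpectedConversionError(ConversionError):
    error_code = 255
    error_message = "An unexpected error occurred"


# Client-side lookup table, built from classes the client ships itself.
KNOWN_ERRORS: Dict[int, Type[ConversionError]] = {
    cls.error_code: cls
    for cls in (ConversionError, TooManyPages, UnexpectedConversionError)
}


def error_from_exit_code(exit_code: int) -> ConversionError:
    """Rebuild an exception from the exit code alone."""
    # Unknown codes fall back to a generic error instead of echoing
    # anything the (possibly compromised) server printed.
    return KNOWN_ERRORS.get(exit_code, UnexpectedConversionError)()
```

Here `str(error_from_exit_code(2))` yields the client's own "too many pages" text, and an unrecognized code such as 99 maps to the generic unexpected error.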
Moon Sungjoon
214ce9720d
Enable HWP conversion on MacOS M1
This PR reverts the patch that disables HWP / HWPX conversion on MacOS
M1. It does not fix conversion on Qubes OS (#494)

Previously, HWP / HWPX conversion didn't work on MacOS M1 systems (#498)
because libreoffice wasn't built with Java support on Alpine Linux for
ARM (aarch64).

Thankfully, the Alpine team has enabled Java support on the aarch64
system [1], so we can enable it again for ARM architectures.

Fixes #498

[1]: 74d443f479
2023-09-06 13:10:18 +03:00
Moon Sungjoon
acd615e0e1
Switch to the edge repo of Alpine Linux
The Alpine Linux team has enabled Java support for LibreOffice on ARM
architecture:

    74d443f479

This commit is included in 7.5.5.2-r2, so the installed LibreOffice
package should be 7.5.5.2-r2 or higher to fix this issue.

However 3.18 doesn't have the 7.5.5.2-r2 package:

    https://pkgs.alpinelinux.org/package/v3.18/community/aarch64/libreoffice

The Dangerzone image uses the alpine:latest image which is 3.18 as of
writing this.

For this reason, we switch to the edge repo of Alpine Linux, which
includes this fix.

Refs #498
Refs #540
Refs #542
2023-09-06 13:09:34 +03:00