Commit graph

912 commits

Author SHA1 Message Date
Alex Pyrgiotis
1f308e9cc5
Reformat code with Black 23
Due to a bump in our Python dependencies, we now install Black 23
instead of 22, which detects some of our files as badly formatted.
2023-03-27 15:17:23 +03:00
Alex Pyrgiotis
b102b2bd49
Update Poetry lock file
Run `poetry lock` and allow updating the existing dependencies. This
fixes a CI regression that was introduced by Poetry 1.4.1, which added
stricter Python wheels validation.

Fixes #376
2023-03-27 15:15:26 +03:00
Alex Pyrgiotis
7613941e1f
ci: Do not deploy to PackageCloud
Pave the way for deploying .deb and .rpm packages to
packages.freedom.press. Remove the code that deploys to PackageCloud
once we tag a commit with `v<semver>`.

Refs #291
2023-03-27 13:41:08 +03:00
Alex Pyrgiotis
8a7d52b471
Update Changelog for 0.4.1 2023-03-27 12:32:36 +03:00
deeplow
bc50917362
Sort OCR languages when loading them from json
Because now the ocr-languages.json is sorted by tesseract language arg
name, we'll want to sort the languages the user sees alphabetically.
2023-03-16 14:23:31 +00:00
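The sorting described in this commit can be sketched roughly as follows. This is only an illustration: it assumes that ocr-languages.json maps a display name to a tesseract language argument, and both `load_ocr_languages` and the sample entries are hypothetical.

```python
import json


def load_ocr_languages(raw_json: str) -> dict[str, str]:
    """Return display-name -> tesseract-arg pairs, ordered by display name."""
    languages = json.loads(raw_json)
    # Sort case-insensitively on the name the user sees, not on the arg name.
    return dict(sorted(languages.items(), key=lambda item: item[0].casefold()))


# Sample input in file order, i.e. sorted by tesseract arg name (afr, chi_sim, sqi).
sample = '{"Afrikaans": "afr", "Chinese (Simplified)": "chi_sim", "Albanian": "sqi"}'
ocr_languages = load_ocr_languages(sample)
```

Since Python dictionaries preserve insertion order, iterating over `ocr_languages` yields the names alphabetically, while the file on disk can stay sorted by argument name.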
deeplow
58332fdd6e
tesseract: add new languages and others
Tagalog was replaced with Filipino [1] in newer tesseract versions, so it
doesn't make sense for us to use the new name and map it to the old
"tgl" name (Tagalog) under the hood.

Language names obtained from tesseract's man page [2].

[1]: 58f7a72f00
[2]: https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc
2023-03-16 14:23:30 +00:00
deeplow
d8d83ff036
Remove languages not supported
When the OCR languages list was originally introduced (commit b527776),
the container was running Ubuntu 18.04 [1]. Later it changed to Alpine
Linux, which unfortunately ships fewer languages than Ubuntu.

This commit removes those languages. Fixes #355

[1]: b527776e28 (diff-ec032b25a6c2af24eaf4128c85090c5ce0dcbab64e64eace10be9f4e4683a71bR1)
2023-03-16 14:23:28 +00:00
deeplow
66d3c40163
Sort OCR languages by tesseract arg name
Make it easier to compare the list of languages with the output of
`tesseract --list-langs`.
2023-03-16 14:23:25 +00:00
Alex Pyrgiotis
d768099912
Grab just the image ID
When building the image, grab the image id using `-q`, which removes all
the decorations in the output and just keeps the image ID.
2023-03-09 19:04:59 +02:00
Alex Pyrgiotis
a33dcfbb51
Replace First Look Media references
Update several references to First Look Media in the code, to better
reflect the current status, where Freedom of the Press Foundation has
taken over the stewardship of the project.

Fixes #343
2023-03-08 18:40:55 +02:00
Alex Pyrgiotis
330766665d
Update instructions in qa.py 2023-03-08 17:56:25 +02:00
Alex Pyrgiotis
b74258b6d2
Remove stale QA requirement
Remove a stale QA requirement for running the tests manually in the rest
of our Linux distros. Our CI jobs take care of that, so we don't need to
do it.
2023-03-08 17:40:26 +02:00
Alex Pyrgiotis
4668443be6
install: Use the full image tag
Use the full image tag (dangerzone.rocks/dangerzone:latest) when
building the image. Else, we risk creating a `share/image-id.txt` file
with multiple IDs in it, if we have another
`dangerzone.rocks/dangerzone` image (with a different tag) in our dev
environment.
2023-03-08 17:40:26 +02:00
Alex Pyrgiotis
c719fc4f54
Update our MacOS QA instructions
Update our QA instructions for ARM-based MacOS systems. The main change
in 0.4.1 is that we can build an ARM container image for Dangerzone,
which is different from Intel Macs. So, we need to build and test it
during release.
2023-03-08 17:40:26 +02:00
Alex Pyrgiotis
5a0c4d0a03
Bump timeouts
Perform the following timeout bumps:

1. Increase the minimum timeout per page/MiB by x3. The rationale is that
   10 seconds is a reasonable timeout, but to be on the safe side, it's
   best if we multiply it by a safety factor.
2. Increase the minimum timeout from 10 seconds to 60 seconds. 10
   seconds may be too little if the application runtime (e.g.,
   LibreOffice) is slow to start due to background CPU thrashing.
2023-03-08 17:38:59 +02:00
Alex Pyrgiotis
a2049349b1
ci: Add missing CI tests for Ubuntu Focal / Debian Bullseye 2023-03-08 17:36:42 +02:00
Alex Pyrgiotis
b32f215c7c
dev_scripts: Handle alt name for Ubuntu Focal 2023-03-08 17:36:42 +02:00
Alex Pyrgiotis
aaecfdb63e
dev_scripts: Imitate mkdir -p when creating state dirs
The first time we run the env.py script, we may not have the necessary
dirs under envs. It's best to create them with `parents=True`.
2023-03-08 17:36:42 +02:00
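A minimal sketch of the `mkdir -p` behavior mentioned in this commit, using `pathlib`. The `ensure_state_dir` helper and the directory layout under `envs` are assumptions made for illustration, not the actual env.py code.

```python
import tempfile
from pathlib import Path


def ensure_state_dir(base: Path, distro: str, version: str) -> Path:
    """Create base/envs/<distro>/<version>, including missing parents."""
    state_dir = base / "envs" / distro / version
    # parents=True creates intermediate dirs; exist_ok=True makes reruns safe.
    state_dir.mkdir(parents=True, exist_ok=True)
    return state_dir


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_state_dir(Path(tmp), "ubuntu", "20.04")
    second = ensure_state_dir(Path(tmp), "ubuntu", "20.04")  # second run is a no-op
    created = first.is_dir() and first == second
```

Without `parents=True`, the first call would raise `FileNotFoundError` because the `envs` directory does not exist yet.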
Alex Pyrgiotis
96d8cdef94
Suggest that users install Poetry via pipx
Replace the command to install Poetry globally via `pip` in our build
instructions, with a command that installs Poetry under ~/.local/bin
via `pipx`. The rationale is the same as in the previous commit, i.e.,
PEP 668 does not allow installing Poetry globally via `pip`.

Note that in this case, we don't have any CI restrictions, so we could
use the official installer instead. However, for security reasons, we
prefer suggesting `pipx` to the users, and of course give them a list of
alternatives.

Note that for Windows and MacOS we leave the command as is, until we
figure out how PEP 668 applies in there.
2023-03-08 17:36:42 +02:00
Alex Pyrgiotis
7310977343
dev_scripts: Install Poetry via pipx
We can no longer install Poetry via `pip`, since Debian Bookworm now
enforces PEP 668, meaning that both `pip install poetry` and `pip
install --user poetry` cannot work [1]. Since we use the same
installation steps for all of our dev environments, we need to find a
common way to install Poetry.

Poetry's website provides several ways to install Poetry [2]. Moreover,
it also has a special section with CI recommendations [3]. In this
section, it strongly suggests installing Poetry via `pipx`, instead of
using the installer script that you download from the Internet.

Follow Poetry's suggestion to install it via `pipx` in CI environments,
with one minor change. Do not use `pipx ensurepath`, as that will
affect the `.bashrc` of the dev environment, which at some point in the
future may be mounted by the dev. Instead, set a PATH environment
variable that includes `~/.local/bin`.

[1]: https://github.com/freedomofpress/dangerzone/issues/351
[2]: https://python-poetry.org/docs/#installation
[3]: https://python-poetry.org/docs/#ci-recommendations

Fixes #351
2023-03-08 17:36:42 +02:00
Alex Pyrgiotis
7979dbd653
ci: Install Poetry via APT on Debian Bookworm
We no longer need to install Poetry via PyPI, since the upstream Debian
issues have been fixed. Moreover, PEP 668 [1] is now enforced in Debian
Bookworm, so we can't install Poetry globally via `pip` in any case.

For these reasons, prefer installing Poetry via APT.

[1]: https://peps.python.org/pep-0668/

Refs #351
2023-03-08 17:23:06 +02:00
deeplow
e840c7a18c
Fix "Choose..." dialog not opening on Qt6
When clicking on the "Choose..." button nothing would happen visually
and it would show the error:

  Traceback (most recent call last):
    File "/home/user/dangerzone/dangerzone/gui/main_window.py", line 614, in select_output_directory
      dialog.setFileMode(QtWidgets.QFileDialog.DirectoryOnly)

According to the PySide docs, QFileDialog.DirectoryOnly has been
deprecated in Qt 4.6 [1]. This was probably not an issue on PySide2
because it must have used an earlier Qt version.

Fixes #360

[1]: https://doc.qt.io/qtforpython-5/PySide2/QtWidgets/QFileDialog.html#PySide2.QtWidgets.PySide2.QtWidgets.QFileDialog.FileMode
2023-03-01 12:49:46 +00:00
Alex Pyrgiotis
56c5d77afd
Build Windows MSI/.exe in GitHub actions
Update our GitHub Actions manifest to also build a dummy Windows MSI
installer for Dangerzone, so that we don't first discover build issues
during release.
2023-02-23 09:12:06 +00:00
deeplow
f307e03215
Windows build: link to adding Wix to PATH 2023-02-23 09:12:04 +00:00
deeplow
fb85421db8
Fix Windows build for PySide6 (illegal file names)
Building the `.msi` on Windows was failing in the `candle.exe` step due
to some files in the PySide6 library having file names that are too long
(PySide6/examples) or that contain an illegal character (`+`)
(PySide6/qml/QtQuick).

Skipping copying these files to the `.msi` fixes the issue. Skipping
`examples/` should be of no impact since they're just examples and
skipping `qml/QtQuick` shouldn't cause issues because we don't use QML.

Reverts commit `bbbf822` and adapts it from PySide2 to PySide6.
2023-02-23 09:12:02 +00:00
deeplow
541fe7f382
Container: ignore non-progress pdftoppm output
pdftoppm reports syntax issues and errors on a variety of documents,
but it still produces usable results despite the failures. From the
user's perspective, it's better to have an imperfect document than
none at all. For this reason, we ignore non-relevant output.
2023-02-21 19:05:21 +00:00
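One way to ignore non-progress output is sketched below. It assumes pdftoppm is run with `-progress`, which prints the current page, the last page, and the output path as space-separated fields on stderr. The `parse_progress` helper and the sample paths are hypothetical; the actual Dangerzone code may differ.

```python
import re

# "<current page> <last page> <output path>", as printed by `pdftoppm -progress`.
PROGRESS_RE = re.compile(r"^(\d+) (\d+) (.+)$")


def parse_progress(line: str):
    """Return (current_page, last_page) for a progress line, else None."""
    match = PROGRESS_RE.match(line.strip())
    if match is None:
        return None  # Syntax errors, warnings and other noise are ignored.
    return int(match.group(1)), int(match.group(2))


sample_stderr = [
    "Syntax Error: Missing language pack for 'Adobe-Japan1' mapping",
    "1 2 /tmp/page-1.ppm",
    "2 2 /tmp/page-2.ppm",
]
progress = [p for p in map(parse_progress, sample_stderr) if p is not None]
```

Only the lines that match the progress format are used to drive the conversion; everything else is dropped instead of being treated as a failure.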
deeplow
dbd0450542
Add poppler-data package due to missing fonts
Some documents were reporting the following error when running them
over pdftoppm:

    Syntax Error: Missing language pack for 'Adobe-Japan1' mapping

This did not necessarily make the document fail but it could be
that some fonts were not properly rendered due to the missing package.
2023-02-21 18:39:14 +00:00
Alex Pyrgiotis
9bf65bc829
dev_scripts: Add extra distros in QA script
Add some distros in the QA script that were missing from the list of our
supported ones.
2023-02-21 20:20:04 +02:00
Alex Pyrgiotis
ce86c1b126
dev_scripts: Enable building envs on Ubuntu Focal
Enable installing Podman in Ubuntu Focal, by re-using the instructions
we have in our installation section. This enables us to build a dev
environment for Ubuntu Focal, which we couldn't do previously.
2023-02-21 20:20:04 +02:00
Alex Pyrgiotis
5100e15213
Add missing build dependencies for Ubuntu Focal
Add some missing build dependencies that we encountered for Ubuntu
Focal, and which apply to the rest of the Debian-based distros as well.
2023-02-21 20:20:03 +02:00
Alex Pyrgiotis
79ccd14d5d
Fix PySide2 issue for Ubuntu Focal
Provide a fallback for QRegularExpressionValidator specifically for
Ubuntu Focal, because it's not present in PySide2 5.14. If it doesn't
exist, fall back to QRegExpValidator.

Fixes #339
2023-02-21 20:17:05 +02:00
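The fallback can be sketched as an attribute lookup on the QtGui module. To keep the example runnable without Qt installed, stand-in namespaces replace the real `QtGui` module; `pick_regex_validator` is a hypothetical helper, not the code from this commit.

```python
from types import SimpleNamespace


def pick_regex_validator(qtgui):
    """Return the preferred validator class exposed by a QtGui-like module."""
    validator = getattr(qtgui, "QRegularExpressionValidator", None)
    if validator is None:
        # Older bindings (the commit mentions PySide2 5.14) only ship this one.
        validator = qtgui.QRegExpValidator
    return validator


# Stand-ins for QtGui on an old and a new PySide2 (assumption for illustration).
old_qtgui = SimpleNamespace(QRegExpValidator="QRegExpValidator")
new_qtgui = SimpleNamespace(
    QRegExpValidator="QRegExpValidator",
    QRegularExpressionValidator="QRegularExpressionValidator",
)
```

In real code, the two validators expect different pattern objects (`QRegularExpression` versus `QRegExp`), so the pattern class would need the same kind of fallback.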
Alex Pyrgiotis
b94d0712c8
Minor corrections in test code 2023-02-17 01:15:08 +02:00
Alex Pyrgiotis
2042591964
container: Copy files before mounting them
Copy input files in a temporary dir before mounting them, thereby
changing their permissions, without affecting the original files. This
way, we can avoid cases where a file is accessible to the user only due
to a supplemental user group, which does not work for containers.

Fixes #157
Fixes #260
Fixes #335
2023-02-17 01:15:08 +02:00
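The copy-before-mount idea can be sketched like this. The helper name `copy_for_mount` and the `0o644` mode are assumptions; the commit only states that the copy's permissions change while the original file is left alone.

```python
import os
import shutil
import tempfile
from pathlib import Path


def copy_for_mount(src: Path, tmp_dir: Path) -> Path:
    """Copy src into tmp_dir and relax permissions on the copy only."""
    dst = tmp_dir / src.name
    shutil.copyfile(src, dst)
    # The container user needs read access; the original file is untouched.
    os.chmod(dst, 0o644)
    return dst


with tempfile.TemporaryDirectory() as work:
    original = Path(work) / "input.pdf"
    original.write_bytes(b"%PDF-1.4 sample")
    os.chmod(original, 0o600)  # Readable by the owner only.
    with tempfile.TemporaryDirectory() as mount_dir:
        copy = copy_for_mount(original, Path(mount_dir))
        same_content = copy.read_bytes() == original.read_bytes()
```

The temporary directory, rather than the original location, is what would then be mounted into the container.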
Alex Pyrgiotis
ea73f5d820
container: Take SELinux labels into account
Take SELinux labels into account when mounting a file to the Dangerzone
container. Use the `:Z` flag (which is a no-op in non-SELinux systems)
to clear the existing SELinux label for a file, and apply one that
matches the container's.

Refs #335
2023-02-17 01:15:08 +02:00
Alex Pyrgiotis
d733890ca0
container: Do not leave stale temporary dirs
Do not leave stale temporary directories when conversion fails
unexpectedly. Instead, wrap the conversion operation in a context
manager that wipes the temporary dir afterwards.

Fixes #317
2023-02-17 01:15:08 +02:00
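A minimal sketch of such a context manager, assuming a hypothetical `conversion_tmp_dir` helper. The standard library's `tempfile.TemporaryDirectory` offers similar behavior out of the box.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def conversion_tmp_dir(prefix: str = "dangerzone-"):
    """Yield a temporary dir and wipe it afterwards, even on failure."""
    tmp = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield tmp
    finally:
        shutil.rmtree(tmp, ignore_errors=True)


leftover = None
try:
    with conversion_tmp_dir() as tmp:
        leftover = tmp
        (tmp / "page-1.ppm").write_bytes(b"pixels")
        raise RuntimeError("simulated conversion failure")
except RuntimeError:
    pass
cleaned_up = not leftover.exists()
```

Because the cleanup sits in a `finally` clause, the directory is removed whether the conversion succeeds, fails, or is interrupted by an exception.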
Alex Pyrgiotis
18bc77332d
tests: Run each test in separate config/cache dirs
Run each CLI command in a separate config/cache dir, to avoid leaks
between tests. Moreover, this way we are able to check the contents of
the config/cache dirs for a single CLI run.
2023-02-17 01:15:07 +02:00
Alex Pyrgiotis
44c324f9ac
Separate config dirs from temp dirs
Do not store temporary directories in the Dangerzone's config directory.
There are two reasons for that:

1. They are ephemeral, and they need a temporary place to be stored,
   preferably RAM-backed.
2. We need to set them while running our CI tests.
2023-02-17 01:06:44 +02:00
deeplow
9b3d98b20b
Build arm64 docker image for arm-based Macs
Remove --platform args completely so that by default we build natively
on each platform.

Partial fix for #50
2023-02-16 10:59:00 +00:00
Alex Pyrgiotis
93a06d72f0
Allow users to disable timeouts
Allow users to disable timeouts via the CLI, with the
`--disable-timeouts` argument. By default, the timeouts are always
enabled.

This option applies both to the CLI version of Dangerzone, and the GUI
one. For the latter, the user must start the GUI from their CLI (i.e.,
`dangerzone --disable-timeouts ...`)
2023-02-15 23:48:36 +02:00
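The effect of the flag can be sketched with `argparse`, which is used here only to keep the example dependency-free; the project's real CLI parser may differ. The `effective_timeout` helper is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="dangerzone")
    # store_true means the flag is off unless given, so timeouts stay enabled
    # by default.
    parser.add_argument(
        "--disable-timeouts",
        action="store_true",
        help="Do not enforce timeouts during the conversion",
    )
    return parser


def effective_timeout(timeout: float, disable_timeouts: bool):
    """Return None (wait indefinitely) when timeouts are disabled."""
    return None if disable_timeouts else timeout


default_args = build_parser().parse_args([])
no_timeout_args = build_parser().parse_args(["--disable-timeouts"])
```

Passing `None` as a timeout is the convention used by `subprocess` and `asyncio.wait_for` for "no limit", which is why the sketch maps the flag to it.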
Alex Pyrgiotis
f2a4f29cff
container: Introduce proportional timeouts
Introduce proportional timeouts in the container code, where the
conversion logic runs.

Previously, we had a single timeout for each command (120 seconds),
which didn't scale well either with the number of pages in a document,
or with the size of the document.

In this commit, we look into each operation and try to figure out the
following:

1. What's the number of pages we will operate on?
2. How large is the document?

Knowing the above, we can break down a command into multiple operations,
at least conceptually. Having a number of operations and a sane timeout
value per operation (10 seconds), we can multiply those and arrive at a
timeout that fits the command better.

Fixes #306
Fixes #314
Refs #327
2023-02-15 23:46:53 +02:00
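The multiplication described in this commit can be sketched as follows. Taking the larger of the page-based and size-based values, as well as the function name and the 10-second floor, are assumptions; the commit only states that a per-operation timeout of 10 seconds is multiplied by the number of operations.

```python
from typing import Optional


def proportional_timeout(
    num_pages: Optional[int] = None,
    size_mib: Optional[float] = None,
    per_unit: float = 10.0,  # Seconds per page or per MiB.
    minimum: float = 10.0,  # Assumed floor, so tiny documents still get time.
) -> float:
    """Scale the timeout with the page count and/or the document size."""
    candidates = [minimum]
    if num_pages is not None:
        candidates.append(per_unit * num_pages)
    if size_mib is not None:
        candidates.append(per_unit * size_mib)
    # Use whichever dimension asks for the most time.
    return max(candidates)
```

For example, under these assumptions a 30-page document would get `proportional_timeout(num_pages=30)`, i.e. 300 seconds, instead of the fixed 120 seconds used previously.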
Maeve Andrews
c26326450b
Add a --distro option to build-deb.py
Add an optional --distro argument to build-deb.py, to specify the Debian
version in the package name, which currently is "1". This option may
prove useful when publishing packages to freedomofpress/apt-tools-prod,
where packages from different distros with the same names but different
contents are not accepted.
2023-02-14 15:49:51 +02:00
deeplow
b49d6de6bd
Sample PDFs: rename to include file format in name
Make it so all samples when converted don't map to the same file. This
makes it easier to manually inspect files.
2023-02-09 09:02:33 +00:00
deeplow
275df80484
GUI: exit with 1 when some conversion failed
Fixes: #318
2023-02-08 17:24:55 +00:00
Alex Pyrgiotis
23ee60d3f3
Add missing Dangerzone module in setup.py
While creating a Debian package for Dangerzone, we found out that the
`dangerzone.isolation_provider` submodule was not copied to the final
package. Turns out that it was missing from the packages list that we
define in `setup.py`.

Include this package in the proper section in `setup.py`.
2023-02-07 20:34:24 +02:00
Alex Pyrgiotis
aeeed411a0
container: Run commands asynchronously
Convert the Dangerzone script that runs in the container to run commands
asynchronously, via the asyncio module.

The main advantage of this approach is that it's fast, easy, and safe to
consume the command's streams, while the command is running in the
background.

Previously, we had implemented an approach that used non-blocking
sockets, but those are easy to get wrong. For instance, timeouts were
not exact and capturing output was brittle.

Fixes #325
2023-02-07 18:52:49 +02:00
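A rough sketch of the asyncio approach: the command's streams are read line by line while it runs, and one timeout covers the whole operation. `run_command` and `read_lines` are hypothetical names, and the sample command simply asks the Python interpreter to print a line.

```python
import asyncio
import sys


async def read_lines(stream, sink):
    """Append decoded lines to sink as they arrive, until EOF."""
    while True:
        line = await stream.readline()
        if not line:
            break
        sink.append(line.decode().rstrip())


async def run_command(args, timeout):
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout_lines, stderr_lines = [], []
    # Consume both streams concurrently while waiting for the process to exit.
    work = asyncio.gather(
        read_lines(proc.stdout, stdout_lines),
        read_lines(proc.stderr, stderr_lines),
        proc.wait(),
    )
    try:
        await asyncio.wait_for(work, timeout=timeout)
    except asyncio.TimeoutError:
        try:
            proc.kill()
        except ProcessLookupError:
            pass  # The process exited on its own in the meantime.
        await proc.wait()
        raise
    return proc.returncode, stdout_lines, stderr_lines


returncode, out, err = asyncio.run(
    run_command([sys.executable, "-c", "print('converted')"], timeout=30)
)
```

Compared to hand-rolled non-blocking sockets, the event loop takes care of multiplexing the two streams, and `asyncio.wait_for` gives a single, well-defined deadline.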
Alex Pyrgiotis
24975fabd5
container: Reinstate OpenJDK 8 dependency
Commit d7be28ec2a assumed that OpenJDK was required only for the PDFtk
package, which is no longer installed in the Dangerzone image, and thus
removed OpenJDK.

Turns out that while LibreOffice does not depend on OpenJDK, it may
produce corrupted PDFs if installed without it, and will not abort the
operation.

Reinstate OpenJDK to fix the issue of corrupted PDFs.

Fixes #315
2023-02-07 18:52:49 +02:00
Alex Pyrgiotis
e5368b1ea0
ci: Run CI tests for Fedora 37
Run CI tests for Fedora 37 environments, now that we no longer require
PySide2 as a dev dependency.

Fixes #294
2023-02-07 18:52:09 +02:00
Alex Pyrgiotis
16375bfdf9
Use PySide6 in our dev environments
Drop PySide2 from our dependencies (previously used only on Linux
environments) and use PySide6 in all dev environments. The reason is
that PySide2 (from PyPI) does not support Python 3.11, and the variants
that do (Fedora/Debian packages) need to backport fixes from PySide6.

Our original attempt was to build PySide2 wheels for Python 3.11 but
it was not simple, nor maintainable. So, we were left with two options:

1. Install Python 3.10 in dev environments that have Python 3.11 by
   default.
2. Use PySide6 in all of our environments.

In both cases, we break package parity with the user's system, since we
are not testing Dangerzone under the same conditions. However, since
option (2) is forwards-compatible with where we want to move the
project (use Qt6 and PySide6), we chose that one.

Fixes #330
2023-02-07 18:52:09 +02:00
Alex Pyrgiotis
081c68c27f
dev_scripts: Alter the shadow-utils fix
Instead of reinstalling shadow-utils, use the actual fix that the Fedora
devs have suggested (rpm --restore shadow-utils). The previous method
does not seem to work on Fedora 37, and it threw the following error
when building the development environment:

    Installed package shadow-utils-2:4.12.3-3.fc37.x86_64 (from koji-override-0) not available.
    Error: No packages marked for reinstall.
    Error: building at STEP "RUN dnf reinstall -y shadow-utils && dnf clean all": while running runtime: exit status 1
2023-02-07 18:52:08 +02:00
Alex Pyrgiotis
e7eb3bf18b
dev_scripts: Fix a recursion issue in our PyTest wrapper
Fix an issue in our PyTest wrapper, that caused this recursion error:

```
  File "shibokensupport/signature/loader.py", line 61, in feature_imported
  File "shibokensupport/feature.py", line 137, in feature_imported
  File "shibokensupport/feature.py", line 148, in _mod_uses_pyside
  File "/usr/lib/python3.10/inspect.py", line 1147, in getsource
    lines, lnum = getsourcelines(object)
  File "/usr/lib/python3.10/inspect.py", line 1129, in getsourcelines
    lines, lnum = findsource(object)
  File "/usr/lib/python3.10/inspect.py", line 954, in findsource
    lines = linecache.getlines(file, module.__dict__)
  File "/home/user/.cache/pypoetry/virtualenvs/dangerzone-hQU0mwlP-py3.10/lib/python3.10/site-packages/py/_vendored_packages/apipkg/__init__.py", line 177, in __dict__
    self.__makeattr(name)
  File "/home/user/.cache/pypoetry/virtualenvs/dangerzone-hQU0mwlP-py3.10/lib/python3.10/site-packages/py/_vendored_packages/apipkg/__init__.py", line 157, in __makeattr
    result = importobj(modpath, attrname)
  File "/home/user/.cache/pypoetry/virtualenvs/dangerzone-hQU0mwlP-py3.10/lib/python3.10/site-packages/py/_vendored_packages/apipkg/__init__.py", line 75, in importobj
    module = __import__(modpath, None, None, ["__doc__"])
  File "shibokensupport/signature/loader.py", line 54, in feature_import
RecursionError: maximum recursion depth exceeded
```

This error seems to be related to
https://github.com/pytest-dev/pytest/issues/1794. By not importing
`pytest` in our test wrapper, and instead executing it directly, we can
avoid it.

Note that this seems to be triggered only by Shiboken6, which is why we
hadn't previously encountered it.
2023-02-07 18:52:08 +02:00