Due to a bump in our Python dependencies, we now install Mypy 1.1.1
instead of 0.982. This change triggered the following errors:
* Incompatible default for argument <a> (default has type
None, argument has type <t>):
Mypy further explains here that PEP 484 prohibits implicit Optional,
so we need to make these types explicit Optional.
* Unused "type: ignore" comment, use narrower [method-assign] instead of
[assignment]:
Mypy has specialized some of its lints, meaning that we should switch
to the newer variants.
Also, it detected several other small inconsistencies. We fix all of
these errors in this commit.
Perform the following timeout bumps:
1. Increase the minimum timeout per page/MiB by x3. The rationale is that
10 seconds is a reasonable timeout, but to be on the safe side, it's
best if we multiply it by a safety factor.
2. Increase the minimum timeout from 10 seconds to 60 seconds. 10
seconds may be too little if the application runtime (e.g.,
LibreOffice) is slow to start due to background CPU thrashing.
pdftoppm raises Syntax issues and Errors on a variety of documents.
But it still produces usable results despite the failures. From the
user's perspective it's best to have a document even if imperfect than
having none at all. For this reason, we ignore non-relevant output.
Allow users to disable timeouts via the CLI, with the
`--disable-timeouts` argument. By default, the timeouts are always
enabled.
This option applies both to the CLI version of Dangerzone, and the GUI
one. For the latter, the user must start the GUI from their CLI (i.e.,
`dangerzone --disable-timeouts ...`)
Introduce proportional timeouts in the container code, where the
conversion logic runs.
Previously, we had a single timeout for each command (120 seconds),
which didn't scale well either with the number of pages in a document,
or with the size of the document.
In this commit, we look into each operation, and we're trying to figure
out the following:
1. What's the number of pages we will operate on?
2. How large is the document?
Knowing the above, we can break down a command into multiple operations,
at least conceptually. Having a number of operations and a sane timeout
value per operation (10 seconds), we can multiply those and reach to a
timeout that fits the command better.
Fixes#306Fixes#314
Refs #327
Convert the Dangerzone script that in the container to run commands
asynchronously, via the asyncio module.
The main advantage of this approach is that it's fast, easy, and safe to
consume the command's streams, while the command is running in the
background.
Previously, we had implemented an approach that used non-blocking
sockets, but those are easy to get wrong. For instance, timeouts were
not exact, capturing output was brittle.
Fixes#325
PDFtk actually isn't needed. It was being used for breaking a PDF
into pages but this is something that be replaced by the already present
'pdftoppm'. Furthermore, by removing this dependency we contribute to
reproducible builds and overall supply chain security because it was
obtained from gitlab with no signature verification or version pinning.
The replacement 'pdftoppm' enabled us to do a shortcut:
- before: PDF -> PDF pages -> PNG images -> RGB images
- after: PDF -> PPM images -> RGB images
And this last conversion step is trivial since the RGB format we were
using is just a PPM file without the metadata in its header.
Bump the global timeout used for various steps from 1 minute to 2
minutes. The reason is that we've seen several reports of operations
failing due to timeout reasons, that were otherwise legitimately
running.
Also, bump the timeout used for compression, which has been reported as
problematic as well.
Refs #146
Refs #149