dangerzone

mirror of https://github.com/freedomofpress/dangerzone.git synced 2025-05-02 19:51:49 +02:00

Author	SHA1	Message	Date
Alex Pyrgiotis	68f8338d20	Revert "Disable gVisor's DirectFS feature." This reverts commit `73b0f8b7d4`. Unfortunately, disabling DirectFS causes a problem in Linux systems that enable Yama mode 2. Turns out that Tails is such a system, so we have to revert this change, if we want to support it. Refs #982	2024-10-30 19:10:26 +01:00
Alexis Métaireau	71cc4b37e5	feat: show a deprecation warning for Ubuntu Focal (20.04)	2024-10-30 01:21:38 +01:00
Alex Pyrgiotis	5ed4a048a0	qubes: Do not close stderr Do not close stderr as part of the Qubes termination logic, since we need to read the debug logs. This shouldn't affect typical termination scenarios, since we expect our disposable qube to be either busy reading from stdin, or writing to stdout. If this is not the case, then forcefully killing the `qrexec-client-vm` process should unblock the qube.	2024-10-22 20:33:29 +03:00
Alex Pyrgiotis	50627d375c	Fix a small typo	2024-10-22 19:07:09 +03:00
Alexis Métaireau	a95b612e78	Catch installation errors and display them. Fixes #193	2024-10-17 16:20:56 +02:00
Alex Pyrgiotis	6e55e43fef	Make Dummy isolation provider more realistic Make the Dummy isolation provider follow the rest of the isolation providers and perform the second part of the conversion on the host. The first part of the conversion is just a dummy script that reads a file from stdin and prints pixels to stdout.	2024-10-17 15:50:12 +03:00
Alex Pyrgiotis	7ea7c8a0cc	Remove dead code	2024-10-17 15:50:12 +03:00
Alex Pyrgiotis	f42bb23229	Update the way we get debug logs Move the logic for grabbing debug logs to a new place, now that we have merged the two conversion stages (doc to pixels, pixels to PDF).	2024-10-17 15:50:12 +03:00
Alex Pyrgiotis	e34c36f7bc	Perform on-host pixels to PDF conversion Extend the base isolation provider to immediately convert each page to a PDF, and optionally use OCR. In contrast with the way we did things previously, there are no longer two separate stages (document to pixels, pixels to PDF). We now handle each page individually, for two main reasons: 1. We don't want to buffer pixel data, either on disk or in memory, since they take a lot of space, and can potentially leave traces. 2. We can perform these operations in parallel, saving time. This is more evident when OCR is not used, where the time to convert a page to pixels, and then back to a PDF are comparable.	2024-10-17 15:50:12 +03:00
Alex Pyrgiotis	28b7249a6a	Add new way to detect tessdata dir Add a new way to detect where the Tesseract data are stored in a user's system. On Linux, the Tesseract data should be installed via the package manager. On macOS and Windows, they should be bundled with the Dangerzone application. There is also the exception of running Dangerzone locally, where even on Linux, we should get the Tesseract data from the Dangerzone share/ folder.	2024-10-17 15:50:11 +03:00
Alex Pyrgiotis	5bba249c87	Provide sanitized version of output filename	2024-10-17 15:33:58 +03:00
Alex Pyrgiotis	91fbc466c5	Add an import preference for vendored packages Prefer importing packages from ./dangerzone/vendor, if there is one there, instead of using the system ones.	2024-10-15 14:58:06 +03:00
bnewc	752eff02d8	Prevent user from using illegal characters in output filename Add some checks in the Dangerzone GUI and CLI that will prevent a user from mistakenly adding illegal characters in the output filename.	2024-10-07 18:04:47 +03:00
Alex Pyrgiotis	dc8a22c8e7	Fix the dummy provider Make the dummy provider behave a bit more like the other providers, with a proper function and termination logic. This will be helpful soon in the tests.	2024-10-07 17:37:42 +03:00
Alex Pyrgiotis	d6410652cb	Kill the process group when conversion terminates Instead of killing just the invoked Podman/Docker/qrexec process, kill the whole process group, to make sure that other components that have been spawned die as well. In the case of Podman, conmon is one of the processes that lingers, so that's one way to kill it.	2024-10-07 17:37:39 +03:00
Alex Pyrgiotis	b9a3dd63ad	Always start conversion process in new session Start the conversion process in a new session, so that we can later on kill the process group, without killing the controlling script (i.e., the Dangerzone UI). This should not affect the conversion process in any other way.	2024-10-07 17:27:38 +03:00
Alex Pyrgiotis	95660c3ec7	Make dummy tests faster Remove the unnecessary sleep command in our dummy tests, which made them run much slower.	2024-10-07 12:48:03 +03:00
Alexis Métaireau	3e434d08d1	Always use our own seccomp policy as a default. As per Etienne Perot's comment on #908: > Then it seems to me like it would be easy to simply apply this seccomp profile under all container runtimes (since there's no reason why the same image and the same command-line would call different syscalls under different container runtimes).	2024-10-02 14:12:48 +02:00
Alexis Métaireau	eb10082a62	Merge branch 'hotfix-0.7.1' into main	2024-10-01 15:16:25 +02:00
Alex Pyrgiotis	4423fc6232	Handle multiple image IDs in the `image-id.txt` file. Docker Desktop 4.30.0 uses the containerd image store by default, which generates different IDs for the images, and as a result breaks the logic we are using when verifying that the image IDs are present. Now, multiple IDs can be stored in the `image-id.txt` file. Fixes #933	2024-09-30 12:34:34 +02:00
Alex Pyrgiotis	27d201a95b	container: Avoid pop-ups on Windows Avoid window pop-ups on Windows systems, by using the `startupinfo` argument of `subprocess.run`.	2024-09-27 12:55:46 +03:00
Alexis Métaireau	c3c7fbbc20	Fix wrong container-runtime detection on Linux Use "podman" when on Linux, and "docker" otherwise. This commit also adds a text widget to the interface, showing the actual content of the error that happened, to help debug further if needed. Fixes #212	2024-09-18 15:04:57 +02:00
amnak613	9b9e265b11	Added try excepts for unhandled exceptions Fixes #776	2024-09-17 16:26:46 +03:00
Etienne Perot	73b0f8b7d4	Disable gVisor's DirectFS feature. DirectFS is enabled by default in gVisor to improve I/O performance, but comes at the cost of enabling the `openat(2)` syscall (with severe restrictions, but still). Since Dangerzone is not performance-sensitive, and since it is desirable to guarantee that the document conversion process does not open any files (to mimic some of what SELinux provides), we might as well disable it by default. See #226.	2024-09-10 17:32:31 +03:00
Alexis Métaireau	0c9f426b68	Do not throw on malformed Desktop Entries on Linux. This just skips the malformed entry when it's found. Fixes #899	2024-09-10 15:25:45 +02:00
Alex Pyrgiotis	3f86e7b465	Make PyMuPDF always log to stderr PyMuPDF logs to stdout by default, which is problematic because we use the stdout of the conversion process to read the pixel stream of a document. Make PyMuPDF always log to stderr, by setting the following environment variables: PYMUPDF_MESSAGE and PYMUPDF_LOG. Fixes #877	2024-08-09 14:32:19 +03:00
Alex Pyrgiotis	0a181a3342	container: Set `container_engine_t` SELinux label Set the `container_engine_t` SELinux label on the outer Podman container, so that gVisor does not break on systems where SELinux is enforcing. This label is provided for container engines running within a container, which fits our `runsc` within `crun` situation. We have considered using the more permissive `label=disable` option, to disable SELinux labels altogether, but we want to take advantage of as many SELinux protections as we can, even for the outer container. Cherry-picked from `e1e63d14f8` Fixes #880	2024-07-30 16:41:13 +03:00
Alex Pyrgiotis	e1e63d14f8	container: Set `container_engine_t` SELinux label Set the `container_engine_t` SELinux label on the outer Podman container, so that gVisor does not break on systems where SELinux is enforcing. This label is provided for container engines running within a container, which fits our `runsc` within `crun` situation. We have considered using the more permissive `label=disable` option, to disable SELinux labels altogether, but we want to take advantage of as many SELinux protections as we can, even for the outer container. Fixes #880	2024-07-26 16:34:19 +03:00
Alex Pyrgiotis	b6f399be6e	container: Avoid pop-ups on Windows Avoid window pop-ups on Windows systems, by using the `startupinfo` argument of `subprocess.run`.	2024-07-02 20:41:58 +03:00
Alex Pyrgiotis	756945931f	container: Handle case where `docker kill` hangs We have encountered several conversions where the `docker kill` command hangs. Handle this case by specifying a timeout to this command. If the timeout expires, log a warning and proceed with the rest of the termination logic (i.e., kill the conversion process). Fixes #854	2024-07-01 17:56:21 +03:00
deeplow	d0e1df5546	Add drag and drop support for document selection	2024-06-27 11:51:41 +02:00
Alex Pyrgiotis	e7e3430ca1	Use a custom seccomp policy for older Docker Desktop releases We are aware that some Docker Desktop releases before 25.0.0 ship with a seccomp policy which disables the `ptrace(2)` system call. In such cases, we opt to use our own seccomp policy which allows this system call. This seccomp policy is the default one in the latest releases of Podman, and we use it in Linux distributions where Podman version is < 4.0. Fixes #846	2024-06-26 18:49:03 +03:00
Ro	fb66946694	Add __future__ annotations for backwards-compatible typehint	2024-06-12 22:41:05 +02:00
Ro	54ab9ce98f	Order list of PDF viewers and return default application first (Linux).	2024-06-12 22:41:04 +02:00
Etienne Perot	f03bc71855	Sandbox all Dangerzone document processing within gVisor. This wraps the existing container image inside a gVisor-based sandbox. gVisor is an open-source OCI-compliant container runtime. It is a userspace reimplementation of the Linux kernel in a memory-safe language. It works by creating a sandboxed environment in which regular Linux applications run, but their system calls are intercepted by gVisor. gVisor then redirects these system calls and reinterprets them in its own kernel. This means the host Linux kernel is isolated from the sandboxed application, thereby providing protection against Linux container escape attacks. It also uses `seccomp-bpf` to provide a secondary layer of defense against container escapes. Even if its userspace kernel gets compromised, attackers would have to additionally have a Linux container escape vector, and that exploit would have to fit within the restricted `seccomp-bpf` rules that gVisor adds on itself. Fixes #126 Fixes #224 Fixes #225 Fixes #228	2024-06-12 13:40:04 +03:00
Alex Pyrgiotis	7179d6f734	Get container runtime version Get the (major, minor) parts of the Docker/Podman version, to check if some specific features can be used, or if we need a fallback. These features are related with the upcoming gVisor integration, and will be added in subsequent commits.	2024-06-12 13:40:04 +03:00
Alex Pyrgiotis	cf9a545c1a	Use TESSDATA_PREFIX if explicitly passed Our logic for detecting the appropriate Tesseract data directory should also take into account the canonical envvar, if explicitly passed.	2024-06-12 13:40:03 +03:00
Alexis Métaireau	d9d9ab91a3	docs: document why `get_tmp_dir` is required in the imports	2024-06-05 14:19:32 +02:00
Alexis Métaireau	55850bfe2f	refactor: use pathlib `/` separator rather than `.joinpath` Mainly to help readability	2024-06-05 14:19:31 +02:00
Alexis Métaireau	eba30f3c17	fix: do not catch bare exceptions Bare excepts will catch keyboard-exit exceptions, system-exit etc. which is probably not what we want.	2024-06-05 14:19:31 +02:00
Alexis Métaireau	65a8827daa	chore: minor linting A few minor changes about when to use `==` and when to use `is`. Basically, this uses `is` for booleans, and `==` for other values. With a few other changes about coding style which was enforced by `ruff`.	2024-06-05 14:19:31 +02:00
Alexis Métaireau	cbbd6afcc1	chore: remove unused code This commit removes code that's not being used: exceptions bound with `as e` where the exception itself is never used, unused bindings in `with` statements, and some other parts where code was duplicated.	2024-06-05 14:19:31 +02:00
Alexis Métaireau	99f1e15fd2	chore: Do not use fstrings without placeholders > f-strings are a convenient way to format strings, but they are not > necessary if there are no placeholder expressions to format. In this > case, a regular string should be used instead, as an f-string without > placeholders can be confusing for readers, who may expect such a > placeholder to be present. > > — [ruff docs](https://docs.astral.sh/ruff/rules/f-string-missing-placeholders/)	2024-06-05 14:19:31 +02:00
Alexis Métaireau	5aa4863b52	chore(imports): remove useless imports As detected by [ruff](https://github.com/astral-sh/ruff) Related to #254, although it doesn't provide the command to lint the codebase itself.	2024-06-05 14:19:30 +02:00
Alex Pyrgiotis	2aee6f4ad2	Fix some minor lint issues	2024-06-04 13:16:06 +03:00
Alex Pyrgiotis	1e1d9274f0	Handle complaints about shebangs during RPM build When building the Dangerzone RPMs, we were seeing the following shebang warnings: + /usr/lib/rpm/redhat/brp-mangle-shebangs mangling shebang in /usr/lib/python3.12/site-packages/dangerzone/conversion/doc_to_pixels.py from /usr/bin/env python3 to #!/usr/bin/python3 mangling shebang in /usr/lib/python3.12/site-packages/dangerzone/conversion/common.py from /usr/bin/env python3 to #!/usr/bin/python3 mangling shebang in /usr/lib/python3.12/site-packages/dangerzone/conversion/pixels_to_pdf.py from /usr/bin/env python3 to #!/usr/bin/python3 mangling shebang in /etc/qubes-rpc/dz.ConvertDev from /usr/bin/env python3 to #!/usr/bin/python3 mangling shebang in /etc/qubes-rpc/dz.Convert from /bin/sh to #!/usr/bin/sh These warnings are benign in nature, but coupled with #727, they could lead to incorrect file permissions. Remove shebangs from the following files, since they are not executed directly, but are imported instead: dangerzone/conversion/common.py dangerzone/conversion/doc_to_pixels.py dangerzone/conversion/pixels_to_pdf.py Also, accept the suggestions by Fedora (/bin/sh -> /usr/bin/sh, /usr/bin/env python3 -> /usr/bin/python3) for the following files: qubes/dz.Convert qubes/dz.ConvertDev Refs #727	2024-05-28 18:06:34 +03:00
Naglis Jonaitis	210405b9fd	Fix Qt QAction import In PySide2 QAction is available under `PySide2.QtWidgets`[1] whereas in PySide6 it resides under `PySide6.QtGui`[2]. Closes #788 [1]: https://doc.qt.io/qtforpython-5/PySide2/QtWidgets/QAction.html#PySide2.QtWidgets.PySide2.QtWidgets.QAction [2]: https://doc.qt.io/qtforpython-6/PySide6/QtGui/QAction.html	2024-05-14 16:27:44 +03:00
Naglis Jonaitis	8694fb21ec	Use `exec` instead of `exec_` for Qt dialogs `exec_` is being deprecated in favor of `exec`. Also use `launch()` helper method for `Dialog` subclasses. Fixes #595	2024-05-14 16:23:20 +03:00
Alex Pyrgiotis	ff25fa3045	Fix stuck conversion processes Gracefully terminate certain conversion processes that may get stuck when writing lots of data to stdout. Also, handle a race condition when a conversion process terminates slightly after the associated container. Fixes #791	2024-05-09 16:46:15 +03:00
Alex Pyrgiotis	0557e34429	Exclude Dangerzone from the discovered PDF viewers We have recently [1] changed the name of the Dangerzone application to capital-case "Dangerzone", but this breaks our PDF viewer detection logic. Adjust our check to exclude Dangerzone from the list. Fixes #790 [1]: See commit `3d426ed36b`	2024-05-09 15:57:42 +03:00
Alex Pyrgiotis	37bf9badf4	Remove extraneous log sanitization Remove an extra call to `replace_control_chars()`, as well as an unnecessary method.	2024-05-09 15:57:42 +03:00
Alex Pyrgiotis	0b45360384	Keep newlines when reading debug logs In `d632908a44` we improved our `replace_control_chars()` function, by replacing every control or invalid Unicode character with a placeholder one. This change, however, made our debug logs harder to read, since newlines were not preserved. There are indeed various cases in which replacing newlines is wise (e.g., in filenames), so we should keep this behavior by default. However, specifically for reading debug logs, we add an option to keep newlines to improve readability, at no expense to security.	2024-05-09 15:57:42 +03:00
Alex Pyrgiotis	e11aaec3ac	Always use sys.exit when exiting the application The `exit()` [1] function is not necessarily present in every Python environment, as it's added by the `site` module. Also, this function is "[...] useful for the interactive interpreter shell and should not be used in programs" For this reason, we replace all such occurrences with `sys.exit()` [2], which is the canonical function to exit Python programs. [1]: https://docs.python.org/3/library/constants.html#exit [2]: https://docs.python.org/3/library/sys.html#sys.exit	2024-05-09 15:57:42 +03:00
Alex Pyrgiotis	d6202cd028	Invoke external command on Windows properly On Windows, if we don't use the `startupinfo=` argument of subprocess.Popen, then a terminal window will flash while running the command. Use `startupinfo=` when killing a container, as we do for every other command.	2024-05-09 15:57:42 +03:00
Alex Pyrgiotis	1c70ee6771	Fix archiving the same doc twice on Windows On Windows, if we somehow attempt to archive the same document twice (e.g, because it got archived once, and then we copy it back), we will get an error, because Windows does not overwrite the target path, if it already exists. Fix this issue by always removing the previously archived version, when performing the next archival action, and update our tests.	2024-05-09 15:57:42 +03:00
Naglis Jonaitis	8cdb2d5720	Set the desktop filename and app name of the Qt application Currently, the app ID of the Dangerzone GUI application when running under Wayland is `python3`, which is not very useful if one wants to automate some action related to the Dangerzone application window (e.g. to always start Dangerzone window in floating mode under Sway WM). Setting the desktop filename property also sets the app ID of the application under Wayland. According to Qt documentation[1], the property value should be the name of the application's .desktop file but without the extension. Qt documentation also states: > This property gives a precise indication of what desktop entry > represents the application and it is needed by the windowing system to > retrieve such information without resorting to imprecise heuristics. Therefore I also think that setting this property is needed to display the correct application name and icon (taken from the .desktop entry) when running under certain windowing systems (like Wayland) (see also #402). Note that this property is not enough, as we've encountered systems where setting just the desktop file name does not alter the detected application name by the window manager. For this reason, we also set the application name [2] to `dangerzone`, to remove any ambiguity. [1]: https://doc.qt.io/qt-6/qguiapplication.html#desktopFileName-prop [2]: https://doc.qt.io/qt-6/qcoreapplication.html#applicationName-prop Fixes #402	2024-04-25 17:23:02 +03:00
Naglis Jonaitis	d632908a44	Fix printing of filenames with surrogate escapes On Unix systems a filename can be a sequence of bytes that is not valid UTF-8. Python uses[1] surrogate escapes to allow to decode such filenames to Unicode (bytes that cannot be decoded are replaced by a surrogate; upon encoding the surrogate is converted to the original byte). From `click` docs[2]: > Invalid bytes or surrogate escapes will raise an error when written > to a stream with `errors="strict"`. This will typically happen with > `stdout` when the locale is something like `en_GB.UTF-8`. To fix that, we use `utils.replace_control_chars()` before printing the filenames to `stdout` so that surrogate escapes are replaced by �. Fixes #768	2024-04-25 14:11:25 +03:00
Naglis Jonaitis	52ced04507	Relax the restrictions of util.replace_control_chars The `util.replace_control_chars()` function was overly strict, and would replace every non-ASCII character with "_". This included both control characters, as well as normal characters in a non-English alphabet. Relax these restrictions by checking each character and deciding if it's a Unicode control character, using the `unicodedata` Python package. With this change, emojis and non-English letters are now allowed.	2024-04-25 14:11:16 +03:00
Alex Pyrgiotis	f57d2f7191	isolation_provider: Always terminate spawned process Previously, we always assumed that the spawned process would quit within 3 seconds. This was an arbitrary call, and did not work in practice. We can improve our standing here by doing the following: 1. Make `Popen.wait()` calls take a generous amount of time (since they are usually on the sad path), and handle any timeout errors that they throw. This way, a slow conversion process cleanup does not take too much of our users' time, nor is it reported as an error. 2. Always make sure that once the conversion of doc to pixels is over, the corresponding process will finish within a reasonable amount of time as well. Fixes #749	2024-04-24 14:39:15 +03:00
Alex Pyrgiotis	cd4cbdb00a	isolation_provider: Get exit code without timing out Get the exit code of the spawned process for the doc-to-pixels phase, without timing out. More specifically, if the spawned process has not finished within a generous amount of time (hardcoded to 15 seconds), return UnexpectedConversionError, with a custom message. This way, the happy path is not affected, and we still do our best to learn the underlying cause of the I/O error.	2024-04-24 14:36:14 +03:00
Alex Pyrgiotis	171a7eca52	isolation_provider: Terminate doc-to-pixels proc Extend the IsolationProvider class with a `terminate_doc_to_pixels_proc()` method, which must be implemented by the Qubes/Container providers and gracefully terminate a process started for the doc to pixels phase. Refs #563	2024-04-24 14:36:14 +03:00
Alex Pyrgiotis	a63f4b85eb	isolation_provider: Set a unique name for spawned containers Set a unique name for spawned containers, based on the ID of the provided document. This ID is not globally unique, as it has few bits of entropy. However, since we only want to avoid collisions within a single Dangerzone invocation, and since we can't support multiple containers running in parallel, this ID will suffice.	2024-04-24 14:33:33 +03:00
Alex Pyrgiotis	6850d31edc	isolation_provider: Pass doc when creating doc-to-pixels proc Pass the Document instance that will be converted to the `IsolationProvider.start_doc_to_pixels_proc()` method. Concrete classes can then associate this name with the started process, so that they can later on kill it.	2024-04-24 14:33:33 +03:00
deeplow	dfcb10c494	Move settings.json into constant Move settings.json into a constant so that they can later be referred to by the testing module.	2024-04-01 18:18:41 +03:00
deeplow	ad16a0e471	Fix Settings().set() when setting new setting Settings().set() would fail if we were trying to set a setting that did not exist before. The reason is that, before setting it, it would try to get the previous value through direct key access, which would lead to an exception.	2024-04-01 18:18:41 +03:00
Alex Pyrgiotis	74c467eaf7	conversion: Do not let PyMuPDF print to stdout PyMuPDF has some hardcoded log messages that print to stdout [1]. We don't have a way to silence them, because they don't use the Python logging infrastructure. What we can do here is silence a particular call that's been creating debug messages. For a long term solution, we have sent a PR to the PyMuPDF team, and we will follow up there [2]. Fixes #700 [1]: https://github.com/freedomofpress/dangerzone/issues/700 [2]: https://github.com/pymupdf/PyMuPDF/pull/3137	2024-03-13 21:03:15 +02:00
Alex Pyrgiotis	a31f3370d0	Capture missing logs in second-stage conversion For a while now, we didn't get logs for the second-stage conversion when using containers. Extend the code to log any captured output from the second stage conversion, only if we run Dangerzone via our dev entrypoint. Note that the Qubes isolation provider was always logging output from the second stage of the conversion.	2024-03-13 20:59:50 +02:00
deeplow	0449840ec3	dz.ConvertDev: do not teleport .pyc files On Qubes the conversion in dev mode would fail when converting from a Fedora 38 development qube via a Fedora 39 disposable qube. The reason was that dz.ConvertDev was receiving `.pyc` files, which were compiled for python 3.11 but running on python 3.12. Unfortunately PyZipFile objects cannot send source python files, even though the documentation is a little bit unclear on this [1]. Fixes #723 [1]: https://docs.python.org/3/library/zipfile.html#pyzipfile-objects	2024-03-13 07:13:39 +00:00
Alex Pyrgiotis	f75d471ec8	Fix OCR bug in Qubes Fedora 38 templates Provide a fix for an OCR bug that affected Fedora 38 templates of Qubes OS. In that specific configuration, the PyMuPDF version accepts the Tesseract data directory only from the `TESSDATA_PREFIX` environment variable. Our mistake was that we were setting this environment variable in a dev script, instead of setting it for all configurations. In this commit, we set an attribute in the fitz.fitz module, so that both dev scripts and end-user installations can work. This is hacky, but it targets an old PyMuPDF release after all, so we don't expect things to break in the long run. Fixes #737	2024-03-04 16:53:04 +02:00
Alex Pyrgiotis	5b6911af84	Properly add new file extensions Accept `.svg` and `.bmp` files when browsing via the Dangerzone GUI. Support for these extensions has already been added in the converter code that runs in the sandbox (`cd99122385`) but they were erroneously left out from the filter in the Dangerzone main window.	2024-02-20 16:02:38 +02:00
Alex Pyrgiotis	e73f10f99b	Handle gracefully unknown error codes Do not throw exceptions for unknown error codes. If `get_proc_exception()` gets called from within an exception context and raises an exception itself, then this exception will not get caught, and it will get lost. Prefer instead to return an exception class that we have for this purpose, and show to the user the unknown error code of the conversion process.	2024-02-20 16:00:35 +02:00
Alex Pyrgiotis	634523dac9	Get underlying error when conversion fails When we get an early EOF from the converter process, we should immediately get the exit code of that process, to find out the actual underlying error. Currently, the exception we raise masks the underlying error. Raise a ConverterProcException, that in turns makes our error handling code read the exit code of the spawned process, and converts it to a helpful error message. Fixes #714	2024-02-20 15:55:45 +02:00
Alex Pyrgiotis	6ee1d14c9a	Start conversion process earlier Start the conversion process earlier, so that we have a reference to the Popen object in case of an exception.	2024-02-20 15:55:45 +02:00
deeplow	e4a5dbce46	Don't show 50% duplicated progress info 50% would show twice in the conversion progress due to an overlap in conversion progress values. The doc_to_pixels would be from 0-50% and the pixels_to_pdf from 50%-100%. This commit makes the first part go from 0 to 49% instead. Fixes #715	2024-02-20 13:47:15 +00:00
deeplow	75f8d76c5b	Appease new version of black lint tool	2024-02-13 11:36:10 +00:00
deeplow	879fca6f9f	Remove unneeded TESSDATA_PREFIX setting in container The container image does not need the TESSDATA_PREFIX env variable since its PyMuPDF version is new enough to support `tessdata` as an argument when calling the PyMuPDF tesseract method.	2024-02-07 13:14:08 +00:00
deeplow	6006beeb03	Fix OCR on Qubes: PyMuPDF required TESSDATA_PREFIX PyMuPDF versions 1.22.5 and higher accept the tesseract data path as an argument to `pixmap.pdfocr_tobytes()` [1], but lower versions require setting the TESSDATA_PREFIX environment variable instead [2]. Because on Qubes the pixels to pdf conversion happens on the host and Qubes has a lower PyMuPDF package version, we need to pass it via the environment variable instead. NOTE: the TESSDATA_PREFIX env. variable was set in dangerzone-cli instead of closer to the calling method in `doc_to_pixels.py` since PyMuPDF reads this variable as soon as the fitz module is imported [3][4]. [1]: https://pymupdf.readthedocs.io/en/latest/pixmap.html#Pixmap.pdfocr_tobytes [2]: https://pymupdf.readthedocs.io/en/latest/installation.html#enabling-integrated-ocr-support [3]: https://github.com/pymupdf/PyMuPDF/discussions/2439 [4]: https://github.com/pymupdf/PyMuPDF/blob/5d6a7db/src/__init__.py#L159 Fixes #682	2024-02-07 13:13:10 +00:00
deeplow	8a32d80762	Remove leftover progress variable in pixels_to_pdf Since the progress information is now inferred on host based on the number of pages obtained, progress-tracking variables should be removed from the server.	2024-02-06 20:11:52 +00:00
deeplow	69c2a02d81	Remove timeouts Remove timeouts due to several reasons: 1. Lost purpose: after implementing the containers page streaming the only subprocess we have left is LibreOffice. So we don't have such a big risk of commands hanging (the original reason for timeouts). 2. Little benefit: predicting execution time is a generally unsolvable computer science problem. Ultimately we were guessing an arbitrary time based on the number of pages and the document size. As a guess we made it pretty lax (30s per page or MB). A document hanging for this long will probably lead to user frustration in any case and the user may be compelled to abort the conversion. 3. Technical challenges with non-blocking timeout: there have been several technical challenges in keeping timeouts that we've made an effort to accommodate. A significant one was having to do non-blocking reads to ensure we could time out when reading the conversion stream (and then used here) Fixes #687	2024-02-06 20:11:43 +00:00
deeplow	4d3f2b32c7	Revert "Add Stopwatch implementation" This reverts commit `344d6f7bfa`. Stopwatch is no longer needed now that we're removing timeouts.	2024-02-06 19:42:42 +00:00
deeplow	f31374e33c	Revert "Add non-blocking read utility" This reverts commit `fea193e935`. This is part of the purge of timeout-related code since we no longer need it [1]. Non-blocking reads were introduced in the reverted commit in order to be able to cut a stream mid-way due to a timeout. This is no longer needed now that we're getting rid of timeouts. [1]: https://github.com/freedomofpress/dangerzone/issues/687	2024-02-06 19:42:41 +00:00
deeplow	07dd54cd13	Fix hanging: disable container logging The conversion was hanging arbitrarily [1] on some systems. Sometimes it would send the full page, other times it would stop half-way. Originally found by @apyrgio. Co-authored-by: @apyrgio [1]: https://github.com/freedomofpress/dangerzone/pull/627#issuecomment-1892491968	2024-02-06 19:42:41 +00:00
deeplow	f3032a7142	Make big endian explicit in int to bytes Fix issues in older distros that don't yet support Python 3.11, before which the byte order argument had no default value [1]. This is in response to CI failures [2]. [1]: https://docs.python.org/3/library/stdtypes.html#int.to_bytes [2]: https://app.circleci.com/pipelines/github/freedomofpress/dangerzone/2186/workflows/e340ca21-85ce-42b6-9bc3-09e66f96684a/jobs/27380y	2024-02-06 19:42:41 +00:00
deeplow	1835756b45	Allow each conversion to have its own proc If we increased the number of parallel conversions, we'd run into an issue where the streams were getting mixed together. This was because the Converter.proc was a single attribute. This breaks it down into a local variable such that this mixup doesn't happen.	2024-02-06 19:42:41 +00:00
deeplow	61e7a3c107	Fix isolation provider tests Conversion methods had changed and that was part of the reason why the tests were failing. Furthermore, due to the `provider.proc`, which stores the associated qrexec / container process, "server" exceptions raise an InterruptedConversion error (now ConverterProcException), which then requires interpretation of the process exit code to obtain the "real" exception.	2024-02-06 19:42:41 +00:00
deeplow	550786adfe	Remove untrusted progress parsing (stderr instead) Now that only the second container can send JSON-encoded progress information, we can remove the untrusted JSON parsing. The parse_progress method was also renamed to `parse_progress_trusted` to ensure future developers don't mistake it for a safe method. The old methods for sending untrusted JSON were repurposed to send the progress to stderr instead, for troubleshooting in development mode. Fixes #456	2024-02-06 19:42:40 +00:00
deeplow	c991e530d0	Fix IsolationProvider.percentage variable reuse If one converted more than one document, since the state of IsolationProvider.percentage would be stored in the IsolationProvider instance, it would get reused for the second document. The fix is to keep it as a local variable, but we can explore having progress stored on the document itself, for example. Or having one IsolationProvider per conversion.	2024-02-06 19:42:40 +00:00
deeplow	0a099540c8	Stream pages in containers: merge isolation providers Merge the core code of the Qubes and Containers isolation providers into the parent IsolationProvider abstract class. This is done by streaming pages in containers, exclusively in the first conversion process. The commit is rather large due to the multiple interdependencies of the code, making it difficult to split into various commits. The main conversion method (_convert), now in the superclass, simply calls two methods: - doc_to_pixels() - pixels_to_pdf() Critically, doc_to_pixels is implemented in the superclass, diverging only in a specialized method called "start_doc_to_pixels_proc()". This method obtains the process responsible for communicating with the isolation provider (container / disp VM) via `podman/docker` and qrexec on Containers and Qubes respectively. Known regressions: - progress reports stopped working on containers Fixes #443	2024-02-06 19:42:33 +00:00
deeplow	331b6514e8	Containers: remove debug messages (via files) Remove container_log messages ahead of debug info being sent over standard streams.	2024-02-06 18:54:39 +00:00
deeplow	dca46d0a6b	Homogenize qubes and containers inner convert method Simple rename of the __convert() method in the Qubes conversion to make the code structurally similar.	2024-02-06 18:54:31 +00:00
deeplow	cd99122385	Adds file formats: epub svg bmp pnm bpm ppm Partial fix for #660. Missing some files due to limitations [1]: - PSD - only available from PyMuPDF>=1.23.0 (qubes-fedora is lower) - TXT - only available from PyMuPDF>=1.23.7 (qubes-fedora is lower) - JXR - PyMuPDF refused it due to a missing codec [1] - JPX - Generated test file was rejected by PyMuPDF [2] - FB2 - Most often cannot be detected by mime type alone [3] - CBZ - (idem) - XPS - (idem) - MOBI - (idem) - PAM - General version of other file format already included, so I decided not to include this extension [0] New test files were generated locally: - epub - generated with calibre's convert-ebook from another sample file - svg - generated with inkscape from a mix of a default template (hexagons) and a logo's PNG file - bmp, pnm, bpm, ppm - generated with ImageMagick's 'convert' from tests/test_docs/sample-png.png [0]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1914681487 [1]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1916803201 [2]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1916870347 [3]: https://github.com/freedomofpress/dangerzone/issues/688	2024-01-31 19:58:48 +00:00
deeplow	4e720aa6e2	Replace 'None' conversion type with "PyMuPDF" Replaced for clarity over the fact that this conversion is in fact handled by PyMuPDF.	2024-01-31 19:58:36 +00:00
sudwhiwdh	b4ef47e101	GUI header capitalisation	2024-01-22 11:38:54 +00:00
deeplow	f1d90c6fa9	Compress per page when not using OCR Make the compression happen per page when OCR is not enabled [1]. [1]: https://github.com/freedomofpress/dangerzone/pull/622#discussion_r1410986342	2024-01-03 12:58:36 +00:00
deeplow	e2531279c0	FIXUP Revert "Disable image compression when saving PDF" This reverts commit f074db0beaa50389634203657f9b46307164a353.	2024-01-03 12:58:36 +00:00
deeplow	ee35e28aa6	Disable image compression when saving PDF Some tests [1] lead to the conclusion that ocr_compression does the same to the file (performance and size-wise) as deflating images when saving the file. However, having both methods active does add a bit of extra time. For this reason we're disabling the image deflation (default option). [1]: https://github.com/freedomofpress/dangerzone/pull/622#discussion_r1434042296	2024-01-03 12:58:36 +00:00
deeplow	6f61e44502	Solve import errors by lazy-loading fitz module Qubes does on-host pixels-to-pdf whereas the containers version doesn't. This leads to an issue where on the containers version it tries to load fitz, which isn't installed there, just because it's trying to check if it should run the Qubes version. The error it was showing was something like this: ImportError while loading conftest '/home/user/dangerzone/tests/conftest.py'. tests/__init__.py:8: in <module> from dangerzone.document import SAFE_EXTENSION dangerzone/__init__.py:16: in <module> from .gui import gui_main as main dangerzone/gui/__init__.py:28: in <module> from ..isolation_provider.qubes import Qubes, is_qubes_native_conversion dangerzone/isolation_provider/qubes.py:15: in <module> from ..conversion.pixels_to_pdf import PixelsToPDF dangerzone/conversion/pixels_to_pdf.py:16: in <module> import fitz E ModuleNotFoundError: No module named 'fitz' For context see discussion in [1]. [1]: https://github.com/freedomofpress/dangerzone/pull/622#issuecomment-1839164885	2024-01-03 12:58:36 +00:00
deeplow	80db7bb02e	Remove pre-pymupdf exceptions and detect pymupdf ones	2024-01-03 12:58:35 +00:00
deeplow	b75417bbec	Remove all server-side timeouts from doc to pixels Now we're using client-side timeouts, so the server-side ones are not needed. Implemented following the suggestion from @apyrgio [1]. [1]: https://github.com/freedomofpress/dangerzone/pull/622#discussion_r1413906514	2024-01-03 12:58:35 +00:00
deeplow	576cbd3382	Fix DPI mismatch between doc2pixels and pixels2pdf The converted document was larger in dimensions than the original one due to a mismatch in DPI settings. When converting documents to pixels we were setting the DPI to 150 pixels per inch. Then when converting back into a PDF we were using 70 DPI. This difference would result in an overall larger document in dimensions (though not necessarily in file size). Fixes #626	2024-01-03 12:58:34 +00:00
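
The DPI mismatch described in 576cbd3382 can be illustrated with a little arithmetic. The following is a minimal sketch, not code from the repository: the helper names are hypothetical, and the 8.5 inch page width is an assumed example value. Only the 150 and 70 DPI figures come from the commit message.

```python
# Illustrative only: helper names and the page width are assumptions,
# not taken from the dangerzone code base.

RENDER_DPI = 150  # DPI used when rasterizing the document (per the commit message)
SAVE_DPI = 70     # DPI previously used when rebuilding the PDF (per the commit message)


def inches_to_pixels(inches: float, dpi: int) -> int:
    """Number of pixels needed to render a length at the given DPI."""
    return round(inches * dpi)


def pixels_to_inches(pixels: int, dpi: int) -> float:
    """Physical length of a pixel run when interpreted at the given DPI."""
    return pixels / dpi


width_in = 8.5  # assumed original page width, in inches
width_px = inches_to_pixels(width_in, RENDER_DPI)

# Interpreting the same pixels at a lower DPI inflates the page.
mismatched_in = pixels_to_inches(width_px, SAVE_DPI)
# Using the same DPI on both sides preserves the original size.
matched_in = pixels_to_inches(width_px, RENDER_DPI)

print(f"rendered width: {width_px} px")
print(f"width at {SAVE_DPI} DPI: {mismatched_in:.2f} in")
print(f"width at {RENDER_DPI} DPI: {matched_in:.2f} in")
print(f"scale factor: {RENDER_DPI / SAVE_DPI:.2f}x")
```

Under these assumptions the page comes out roughly 2.14 times wider than the original when the DPI values differ, while matching them restores the original dimensions.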


602 commits