dangerzone

mirror of https://github.com/freedomofpress/dangerzone.git synced 2025-05-01 03:02:23 +02:00

Author	SHA1	Message	Date
Alex Pyrgiotis	27d201a95b	container: Avoid pop-ups on Windows Avoid window pop-ups on Windows systems, by using the `startupinfo` argument of `subprocess.run`.	2024-09-27 12:55:46 +03:00
Alex Pyrgiotis	0a181a3342	container: Set `container_engine_t` SELinux label Set the `container_engine_t` SELinux label on the outer Podman container, so that gVisor does not break on systems where SELinux is enforcing. This label is provided for container engines running within a container, which fits our `runsc` within `crun` situation. We have considered using the more permissive `label=disable` option, to disable SELinux labels altogether, but we want to take advantage of as many SELinux protections as we can, even for the outer container. Cherry-picked from `e1e63d14f8` Fixes #880	2024-07-30 16:41:13 +03:00
Alex Pyrgiotis	756945931f	container: Handle case where `docker kill` hangs We have encountered several conversions where the `docker kill` command hangs. Handle this case by specifying a timeout to this command. If the timeout expires, log a warning and proceed with the rest of the termination logic (i.e., kill the conversion process). Fixes #854	2024-07-01 17:56:21 +03:00
Alex Pyrgiotis	e7e3430ca1	Use a custom seccomp policy for older Docker Desktop releases We are aware that some Docker Desktop releases before 25.0.0 ship with a seccomp policy which disables the `ptrace(2)` system call. In such cases, we opt to use our own seccomp policy which allows this system call. This seccomp policy is the default one in the latest releases of Podman, and we use it in Linux distributions where Podman version is < 4.0. Fixes #846	2024-06-26 18:49:03 +03:00
Etienne Perot	f03bc71855	Sandbox all Dangerzone document processing within gVisor. This wraps the existing container image inside a gVisor-based sandbox. gVisor is an open-source OCI-compliant container runtime. It is a userspace reimplementation of the Linux kernel in a memory-safe language. It works by creating a sandboxed environment in which regular Linux applications run, but their system calls are intercepted by gVisor. gVisor then redirects these system calls and reinterprets them in its own kernel. This means the host Linux kernel is isolated from the sandboxed application, thereby providing protection against Linux container escape attacks. It also uses `seccomp-bpf` to provide a secondary layer of defense against container escapes. Even if its userspace kernel gets compromised, attackers would have to additionally have a Linux container escape vector, and that exploit would have to fit within the restricted `seccomp-bpf` rules that gVisor adds on itself. Fixes #126 Fixes #224 Fixes #225 Fixes #228	2024-06-12 13:40:04 +03:00
Alex Pyrgiotis	7179d6f734	Get container runtime version Get the (major, minor) parts of the Docker/Podman version, to check if some specific features can be used, or if we need a fallback. These features are related to the upcoming gVisor integration, and will be added in subsequent commits.	2024-06-12 13:40:04 +03:00
Alexis Métaireau	d9d9ab91a3	docs: document why `get_tmp_dir` is required in the imports	2024-06-05 14:19:32 +02:00
Alexis Métaireau	eba30f3c17	fix: do not catch bare exceptions Bare excepts will catch keyboard-exit exceptions, system-exit etc. which is probably not what we want.	2024-06-05 14:19:31 +02:00
Alexis Métaireau	65a8827daa	chore: minor linting A few minor changes about when to use `==` and when to use `is`. Basically, this uses `is` for booleans, and `==` for other values. With a few other changes about coding style which was enforced by `ruff`.	2024-06-05 14:19:31 +02:00
Alexis Métaireau	cbbd6afcc1	chore: remove unused code This commit removes code that is not being used: `as e` clauses where the exception itself is unused, the same for `with` statements, and some other parts that contained duplicated code.	2024-06-05 14:19:31 +02:00
Alexis Métaireau	5aa4863b52	chore(imports): remove useless imports As detected by [ruff](https://github.com/astral-sh/ruff) Related to #254, although it doesn't provide the command to lint the codebase itself.	2024-06-05 14:19:30 +02:00
Alex Pyrgiotis	ff25fa3045	Fix stuck conversion processes Gracefully terminate certain conversion processes that may get stuck when writing lots of data to stdout. Also, handle a race condition when a conversion process terminates slightly after the associated container. Fixes #791	2024-05-09 16:46:15 +03:00
Alex Pyrgiotis	37bf9badf4	Remove extraneous log sanitization Remove an extra call to `replace_control_chars()`, as well as an unnecessary method.	2024-05-09 15:57:42 +03:00
Alex Pyrgiotis	0b45360384	Keep newlines when reading debug logs In `d632908a44` we improved our `replace_control_chars()` function, by replacing every control or invalid Unicode character with a placeholder one. This change, however, made our debug logs harder to read, since newlines were not preserved. There are indeed various cases in which replacing newlines is wise (e.g., in filenames), so we should keep this behavior by default. However, specifically for reading debug logs, we add an option to keep newlines to improve readability, at no expense to security.	2024-05-09 15:57:42 +03:00
Alex Pyrgiotis	d6202cd028	Invoke external command on Windows properly On Windows, if we don't use the `startupinfo=` argument of subprocess.Popen, then a terminal window will flash while running the command. Use `startupinfo=` when killing a container, as we do for every other command.	2024-05-09 15:57:42 +03:00
Alex Pyrgiotis	f57d2f7191	isolation_provider: Always terminate spawned process Previously, we always assumed that the spawned process would quit within 3 seconds. This was an arbitrary call, and did not work in practice. We can improve our standing here by doing the following: 1. Make `Popen.wait()` calls take a generous amount of time (since they are usually on the sad path), and handle any timeout errors that they throw. This way, a slow conversion process cleanup does not take too much of our users' time, nor is it reported as an error. 2. Always make sure that once the conversion of doc to pixels is over, the corresponding process will finish within a reasonable amount of time as well. Fixes #749	2024-04-24 14:39:15 +03:00
Alex Pyrgiotis	cd4cbdb00a	isolation_provider: Get exit code without timing out Get the exit code of the spawned process for the doc-to-pixels phase, without timing out. More specifically, if the spawned process has not finished within a generous amount of time (hardcoded to 15 seconds), return UnexpectedConversionError, with a custom message. This way, the happy path is not affected, and we still do our best to learn the underlying cause of the I/O error.	2024-04-24 14:36:14 +03:00
Alex Pyrgiotis	171a7eca52	isolation_provider: Terminate doc-to-pixels proc Extend the IsolationProvider class with a `terminate_doc_to_pixels_proc()` method, which must be implemented by the Qubes/Container providers and gracefully terminate a process started for the doc to pixels phase. Refs #563	2024-04-24 14:36:14 +03:00
Alex Pyrgiotis	a63f4b85eb	isolation_provider: Set a unique name for spawned containers Set a unique name for spawned containers, based on the ID of the provided document. This ID is not globally unique, as it has few bits of entropy. However, since we only want to avoid collisions within a single Dangerzone invocation, and since we can't support multiple containers running in parallel, this ID will suffice.	2024-04-24 14:33:33 +03:00
Alex Pyrgiotis	6850d31edc	isolation_provider: Pass doc when creating doc-to-pixels proc Pass the Document instance that will be converted to the `IsolationProvider.start_doc_to_pixels_proc()` method. Concrete classes can then associate this name with the started process, so that they can later on kill it.	2024-04-24 14:33:33 +03:00
Alex Pyrgiotis	a31f3370d0	Capture missing logs in second-stage conversion For a while now, we didn't get logs for the second-stage conversion when using containers. Extend the code to log any captured output from the second stage conversion, only if we run Dangerzone via our dev entrypoint. Note that the Qubes isolation provider was always logging output from the second stage of the conversion.	2024-03-13 20:59:50 +02:00
deeplow	0449840ec3	dz.ConvertDev: do not teleport .pyc files On Qubes the conversion in dev mode would fail when converting from a Fedora 38 development qube via a Fedora 39 disposable qube. The reason was that dz.ConvertDev was receiving `.pyc` files, which were compiled for python 3.11 but running on python 3.12. Unfortunately PyZipFile objects cannot send source python files, even though the documentation is a little bit unclear on this [1]. Fixes #723 [1]: https://docs.python.org/3/library/zipfile.html#pyzipfile-objects	2024-03-13 07:13:39 +00:00
Alex Pyrgiotis	634523dac9	Get underlying error when conversion fails When we get an early EOF from the converter process, we should immediately get the exit code of that process, to find out the actual underlying error. Currently, the exception we raise masks the underlying error. Raise a ConverterProcException, that in turns makes our error handling code read the exit code of the spawned process, and converts it to a helpful error message. Fixes #714	2024-02-20 15:55:45 +02:00
Alex Pyrgiotis	6ee1d14c9a	Start conversion process earlier Start the conversion process earlier, so that we have a reference to the Popen object in case of an exception.	2024-02-20 15:55:45 +02:00
deeplow	e4a5dbce46	Don't show 50% duplicated progress info 50% would show twice in the conversion progress due to an overlap in conversion progress values. The doc_to_pixels would be from 0-50% and the pixels_to_pdf from 50%-100%. This commit makes the first part go from 0 to 49% instead. Fixes #715	2024-02-20 13:47:15 +00:00
deeplow	879fca6f9f	Remove unneeded TESSDATA_PREFIX setting in container The container image does not need the TESSDATA_PREFIX env variable since its PyMuPDF version is new enough to support `tessdata` as an argument when calling the PyMuPDF tesseract method.	2024-02-07 13:14:08 +00:00
deeplow	69c2a02d81	Remove timeouts Remove timeouts due to several reasons: 1. Lost purpose: after implementing the containers page streaming the only subprocess we have left is LibreOffice. So we don't have such a big risk of commands hanging (the original reason for timeouts). 2. Little benefit: predicting execution time is a generally unsolvable computer science problem. Ultimately we were guessing an arbitrary time based on the number of pages and the document size. As a guess we made it pretty lax (30s per page or MB). A document hanging for this long will probably lead to user frustration in any case and the user may be compelled to abort the conversion. 3. Technical challenges with non-blocking timeout: there have been several technical challenges in keeping timeouts that we've made an effort to accommodate. A significant one was having to do non-blocking reads to ensure we could time out when reading the conversion stream (and then used here) Fixes #687	2024-02-06 20:11:43 +00:00
deeplow	07dd54cd13	Fix hanging: disable container logging The conversion was hanging arbitrarily [1] on some systems. Sometimes it would send the full page other times stop half-way. Originally found by @apyrgio. Co-authored-by: @apyrgio [1]: https://github.com/freedomofpress/dangerzone/pull/627#issuecomment-1892491968	2024-02-06 19:42:41 +00:00
deeplow	f3032a7142	Make big endian explicit in int to bytes Fix issues in older distros that don't yet support python 3.11 where endianness was not a default argument [1]. This is in response to CI failures [2]. [1]: https://docs.python.org/3/library/stdtypes.html#int.to_bytes [2]: https://app.circleci.com/pipelines/github/freedomofpress/dangerzone/2186/workflows/e340ca21-85ce-42b6-9bc3-09e66f96684a/jobs/27380y	2024-02-06 19:42:41 +00:00
deeplow	1835756b45	Allow each conversion to have its own proc If we increased the number of parallel conversions, we'd run into an issue where the streams were getting mixed together. This was because the Converter.proc was a single attribute. This breaks it down into a local variable such that this mixup doesn't happen.	2024-02-06 19:42:41 +00:00
deeplow	61e7a3c107	Fix isolation provider tests Conversion methods had changed and that was part of the reason why the tests were failing. Furthermore, due to the `provider.proc`, which stores the associated qrexec / container process, "server" exceptions raise an InterruptedConversion error (now ConverterProcException), which then requires interpretation of the process exit code to obtain the "real" exception.	2024-02-06 19:42:41 +00:00
deeplow	550786adfe	Remove untrusted progress parsing (stderr instead) Now that only the second container can send JSON-encoded progress information, we can remove the untrusted JSON parsing. The parse_progress was also renamed to `parse_progress_trusted` to ensure future developers don't mistake this as a safe method. The old methods for sending untrusted JSON were repurposed to send the progress instead to stderr for troubleshooting in development mode. Fixes #456	2024-02-06 19:42:40 +00:00
deeplow	c991e530d0	Fix IsolationProvider.percentage variable reuse If one converted more than one document, since the state of IsolationProvider.percentage would be stored in the IsolationProvider instance, it would get reused for the second document. The fix is to keep it as a local variable, but we can explore having progress stored on the document itself, for example. Or having one IsolationProvider per conversion.	2024-02-06 19:42:40 +00:00
deeplow	0a099540c8	Stream pages in containers: merge isolation providers Merge the core code of the Qubes and Containers isolation providers into the parent IsolationProvider abstract class. This is done by streaming pages in containers, exclusively in the first conversion process. The commit is rather large due to the multiple interdependencies of the code, making it difficult to split into various commits. The main conversion method (_convert), now in the superclass, simply calls two methods: - doc_to_pixels() - pixels_to_pdf() Critically, doc_to_pixels is implemented in the superclass, diverging only in a specialized method called "start_doc_to_pixels_proc()". This method obtains the process that communicates with the isolation provider (container / disp VM) via `podman/docker` and qrexec on Containers and Qubes respectively. Known regressions: - progress reports stopped working on containers Fixes #443	2024-02-06 19:42:33 +00:00
deeplow	331b6514e8	Containers: remove debug messages (via files) Remove container_log messages ahead of debug info being sent over standard streams.	2024-02-06 18:54:39 +00:00
deeplow	dca46d0a6b	Homogenize qubes and containers inner convert method Simple rename of the __convert() method in the Qubes conversion to make the code structurally similar.	2024-02-06 18:54:31 +00:00
deeplow	77d5ea5940	Add PyMuPDF in pixels_to_pdf replacing old logic Adding PyMuPDF essentially makes the code much simpler since it can do everything that we'd need multiple programs for. It also includes tesseract-OCR integration, which this commit makes use of.	2024-01-03 12:56:33 +00:00
Alex Pyrgiotis	edfba0c783	Qubes: Fix progress in first stage of Qubes conversion	2023-10-13 22:44:37 +03:00
Alex Pyrgiotis	3daf0e2cb7	Do not show file previews in case of exceptions If a Qubes conversion encounters an exception that is not a subclass of ConversionException, it will still show a preview of a file that does not exist. Send an error progress report in that case, so that the GUI code can detect that an error occurred and not open a file preview Fixes #581	2023-10-05 11:11:42 +03:00
Alex Pyrgiotis	bdf3f8babc	qubes: Clean up temporary files Create a temporary dir before the conversion begins, and store every file necessary for the conversion there. We are mostly concerned about the second stage of the conversion, which runs in the host. The first stage runs in a disposable qube and cleanup is implicit. Fixes #575 Fixes #436	2023-10-04 14:05:23 +03:00
Alex Pyrgiotis	6232062146	Add missing newline char	2023-10-02 15:41:29 +03:00
Alex Pyrgiotis	b7b76174ab	qubes: Log captured output for the second stage Log the captured command output during the second stage, only in dev environments. This follows what we have already done for the first stage.	2023-10-02 15:41:29 +03:00
Alex Pyrgiotis	16603875d6	qubes: Display all errors in second stage If a command encounters an error or times out during the second stage of the conversion in Qubes, handle it the same way as we would have handled it in the first stage: 1. Get its error message. 2. Throw an UnexpectedConversionError exception, with the original message. Note that, because the second stage takes place locally, users will see the original content of the error. Refs #567 Closes #430	2023-10-02 15:41:17 +03:00
deeplow	0a6b33ebed	Qubes: detect qube failing to start (missing RAM) In Qubes OS it's often the case that the user doesn't have enough RAM to start the conversion. In this case it raises BrokenPipeException and exits with code 126. It didn't seem possible to distinguish this kind of failure from one where the user has misconfigured qrexec policies. NOTE: this approach is not ideal UX-wise. After the first doc fails, the next one will also try and fail. Upon first failure we should inform the user that they need to close some programs or qubes.	2023-09-28 11:08:50 +01:00
deeplow	63f03d5bcd	Add limit and test to max width and height of docs	2023-09-28 11:08:47 +01:00
deeplow	54b8ffbf96	Add page limit of 10000 Theoretically the max pages would be 65536 (2-byte unsigned int). However this limit is much higher than practical documents have and larger ones can lead to unforeseen problems, for example RAM limitations. We thus opted to use a lower limit of 10K. The limit must be detected client-side, given that the server is distrusted. However we also check it in the server, just as a fail-early mechanism.	2023-09-28 11:01:14 +01:00
Alex Pyrgiotis	18b73d94b0	qubes: Find out reason of interrupted conversions If a conversion has been interrupted (usually due to an EOF), figure out why this happened by checking the exit code of the spawned process.	2023-09-26 17:35:26 +03:00
Alex Pyrgiotis	30196ff35b	errors: Add error for interrupted conversions Add an error for interrupted conversions, in order to better differentiate this scenario from other ValueErrors that may be raised throughout the code's lifetime.	2023-09-26 17:35:26 +03:00
Alex Pyrgiotis	0273522fb1	qubes: Store the process for the spawned qube Store, in an instance attribute, the process that we have started for the spawned disposable qube. In subsequent commits, we will use it from other places as well, aside from the `_convert` method. Note that this commit does not alter the conversion logic, and only does the following: 1. Renames `p.` to `self.proc.` 2. Adds an `__init__` method to the Qubes isolation provider, and initializes the `self.proc` attribute to `None`. 3. Adds an assert that `self.proc` is not `None` after it's spawned, to placate Mypy.	2023-09-26 17:35:25 +03:00
deeplow	e08b6defc3	Round conversion progress from float to int Fixes #553	2023-09-26 15:20:41 +01:00
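
Illustrative sketch (not code from the repository): commits `f3032a7142` and `54b8ffbf96` above describe passing the byte order explicitly to `int.to_bytes` and capping documents at 10000 pages, with the count fitting in a 2-byte unsigned integer. The snippet below shows how those two ideas could fit together. The function names `encode_page_count` and `decode_page_count`, the constant names, and the choice to reject a zero page count are assumptions made for this example only.

```python
MAX_PAGES = 10000  # page limit named in commit 54b8ffbf96
INT_BYTES = 2      # assumption: the page count travels as a 2-byte unsigned int


def encode_page_count(num_pages: int) -> bytes:
    """Encode a page count as a 2-byte big-endian unsigned integer (hypothetical helper)."""
    if not 0 < num_pages <= MAX_PAGES:
        raise ValueError(f"page count {num_pages} outside 1..{MAX_PAGES}")
    # Pass byteorder explicitly: before Python 3.11 it had no default value.
    return num_pages.to_bytes(INT_BYTES, byteorder="big", signed=False)


def decode_page_count(data: bytes) -> int:
    """Decode and validate a page count on the receiving side (hypothetical helper)."""
    if len(data) != INT_BYTES:
        raise ValueError(f"expected {INT_BYTES} bytes, got {len(data)}")
    num_pages = int.from_bytes(data, byteorder="big", signed=False)
    # The sender is not trusted, so the limit is checked again after decoding.
    if not 0 < num_pages <= MAX_PAGES:
        raise ValueError(f"page count {num_pages} outside 1..{MAX_PAGES}")
    return num_pages
```

For example, `encode_page_count(10000)` yields `b"\x27\x10"`, and `decode_page_count(b"\xff\xff")` raises `ValueError` because 65535 exceeds the limit. Checking on both sides mirrors the commit's remark that the limit must be enforced by the receiver while the sender's check is only a fail-early mechanism.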

87 commits