dangerzone

mirror of https://github.com/freedomofpress/dangerzone.git synced 2025-05-01 11:12:24 +02:00

Author	SHA1	Message	Date
Alex Pyrgiotis	37bf9badf4	Remove extraneous log sanitization Remove an extra call to `replace_control_chars()`, as well as an unnecessary method.	2024-05-09 15:57:42 +03:00
Alex Pyrgiotis	0b45360384	Keep newlines when reading debug logs In `d632908a44` we improved our `replace_control_chars()` function, by replacing every control or invalid Unicode character with a placeholder one. This change, however, made our debug logs harder to read, since newlines were not preserved. There are indeed various cases in which replacing newlines is wise (e.g., in filenames), so we should keep this behavior by default. However, specifically for reading debug logs, we add an option to keep newlines to improve readability, at no expense to security.	2024-05-09 15:57:42 +03:00
Alex Pyrgiotis	e11aaec3ac	Always use sys.exit when exiting the application The `exit()` [1] function is not necessarily present in every Python environment, as it's added by the `site` module. Also, this function is "[...] useful for the interactive interpreter shell and should not be used in programs". For this reason, we replace all such occurrences with `sys.exit()` [2], which is the canonical function to exit Python programs. [1]: https://docs.python.org/3/library/constants.html#exit [2]: https://docs.python.org/3/library/sys.html#sys.exit	2024-05-09 15:57:42 +03:00
Alex Pyrgiotis	d6202cd028	Invoke external command on Windows properly On Windows, if we don't use the `startupinfo=` argument of subprocess.Popen, then a terminal window will flash while running the command. Use `startupinfo=` when killing a container, as we do for every other command.	2024-05-09 15:57:42 +03:00
Alex Pyrgiotis	1c70ee6771	Fix archiving the same doc twice on Windows On Windows, if we somehow attempt to archive the same document twice (e.g., because it got archived once, and then we copy it back), we will get an error, because Windows does not overwrite the target path if it already exists. Fix this issue by always removing the previously archived version when performing the next archival action, and update our tests.	2024-05-09 15:57:42 +03:00
Naglis Jonaitis	8cdb2d5720	Set the desktop filename and app name of the Qt application Currently, the app ID of the Dangerzone GUI application when running under Wayland is `python3`, which is not very useful if one wants to automate some action related to the Dangerzone application window (e.g. to always start Dangerzone window in floating mode under Sway WM). Setting the desktop filename property also sets the app ID of the application under Wayland. According to Qt documentation[1], the property value should be the name of the application's .desktop file but without the extension. Qt documentation also states: > This property gives a precise indication of what desktop entry > represents the application and it is needed by the windowing system to > retrieve such information without resorting to imprecise heuristics. Therefore I also think that setting this property is needed to display the correct application name and icon (taken from the .desktop entry) when running under certain windowing systems (like Wayland) (see also #402). Note that this property is not enough, as we've encountered systems where setting just the desktop file name does not alter the detected application name by the window manager. For this reason, we also set the application name [2] to `dangerzone`, to remove any ambiguity. [1]: https://doc.qt.io/qt-6/qguiapplication.html#desktopFileName-prop [2]: https://doc.qt.io/qt-6/qcoreapplication.html#applicationName-prop Fixes #402	2024-04-25 17:23:02 +03:00
Naglis Jonaitis	d632908a44	Fix printing of filenames with surrogate escapes On Unix systems a filename can be a sequence of bytes that is not valid UTF-8. Python uses[1] surrogate escapes to allow decoding such filenames to Unicode (bytes that cannot be decoded are replaced by a surrogate; upon encoding the surrogate is converted to the original byte). From `click` docs[2]: > Invalid bytes or surrogate escapes will raise an error when written > to a stream with `errors="strict"`. This will typically happen with > `stdout` when the locale is something like `en_GB.UTF-8`. To fix that, we use `utils.replace_control_chars()` before printing the filenames to `stdout` so that surrogate escapes are replaced by �. Fixes #768	2024-04-25 14:11:25 +03:00
Naglis Jonaitis	52ced04507	Relax the restrictions of util.replace_control_chars The `util.replace_control_chars()` function was overly strict, and would replace every non-ASCII character with "_". This included both control characters, as well as normal characters in a non-English alphabet. Relax these restrictions by checking each character and deciding if it's a Unicode control character, using the `unicodedata` Python package. With this change, emojis and non-English letters are now allowed.	2024-04-25 14:11:16 +03:00
Alex Pyrgiotis	f57d2f7191	isolation_provider: Always terminate spawned process Previously, we always assumed that the spawned process would quit within 3 seconds. This was an arbitrary call, and did not work in practice. We can improve our standing here by doing the following: 1. Make `Popen.wait()` calls take a generous amount of time (since they are usually on the sad path), and handle any timeout errors that they throw. This way, a slow conversion process cleanup does not take too much of our users' time, nor is it reported as an error. 2. Always make sure that once the conversion of doc to pixels is over, the corresponding process will finish within a reasonable amount of time as well. Fixes #749	2024-04-24 14:39:15 +03:00
Alex Pyrgiotis	cd4cbdb00a	isolation_provider: Get exit code without timing out Get the exit code of the spawned process for the doc-to-pixels phase, without timing out. More specifically, if the spawned process has not finished within a generous amount of time (hardcoded to 15 seconds), return UnexpectedConversionError, with a custom message. This way, the happy path is not affected, and we still do our best to learn the underlying cause of the I/O error.	2024-04-24 14:36:14 +03:00
Alex Pyrgiotis	171a7eca52	isolation_provider: Terminate doc-to-pixels proc Extend the IsolationProvider class with a `terminate_doc_to_pixels_proc()` method, which must be implemented by the Qubes/Container providers and gracefully terminate a process started for the doc to pixels phase. Refs #563	2024-04-24 14:36:14 +03:00
Alex Pyrgiotis	a63f4b85eb	isolation_provider: Set a unique name for spawned containers Set a unique name for spawned containers, based on the ID of the provided document. This ID is not globally unique, as it has few bits of entropy. However, since we only want to avoid collisions within a single Dangerzone invocation, and since we can't support multiple containers running in parallel, this ID will suffice.	2024-04-24 14:33:33 +03:00
Alex Pyrgiotis	6850d31edc	isolation_provider: Pass doc when creating doc-to-pixels proc Pass the Document instance that will be converted to the `IsolationProvider.start_doc_to_pixels_proc()` method. Concrete classes can then associate this name with the started process, so that they can later on kill it.	2024-04-24 14:33:33 +03:00
deeplow	dfcb10c494	Move settings.json into constant Move settings.json into a constant so that they can later be referred to by the testing module.	2024-04-01 18:18:41 +03:00
deeplow	ad16a0e471	Fix Settings().set() when setting new setting Settings().set() would fail if we were trying to set a setting that did not exist before. The reason is that, before setting it, it would try to get the previous value through direct key access, which would lead to an exception.	2024-04-01 18:18:41 +03:00
Alex Pyrgiotis	74c467eaf7	conversion: Do not let PyMuPDF print to stdout PyMuPDF has some hardcoded log messages that print to stdout [1]. We don't have a way to silence them, because they don't use the Python logging infrastructure. What we can do here is silence a particular call that's been creating debug messages. For a long term solution, we have sent a PR to the PyMuPDF team, and we will follow up there [2]. Fixes #700 [1]: https://github.com/freedomofpress/dangerzone/issues/700 [2]: https://github.com/pymupdf/PyMuPDF/pull/3137	2024-03-13 21:03:15 +02:00
Alex Pyrgiotis	a31f3370d0	Capture missing logs in second-stage conversion For a while now, we didn't get logs for the second-stage conversion when using containers. Extend the code to log any captured output from the second stage conversion, only if we run Dangerzone via our dev entrypoint. Note that the Qubes isolation provider was always logging output from the second stage of the conversion.	2024-03-13 20:59:50 +02:00
deeplow	0449840ec3	dz.ConvertDev: do not teleport .pyc files On Qubes the conversion in dev mode would fail when converting from a Fedora 38 development qube via a Fedora 39 disposable qube. The reason was that dz.ConvertDev was receiving `.pyc` files, which were compiled for python 3.11 but running on python 3.12. Unfortunately PyZipFile objects cannot send source python files, even though the documentation is a little bit unclear on this [1]. Fixes #723 [1]: https://docs.python.org/3/library/zipfile.html#pyzipfile-objects	2024-03-13 07:13:39 +00:00
Alex Pyrgiotis	f75d471ec8	Fix OCR bug in Qubes Fedora 38 templates Provide a fix for an OCR bug that affected Fedora 38 templates of Qubes OS. In that specific configuration, the PyMuPDF version accepts the Tesseract data directory only from the `TESSDATA_PREFIX` environment variable. Our mistake was that we were setting this environment variable in a dev script, instead of setting it for all configurations. In this commit, we set an attribute in the fitz.fitz module, so that both dev scripts and end-user installations can work. This is hacky, but it targets an old PyMuPDF release after all, so we don't expect things to break in the long run. Fixes #737	2024-03-04 16:53:04 +02:00
Alex Pyrgiotis	5b6911af84	Properly add new file extensions Accept `.svg` and `.bmp` files when browsing via the Dangerzone GUI. Support for these extensions has already been added in the converter code that runs in the sandbox (`cd99122385`) but they were erroneously left out from the filter in the Dangerzone main window.	2024-02-20 16:02:38 +02:00
Alex Pyrgiotis	e73f10f99b	Handle gracefully unknown error codes Do not throw exceptions for unknown error codes. If `get_proc_exception()` gets called from within an exception context and raises an exception itself, then this exception will not get caught, and it will get lost. Prefer instead to return an exception class that we have for this purpose, and show the user the unknown error code of the conversion process.	2024-02-20 16:00:35 +02:00
Alex Pyrgiotis	634523dac9	Get underlying error when conversion fails When we get an early EOF from the converter process, we should immediately get the exit code of that process, to find out the actual underlying error. Currently, the exception we raise masks the underlying error. Raise a ConverterProcException, which in turn makes our error handling code read the exit code of the spawned process and convert it to a helpful error message. Fixes #714	2024-02-20 15:55:45 +02:00
Alex Pyrgiotis	6ee1d14c9a	Start conversion process earlier Start the conversion process earlier, so that we have a reference to the Popen object in case of an exception.	2024-02-20 15:55:45 +02:00
deeplow	e4a5dbce46	Don't show 50% duplicated progress info 50% would show twice in the conversion progress due to an overlap in conversion progress values. The doc_to_pixels would be from 0-50% and the pixels_to_pdf from 50%-100%. This commit makes the first part go from 0 to 49% instead. Fixes #715	2024-02-20 13:47:15 +00:00
deeplow	75f8d76c5b	Appease new version of black lint tool	2024-02-13 11:36:10 +00:00
deeplow	879fca6f9f	Remove unneeded TESSDATA_PREFIX setting in container The container image does not need the TESSDATA_PREFIX env variable since its PyMuPDF version is new enough to support `tessdata` as an argument when calling the PyMuPDF tesseract method.	2024-02-07 13:14:08 +00:00
deeplow	6006beeb03	Fix OCR on Qubes: PyMuPDF required TESSDATA_PREFIX PyMuPDF versions 1.22.5 and higher pass the tesseract data path as an argument to `pixmap.pdfocr_tobytes()` [1], but lower versions require setting the TESSDATA_PREFIX environment variable instead [2]. Because on Qubes the pixels to pdf conversion happens on the host and Qubes has a lower PyMuPDF package version, we need to pass it via the environment variable instead. NOTE: the TESSDATA_PREFIX env. variable was set in dangerzone-cli instead of closer to the calling method in `doc_to_pixels.py` since PyMuPDF reads this variable as soon as the fitz module is imported [3][4]. [1]: https://pymupdf.readthedocs.io/en/latest/pixmap.html#Pixmap.pdfocr_tobytes [2]: https://pymupdf.readthedocs.io/en/latest/installation.html#enabling-integrated-ocr-support [3]: https://github.com/pymupdf/PyMuPDF/discussions/2439 [4]: https://github.com/pymupdf/PyMuPDF/blob/5d6a7db/src/__init__.py#L159 Fixes #682	2024-02-07 13:13:10 +00:00
deeplow	8a32d80762	Remove leftover progress variable in pixels_to_pdf Since the progress information is now inferred on host based on the number of pages obtained, progress-tracking variables should be removed from the server.	2024-02-06 20:11:52 +00:00
deeplow	69c2a02d81	Remove timeouts Remove timeouts due to several reasons: 1. Lost purpose: after implementing the containers page streaming, the only subprocess we have left is LibreOffice. So we don't have such a big risk of commands hanging (the original reason for timeouts). 2. Little benefit: predicting execution time is a generally unsolvable computer science problem. Ultimately we were guessing an arbitrary time based on the number of pages and the document size. As a guess we made it pretty lax (30s per page or MB). A document hanging for this long will probably lead to user frustration in any case and the user may be compelled to abort the conversion. 3. Technical challenges with non-blocking timeout: there have been several technical challenges in keeping timeouts that we've made an effort to accommodate. A significant one was having to do non-blocking reads to ensure we could time out when reading the conversion stream (and then used here). Fixes #687	2024-02-06 20:11:43 +00:00
deeplow	4d3f2b32c7	Revert "Add Stopwatch implementation" This reverts commit `344d6f7bfa`. Stopwatch is no longer needed now that we're removing timeouts.	2024-02-06 19:42:42 +00:00
deeplow	f31374e33c	Revert "Add non-blocking read utility" This reverts commit `fea193e935`. This is part of the purge of timeout-related code since we no longer need it [1]. Non-blocking reads were introduced in the reverted commit in order to be able to cut a stream mid-way due to a timeout. This is no longer needed now that we're getting rid of timeouts. [1]: https://github.com/freedomofpress/dangerzone/issues/687	2024-02-06 19:42:41 +00:00
deeplow	07dd54cd13	Fix hanging: disable container logging The conversion was hanging arbitrarily [1] on some systems. Sometimes it would send the full page other times stop half-way. Originally found by @apyrgio. Co-authored-by: @apyrgio [1]: https://github.com/freedomofpress/dangerzone/pull/627#issuecomment-1892491968	2024-02-06 19:42:41 +00:00
deeplow	f3032a7142	Make big endian explicit in int to bytes Fix issues in older distros that don't yet support python 3.11 where endianness was not a default argument [1]. This is in response to CI failures [2]. [1]: https://docs.python.org/3/library/stdtypes.html#int.to_bytes [2]: https://app.circleci.com/pipelines/github/freedomofpress/dangerzone/2186/workflows/e340ca21-85ce-42b6-9bc3-09e66f96684a/jobs/27380y	2024-02-06 19:42:41 +00:00
deeplow	1835756b45	Allow each conversion to have its own proc If we increased the number of parallel conversions, we'd run into an issue where the streams were getting mixed together. This was because the Converter.proc was a single attribute. This breaks it down into a local variable such that this mixup doesn't happen.	2024-02-06 19:42:41 +00:00
deeplow	61e7a3c107	Fix isolation provider tests Conversion methods had changed and that was part of the reason why the tests were failing. Furthermore, due to the `provider.proc`, which stores the associated qrexec / container process, "server" exceptions raise an InterruptedConversion error (now ConverterProcException), which then requires interpretation of the process exit code to obtain the "real" exception.	2024-02-06 19:42:41 +00:00
deeplow	550786adfe	Remove untrusted progress parsing (stderr instead) Now that only the second container can send JSON-encoded progress information, we can remove the untrusted JSON parsing. The parse_progress was also renamed to `parse_progress_trusted` to ensure future developers don't mistake this as a safe method. The old methods for sending untrusted JSON were repurposed to send the progress instead to stderr for troubleshooting in development mode. Fixes #456	2024-02-06 19:42:40 +00:00
deeplow	c991e530d0	Fix IsolationProvider.percentage variable reuse If one converted more than one document, since the state of IsolationProvider.percentage would be stored in the IsolationProvider instance, it would get reused for the second document. The fix is to keep it as a local variable, but we can explore having progress stored on the document itself, for example. Or having one IsolationProvider per conversion.	2024-02-06 19:42:40 +00:00
deeplow	0a099540c8	Stream pages in containers: merge isolation providers Merge the core code of the Qubes and Containers isolation providers into the parent IsolationProvider abstract class. This is done by streaming pages in containers, exclusively in the first conversion process. The commit is rather large due to the multiple interdependencies of the code, making it difficult to split into various commits. The main conversion method (_convert), now in the superclass, simply calls two methods: - doc_to_pixels() - pixels_to_pdf() Critically, doc_to_pixels is implemented in the superclass, diverging only in a specialized method called "start_doc_to_pixels_proc()". This method obtains the process responsible for communicating with the isolation provider (container / disp VM) via `podman/docker` and qrexec on Containers and Qubes respectively. Known regressions: - progress reports stopped working on containers Fixes #443	2024-02-06 19:42:33 +00:00
deeplow	331b6514e8	Containers: remove debug messages (via files) Remove container_log messages ahead of debug info being sent over standard streams.	2024-02-06 18:54:39 +00:00
deeplow	dca46d0a6b	Homogenize qubes and containers inner convert method Simple rename of the __convert() method in the Qubes conversion to make the code structurally similar.	2024-02-06 18:54:31 +00:00
deeplow	cd99122385	Adds file formats: epub svg bmp pnm bpm ppm Partially fix for #660. Missing some files due to limitations [1]: - PSD - only available from PyMuPDF>=1.23.0 (qubes-fedora is lower) - TXT - only available from PyMuPDF>=1.23.7 (qubes-fedora is lower) - JXR - PyMuPDF was refusing to due to missing codec [1] - JPX - Generated test file was rejected by PyMuPDF [2] - FB2 - Most often cannot be detected by mime type alone [3] - CBZ - (idem) - XPS - (idem) - MOBI - (idem) - PAM - General version of other file format already included, so I decided not to include this extension [0] New test files were generated locally: - epub - generated with calibre's convert-ebook from another sample file - svg - generated with inkscape from a mix of a default template (hexagons) and a logo's PNG file - bmp, pnm, bpm, ppm - generated with ImageMagick's 'convert' from tests/test_docs/sample-png.png [0]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1914681487 [1]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1916803201 [2]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1916870347 [3]: https://github.com/freedomofpress/dangerzone/issues/688	2024-01-31 19:58:48 +00:00
deeplow	4e720aa6e2	Replace 'None' conversion type with "PyMuPDF" Replaced for clarity over the fact that this conversion is in fact handled by PyMuPDF.	2024-01-31 19:58:36 +00:00
sudwhiwdh	b4ef47e101	GUI header capitalisation	2024-01-22 11:38:54 +00:00
deeplow	f1d90c6fa9	Compress per page when not using OCR Make the compression happen per page when OCR is not enabled [1]. [1]: https://github.com/freedomofpress/dangerzone/pull/622#discussion_r1410986342	2024-01-03 12:58:36 +00:00
deeplow	e2531279c0	FIXUP Revert "Disable image compression when saving PDF" This reverts commit f074db0beaa50389634203657f9b46307164a353.	2024-01-03 12:58:36 +00:00
deeplow	ee35e28aa6	Disable image compression when saving PDF Some tests [1] lead to the conclusion that ocr_compression does the same to the file (performance and size-wise) as deflating images when saving the file. However, having both methods active does add a bit of extra time. For this reason we're disabling the image deflation (default option). [1]: https://github.com/freedomofpress/dangerzone/pull/622#discussion_r1434042296	2024-01-03 12:58:36 +00:00
deeplow	6f61e44502	Solve import errors by lazy-loading fitz module Qubes does on-host pixels-to-pdf whereas the containers version doesn't. This leads to an issue where on the containers version it tries to load fitz, which isn't installed there, just because it's trying to check if it should run the Qubes version. The error it was showing was something like this: ImportError while loading conftest '/home/user/dangerzone/tests/conftest.py'. tests/__init__.py:8: in <module> from dangerzone.document import SAFE_EXTENSION dangerzone/__init__.py:16: in <module> from .gui import gui_main as main dangerzone/gui/__init__.py:28: in <module> from ..isolation_provider.qubes import Qubes, is_qubes_native_conversion dangerzone/isolation_provider/qubes.py:15: in <module> from ..conversion.pixels_to_pdf import PixelsToPDF dangerzone/conversion/pixels_to_pdf.py:16: in <module> import fitz E ModuleNotFoundError: No module named 'fitz' For context see discussion in [1]. [1]: https://github.com/freedomofpress/dangerzone/pull/622#issuecomment-1839164885	2024-01-03 12:58:36 +00:00
deeplow	80db7bb02e	Remove pre-pymupdf exceptions and detect pymupdf ones	2024-01-03 12:58:35 +00:00
deeplow	b75417bbec	Remove all server-side timeouts from doc to pixels Now we're using client-side timeouts so the server side-ones are not needed. Implemented following the suggestion from @apyrgio [1]. [1]: https://github.com/freedomofpress/dangerzone/pull/622#discussion_r1413906514	2024-01-03 12:58:35 +00:00
deeplow	576cbd3382	Fix DPI mismatch between doc2pixels and pixels2pdf The converted document was larger in dimensions than the original one due to a mismatch in DPI settings. When converting documents to pixels we were setting the DPI to 150 pixels per inch. Then when converting back into a PDF we were using 70 DPI. This difference would result in an overall larger document in dimensions (though not necessarily in file size). Fixes #626	2024-01-03 12:58:34 +00:00
deeplow	e5dbe25abb	Replace 'convert' with PyMuPDF for images PyMuPDF can also convert images of the types we already support so we don't need ImageMagick's 'convert'.	2024-01-03 12:58:34 +00:00
deeplow	77d5ea5940	Add PyMuPDF in pixels_to_pdf replacing old logic Adding PyMuPDF essentially makes the code much simpler since it can do everything that we'd need multiple programs for. It also includes tesseract-OCR integration, which this commit makes use of.	2024-01-03 12:56:33 +00:00
deeplow	ba17016643	Doc_to_pixels: remove unneeded timeout Timeout can no longer be used since we're not calling a subprocess. We could still implement it, but it's more worthwhile to rely on yet-to-be-implemented client-side timeouts (in containers).	2024-01-03 12:40:45 +00:00
deeplow	317deadbe4	Replace pdfinfo logic (get # pages) with PyMuPDF	2024-01-03 12:40:45 +00:00
deeplow	327ab8791f	Replace pdftoppm logic with PyMuPDF (native python) Use PyMuPDF (AGPL-licensed) within the container conversion to replace the pdf conversion to RGB. This massively simplifies the code since PyMuPDF is a native python library.	2024-01-03 12:40:45 +00:00
Alex Pyrgiotis	5bf7549b55	Fix typo	2023-12-29 18:30:48 +02:00
Moon Sungjoon	63aea4cb45	Enable HWP conversion on MacOS (Apple silicon CPU) This PR reverts the patch that disables HWP / HWPX conversion on MacOS M1. It does not fix conversion on Qubes OS (#494). Previously, HWP / HWPX conversion didn't work on MacOS (Apple silicon CPU) (#498) because libreoffice wasn't built with Java support on Alpine Linux for ARM (aarch64). Thankfully, the Alpine team has enabled Java support on the aarch64 system [1], so we can enable it again for ARM architectures. This patch is included in Alpine 3.19. This commit was included in #541 and reverted in #562 due to a stability issue. Fixes #498 [1]: `74d443f479`	2023-12-13 12:57:22 +02:00
Garrett Robinson	53115b3ffa	Use more descriptive button labels in update check prompt	2023-10-31 12:52:34 +00:00
Alex Pyrgiotis	ba5adb33c0	Fix a bug in "Change Selection" Fix a bug in the "Change Selection" action, whereby changing your selection and picking files from another directory results in: "Dangerzone does not support adding documents from multiple locations. The newly added documents were ignored." To fix this, change the output directory when we change selection as well.	2023-10-13 22:45:11 +03:00
Alex Pyrgiotis	edfba0c783	Qubes: Fix progress in first stage of Qubes conversion	2023-10-13 22:44:37 +03:00
deeplow	186ddd6b1e	Allow user to override update checking on Linux The original intention of leaving the update checkbox in the hamburger menu was to let non-supported Linux distros (e.g. compiled from source) check for updates. However, on Linux it ended up being disabled forcefully by default on startup. This takes into account an overridden update checkbox. Fixes #596	2023-10-13 17:01:53 +01:00
Alex Pyrgiotis	3daf0e2cb7	Do not show file previews in case of exceptions If a Qubes conversion encounters an exception that is not a subclass of ConversionException, it will still show a preview of a file that does not exist. Send an error progress report in that case, so that the GUI code can detect that an error occurred and not open a file preview. Fixes #581	2023-10-05 11:11:42 +03:00
Alex Pyrgiotis	bdf3f8babc	qubes: Clean up temporary files Create a temporary dir before the conversion begins, and store every file necessary for the conversion there. We are mostly concerned about the second stage of the conversion, which runs in the host. The first stage runs in a disposable qube and cleanup is implicit. Fixes #575 Fixes #436	2023-10-04 14:05:23 +03:00
Alex Pyrgiotis	f37d89f042	conversion: Allow using a temp dir other than /tmp Extend the PixelsToPDF converter by adding an additional `tempdir` argument. This argument can be used to make the conversion use a different temporary directory other than `/tmp`. For containers, this extra arguments makes no difference, as it won't be used. For Qubes, this argument will allow storing files in a temporary dir that will be cleaned up once the conversion completes. Previously, these files would linger in the user's `/tmp`. Refs #575	2023-10-04 14:00:53 +03:00
Alex Pyrgiotis	4f66353639	Add dark mode logic in our dialogs Make our dialogs set the OSColorMode CSS property, so that we can properly style them. Refs #528	2023-10-02 16:34:56 +03:00
Alex Pyrgiotis	6232062146	Add missing newline char	2023-10-02 15:41:29 +03:00
Alex Pyrgiotis	b7b76174ab	qubes: Log captured output for the second stage Log the captured command output during the second stage, only in dev environments. This follows what we have already done for the first stage.	2023-10-02 15:41:29 +03:00
Alex Pyrgiotis	16603875d6	qubes: Display all errors in second stage If a command encounters an error or times out during the second stage of the conversion in Qubes, handle it the same way as we would have handled it in the first stage: 1. Get its error message. 2. Throw an UnexpectedConversionError exception, with the original message. Note that, because the second stage takes place locally, users will see the original content of the error. Refs #567 Closes #430	2023-10-02 15:41:17 +03:00
Alex Pyrgiotis	2016965c84	Revert "Enable HWP conversion on MacOS M1" This reverts commit `214ce9720d`. The rationale is that we want to wait until the LibreOffice package that allows HWP conversion in Alpine Linux lands in `alpine:latest`. For more info, read https://github.com/freedomofpress/dangerzone/issues/498#issuecomment-1739894100	2023-10-02 14:22:47 +03:00
deeplow	7daeccdfea	Prevent PDF from overwriting num_pages in Qubes This should only affect the alpha version of Qubes OS (in containers it only allows the attacker to control the timeout). In short, an attacker could have PDF metadata that would show before "Pages:" in the `pdfinfo` command output and this would essentially override the number of pages measured in the server. This could enable the attacker to shorten the number of pages of a document for example. Fixes #565	2023-10-02 12:18:12 +01:00
deeplow	dabdf6c286	FIXUP: rename to QubesQrexecFailed instead	2023-10-02 12:06:18 +01:00
deeplow	eb488b16c5	FIXUP: rename QubesNotEnoughRAMError to QubesConversionStartFailed	2023-10-02 11:51:55 +01:00
deeplow	9cfac7ac2a	Generalize "out of RAM" error to reflect other issues When qrexec-client-vm fails, it could be a symptom of various issues: - the system being out of RAM - dz-dvm not existing The exit code is the same in all cases (126), which makes it particularly tricky to solve in the client application. For this reason the approach is now to tell the user to see the qubes error notification on the top right of their screen.	2023-10-02 11:06:17 +01:00
Alex Pyrgiotis	ccf4132ea0	conversion: Add sanity check for page count Add a sanity check at the end of the conversion from doc to pixels, to ensure that the resulting document will have the same number of pages as the original one. Refs #560	2023-09-28 22:50:54 +03:00
Alex Pyrgiotis	b4e5cf5be7	qubes: Stream page data in real time Stream page data back to the caller, immediately after we read them from pdftoppm. This way, we have more accurate progress reports and timeouts. Fixes #557	2023-09-28 22:50:54 +03:00
Alex Pyrgiotis	4bb959f220	conversion: Add anchor points for streaming page data/metadata Introduce 4 new methods that can be overloaded by the Qubes isolation provider to stream page data/metadata back to the caller. For the time being, these methods do what they did before, i.e., write this info in files within the pixels directory.	2023-09-28 22:50:53 +03:00
Alex Pyrgiotis	6012cd1491	Improve EOF detection when reading command output Do not read a line from the command output and then check if we are at EOF, because it's possible that the writer immediately exited after writing the last line of output. Instead, switch the order of actions. This is a very serious bug that can lead to Dangerzone excluding the last page of the document. It should have bitten us right from the start (see `aeeed411a0`), but it seems that the small period of time it takes the kernel to close the file descriptors was hiding this bug. Fixes #560	2023-09-28 22:50:53 +03:00
Garrett Robinson	46f978e6f0	Detect OS color mode and set as property for stylesheets Sets the detected OS color mode (dark/light) as a property on the QApplication so it can be referenced in stylesheets to select style rules suited to the OS color mode.	2023-09-28 17:20:34 +03:00
deeplow	0a6b33ebed	Qubes: detect qube failing to start (missing RAM) In Qubes OS it's often the case that the user doesn't have enough RAM to start the conversion. In this case it raises BrokenPipeException and exits with code 126. It didn't seem possible to distinguish this kind of failure from one where the user has misconfigured qrexec policies. NOTE: this approach is not ideal UX-wise. After the first doc fails, the next one will also try and fail. Upon first failure we should inform the user that they need to close some programs or qubes.	2023-09-28 11:08:50 +01:00
deeplow	63f03d5bcd	Add limit and test to max width and height of docs	2023-09-28 11:08:47 +01:00
deeplow	54b8ffbf96	Add page limit of 10000 Theoretically the max pages would be 65536 (2-byte unsigned int). However, this limit is much higher than practical documents have, and larger ones can lead to unforeseen problems, for example RAM limitations. We thus opted to use a lower limit of 10K. The limit must be detected client-side, given that the server is distrusted. However, we also check it in the server, just as a fail-early mechanism.	2023-09-28 11:01:14 +01:00
Alex Pyrgiotis	18b73d94b0	qubes: Find out reason of interrupted conversions If a conversion has been interrupted (usually due to an EOF), figure out why this happened by checking the exit code of the spawned process.	2023-09-26 17:35:26 +03:00
Alex Pyrgiotis	30196ff35b	errors: Add error for interrupted conversions Add an error for interrupted conversions, in order to better differentiate this scenario from other ValueErrors that may be raised throughout the code's lifetime.	2023-09-26 17:35:26 +03:00
Alex Pyrgiotis	0273522fb1	qubes: Store the process for the spawned qube Store, in an instance attribute, the process that we have started for the spawned disposable qube. In subsequent commits, we will use it from other places as well, aside from the `_convert` method. Note that this commit does not alter the conversion logic, and only does the following: 1. Renames `p.` to `self.proc.` 2. Adds an `__init__` method to the Qubes isolation provider, and initializes the `self.proc` attribute to `None`. 3. Adds an assert that `self.proc` is not `None` after it's spawned, to placate Mypy.	2023-09-26 17:35:25 +03:00
deeplow	e08b6defc3	Round conversion progress from float to int Fixes #553	2023-09-26 15:20:41 +01:00
deeplow	8d37ff15e0	Remove duplicated Qubes message: "Safe PDF Created" Fixes #555. This is a leftover from when we didn't have progress reports from the second stage conversion (AKA. pixels to PDF) in #429.	2023-09-26 12:16:48 +01:00
Alex Pyrgiotis	e64d1da61f	qubes: Pass OCR parameters properly Pass OCR parameters to conversion functions as arguments, instead of setting environment variables. Fixes #455	2023-09-20 18:04:40 +03:00
Alex Pyrgiotis	8a0c0a4673	Make parameter actually optional	2023-09-20 17:58:39 +03:00
Alex Pyrgiotis	20157bef58	Fix typo	2023-09-20 17:45:44 +03:00
Alex Pyrgiotis	99dd5f5139	qubes: Add client-side timeouts Extend the client-side capabilities of the Qubes isolation provider, by adding client-side timeout logic. This implementation brings the same logic that we used server-side to the client, by taking into account the original file size and the number of pages that the server returns. Since the code does not have the exact same insight as the server has, the calculated timeouts are in two places: 1. The timeout for getting the number of pages. This timeout takes into account: * the disposable qube startup time, and * the time it takes to convert a file type to PDF 2. The total timeout for converting the PDF into pixels, in the same way that we do it on the server-side. Besides these changes, we also ensure that partial reads (e.g., due to EOF) are detected (see exact=... argument) Some things that are not resolved in this commit are: * We have both client-side and server-side timeouts for the first phase of the conversion. Once containers can stream data back to the application (see #443), these server-side timeouts can be removed. * We do not show a proper error message when a timeout occurs. This will be part of the error handling PR (see #430) Fixes #446 Refs #443 Refs #430	2023-09-20 17:32:42 +03:00
Alex Pyrgiotis	55a4491ced	Consolidate import statements	2023-09-20 17:14:24 +03:00
Alex Pyrgiotis	c547ffc3b4	conversion: Factor out calculate_timeout Factor out the logic behind the calculate_timeout() method, used in Dangerzone conversions, so that isolation providers can call it directly.	2023-09-20 17:14:24 +03:00
Alex Pyrgiotis	fea193e935	Add non-blocking read utility Add a function that can read data from non-blocking fds, which we will use later on to read from standard streams with a timeout.	2023-09-20 17:14:24 +03:00
Alex Pyrgiotis	344d6f7bfa	Add Stopwatch implementation Add a simple stopwatch implementation to track the elapsed time since an event, or the remaining time until a timeout.	2023-09-20 17:14:23 +03:00
deeplow	94f569cdf5	Add error code for unexpected errors in conversion	2023-09-19 15:52:47 +01:00
deeplow	8e4f04a52e	Shift conversion exit codes by 128 Distinguish from podman or other errors in called binaries by shifting the error codes by 128.	2023-09-19 15:34:00 +01:00
deeplow	b4c3e07d36	Remove attacker-controlled error messages Creates exceptions in the server code to be shared with the client via an identifying exit code. These exceptions are then reconstructed in the client. Refs #456 but does not completely fix it. Unexpected exceptions and progress descriptions are still passed in Containers.	2023-09-19 15:33:20 +01:00
Moon Sungjoon	214ce9720d	Enable HWP conversion on MacOS M1 This PR reverts the patch that disables HWP / HWPX conversion on MacOS M1. It does not fix conversion on Qubes OS (#494). Previously, HWP / HWPX conversion didn't work on MacOS M1 systems (#498) because libreoffice wasn't built with Java support on Alpine Linux for ARM (aarch64). Thankfully, the Alpine team has enabled Java support on the aarch64 system [1], so we can enable it again for ARM architectures. Fixes #498 [1]: `74d443f479`	2023-09-06 13:10:18 +03:00
deeplow	8ae88eb10a	Ensure updates checkbox updated after updates accepted Ensure the status of the toggle updates checkbox is updated, after the user is prompted to enable updates.	2023-08-23 16:46:45 +01:00
deeplow	8221a56c7d	Revert "Propagate "update check" prompt to UI checkbox" This reverts commit 3915a86642502b673aa0e47931823acbe66f1043.	2023-08-23 16:46:44 +01:00
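
Commit `344d6f7bfa` above describes a simple stopwatch that tracks the elapsed time since an event, or the remaining time until a timeout. The following is a minimal sketch of that idea, not the project's actual code: the class layout, the `clock` parameter, and the `elapsed` / `remaining` property names are assumptions made for illustration.

```python
import time
from typing import Callable, Optional


class Stopwatch:
    """Hypothetical stopwatch: elapsed time since start(), and time left
    until an optional timeout. Not taken from the Dangerzone codebase."""

    def __init__(
        self,
        timeout: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout = timeout
        # A monotonic clock is unaffected by wall-clock adjustments.
        self._clock = clock
        self._started_at: Optional[float] = None

    def start(self) -> "Stopwatch":
        """Record the starting point and return self for chaining."""
        self._started_at = self._clock()
        return self

    @property
    def elapsed(self) -> float:
        """Seconds passed since start() was called."""
        if self._started_at is None:
            raise RuntimeError("stopwatch has not been started")
        return self._clock() - self._started_at

    @property
    def remaining(self) -> float:
        """Seconds left until the timeout, clamped at zero."""
        if self.timeout is None:
            raise RuntimeError("no timeout was set")
        return max(self.timeout - self.elapsed, 0.0)
```

A caller could start one stopwatch per conversion phase and pass `remaining` as the timeout of each blocking read, so that the total time budget is shared across reads instead of being reset on every call.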

552 commits