dangerzone

mirror of https://github.com/freedomofpress/dangerzone.git synced 2025-05-02 19:51:49 +02:00

Author	SHA1	Message	Date
deeplow	f3032a7142	Make big endian explicit in int to bytes Fix issues in older distros that don't yet support Python 3.11, where the byteorder argument of int.to_bytes() had no default value [1]. This is in response to CI failures [2]. [1]: https://docs.python.org/3/library/stdtypes.html#int.to_bytes [2]: https://app.circleci.com/pipelines/github/freedomofpress/dangerzone/2186/workflows/e340ca21-85ce-42b6-9bc3-09e66f96684a/jobs/27380y	2024-02-06 19:42:41 +00:00
deeplow	1835756b45	Allow each conversion to have its own proc If we increased the number of parallel conversions, the streams of different conversions would get mixed together. This was because Converter.proc was a single shared attribute. This commit turns it into a local variable so that this mix-up doesn't happen.	2024-02-06 19:42:41 +00:00
deeplow	61e7a3c107	Fix isolation provider tests Conversion methods had changed, which was part of the reason why the tests were failing. Furthermore, due to `provider.proc`, which stores the associated qrexec / container process, "server" exceptions raise an InterruptedConversion error (now ConverterProcException), which then requires interpreting the process exit code to obtain the "real" exception.	2024-02-06 19:42:41 +00:00
deeplow	550786adfe	Remove untrusted progress parsing (stderr instead) Now that only the second container can send JSON-encoded progress information, we can remove the untrusted JSON parsing. parse_progress was also renamed to `parse_progress_trusted` to ensure future developers don't mistake it for a safe method. The old methods for sending untrusted JSON were repurposed to send the progress to stderr instead, for troubleshooting in development mode. Fixes #456	2024-02-06 19:42:40 +00:00
deeplow	0a099540c8	Stream pages in containers: merge isolation providers Merge the core code of the Qubes and Containers isolation providers into the parent IsolationProviders abstract class. This is done by streaming pages in containers, exclusively in the first conversion process. The commit is rather large due to the multiple interdependencies of the code, which made it difficult to split into several commits. The main conversion method (_convert), now in the superclass, simply calls two methods: - doc_to_pixels() - pixels_to_pdf() Critically, doc_to_pixels is implemented in the superclass, diverging only in a specialized method called "start_doc_to_pixels_proc()". This method obtains the process that communicates with the isolation provider (container / disposable VM) via `podman/docker` on Containers and via qrexec on Qubes. Known regressions: - progress reports stopped working on containers Fixes #443	2024-02-06 19:42:33 +00:00
deeplow	331b6514e8	Containers: remove debug messages (via files) Remove container_log messages ahead of debug info being sent over standard streams.	2024-02-06 18:54:39 +00:00
deeplow	cd99122385	Adds file formats: epub svg bmp pnm bpm ppm Partially fixes #660. Some formats are still missing due to limitations [1]: - PSD - only available from PyMuPDF>=1.23.0 (qubes-fedora is lower) - TXT - only available from PyMuPDF>=1.23.7 (qubes-fedora is lower) - JXR - PyMuPDF refused to open it due to a missing codec [1] - JPX - Generated test file was rejected by PyMuPDF [2] - FB2 - Most often cannot be detected by mime type alone [3] - CBZ - (idem) - XPS - (idem) - MOBI - (idem) - PAM - A more general version of file formats already included, so I decided not to include this extension [0] New test files were generated locally: - epub - generated with calibre's ebook-convert from another sample file - svg - generated with inkscape from a mix of a default template (hexagons) and a logo's PNG file - bmp, pnm, bpm, ppm - generated with ImageMagick's 'convert' from tests/test_docs/sample-png.png [0]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1914681487 [1]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1916803201 [2]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1916870347 [3]: https://github.com/freedomofpress/dangerzone/issues/688	2024-01-31 19:58:48 +00:00
deeplow	4e720aa6e2	Replace 'None' conversion type with "PyMuPDF" Replaced to make it clear that this conversion is in fact handled by PyMuPDF.	2024-01-31 19:58:36 +00:00
deeplow	f1d90c6fa9	Compress per page when not using OCR Make the compression happen per page when OCR is not enabled [1]. [1]: https://github.com/freedomofpress/dangerzone/pull/622#discussion_r1410986342	2024-01-03 12:58:36 +00:00
deeplow	e2531279c0	FIXUP Revert "Disable image compression when saving PDF" This reverts commit f074db0beaa50389634203657f9b46307164a353.	2024-01-03 12:58:36 +00:00
deeplow	ee35e28aa6	Disable image compression when saving PDF Some tests [1] led to the conclusion that ocr_compression has the same effect on the file (performance- and size-wise) as deflating images when saving it. However, having both methods active does add a bit of extra time. For this reason we're disabling image deflation (the default option). [1]: https://github.com/freedomofpress/dangerzone/pull/622#discussion_r1434042296	2024-01-03 12:58:36 +00:00
deeplow	6f61e44502	Solve import errors by lazy-loading fitz module Qubes does on-host pixels-to-pdf whereas the containers version doesn't. This leads to an issue where on the containers version it tries to load fitz, which isn't installed there, just because it's trying to check if it should run the Qubes version. The error it was showing was something like this: ImportError while loading conftest '/home/user/dangerzone/tests/conftest.py'. tests/__init__.py:8: in <module> from dangerzone.document import SAFE_EXTENSION dangerzone/__init__.py:16: in <module> from .gui import gui_main as main dangerzone/gui/__init__.py:28: in <module> from ..isolation_provider.qubes import Qubes, is_qubes_native_conversion dangerzone/isolation_provider/qubes.py:15: in <module> from ..conversion.pixels_to_pdf import PixelsToPDF dangerzone/conversion/pixels_to_pdf.py:16: in <module> import fitz E ModuleNotFoundError: No module named 'fitz' For context see discussion in [1]. [1]: https://github.com/freedomofpress/dangerzone/pull/622#issuecomment-1839164885	2024-01-03 12:58:36 +00:00
deeplow	80db7bb02e	Remove pre-pymupdf exceptions and detect pymupdf ones	2024-01-03 12:58:35 +00:00
deeplow	b75417bbec	Remove all server-side timeouts from doc to pixels Now we're using client-side timeouts, so the server-side ones are not needed. Implemented following the suggestion from @apyrgio [1]. [1]: https://github.com/freedomofpress/dangerzone/pull/622#discussion_r1413906514	2024-01-03 12:58:35 +00:00
deeplow	576cbd3382	Fix DPI mismatch between doc2pixels and pixels2pdf The converted document was larger in dimensions than the original one due to a mismatch in DPI settings. When converting documents to pixels we were setting the DPI to 150 pixels per inch. Then, when converting back into a PDF, we were using 70 DPI. This difference would result in an overall larger document in dimensions (though not necessarily in file size). Fixes #626	2024-01-03 12:58:34 +00:00
deeplow	e5dbe25abb	Replace 'convert' with PyMuPDF for images PyMuPDF can also convert images of the types we already support so we don't need ImageMagick's 'convert'.	2024-01-03 12:58:34 +00:00
deeplow	77d5ea5940	Add PyMuPDF in pixels_to_pdf replacing old logic Adding PyMuPDF essentially makes the code much simpler, since it can do everything that we'd otherwise need multiple programs for. It also includes tesseract-OCR integration, which this commit makes use of.	2024-01-03 12:56:33 +00:00
deeplow	ba17016643	Doc_to_pixels: remove unneeded timeout The timeout can no longer be used since we're not calling a subprocess. We could still implement it, but it's more worthwhile to rely on the yet-to-be-implemented client-side timeouts (in containers).	2024-01-03 12:40:45 +00:00
deeplow	317deadbe4	Replace pdfinfo logic (get # pages) with PyMuPDF	2024-01-03 12:40:45 +00:00
deeplow	327ab8791f	Replace pdftoppm logic with PyMuPDF (native python) Use PyMuPDF (AGPL-licensed) within the container conversion to replace the pdf conversion to RGB. This massively simplifies the code since PyMuPDF is a native python library.	2024-01-03 12:40:45 +00:00
Moon Sungjoon	63aea4cb45	Enable HWP conversion on MacOS (Apple silicon CPU) This PR reverts the patch that disables HWP / HWPX conversion on MacOS M1. It does not fix conversion on Qubes OS (#494). Previously, HWP / HWPX conversion didn't work on MacOS (Apple silicon CPU) (#498) because LibreOffice wasn't built with Java support on Alpine Linux for ARM (aarch64). Fortunately, the Alpine team has enabled Java support on aarch64 [1], so we can enable it again for ARM architectures. This patch is included in Alpine 3.19. This commit was included in #541 and reverted in #562 due to a stability issue. Fixes #498 [1]: `74d443f479`	2023-12-13 12:57:22 +02:00
Alex Pyrgiotis	f37d89f042	conversion: Allow using a temp dir other than /tmp Extend the PixelsToPDF converter by adding an additional `tempdir` argument. This argument can be used to make the conversion use a temporary directory other than `/tmp`. For containers, this extra argument makes no difference, as it won't be used. For Qubes, this argument will allow storing files in a temporary dir that will be cleaned up once the conversion completes. Previously, these files would linger in the user's `/tmp`. Refs #575	2023-10-04 14:00:53 +03:00
Alex Pyrgiotis	2016965c84	Revert "Enable HWP conversion on MacOS M1" This reverts commit `214ce9720d`. The rationale is that we want to wait until the LibreOffice package that allows HWP conversion in Alpine Linux lands in `alpine:latest`. For more info, read https://github.com/freedomofpress/dangerzone/issues/498#issuecomment-1739894100	2023-10-02 14:22:47 +03:00
deeplow	7daeccdfea	Prevent PDF from overwriting num_pages in Qubes This should only affect the alpha version of Qubes OS (in containers it only allows the attacker to control the timeout). In short, an attacker could include PDF metadata that would appear before "Pages:" in the `pdfinfo` command output, which would essentially override the number of pages measured in the server. This could enable the attacker to shorten the number of pages of a document, for example. Fixes #565	2023-10-02 12:18:12 +01:00
deeplow	dabdf6c286	FIXUP: rename to QubesQrexecFailed instead	2023-10-02 12:06:18 +01:00
deeplow	eb488b16c5	FIXUP: rename QubesNotEnoughRAMError to QubesConversionStartFailed	2023-10-02 11:51:55 +01:00
deeplow	9cfac7ac2a	Generalize "out of RAM" error to reflect other issues When qrexec-client-vm fails, it could be a symptom of various issues: - the system being out of RAM - dz-dvm not existing The exit code is the same in all cases (126), which makes it particularly tricky to handle in the client application. For this reason, the approach is now to tell the user to check the Qubes error notification on the top right of their screen.	2023-10-02 11:06:17 +01:00
Alex Pyrgiotis	ccf4132ea0	conversion: Add sanity check for page count Add a sanity check at the end of the conversion from doc to pixels, to ensure that the resulting document will have the same number of pages as the original one. Refs #560	2023-09-28 22:50:54 +03:00
Alex Pyrgiotis	b4e5cf5be7	qubes: Stream page data in real time Stream page data back to the caller, immediately after we read them from pdftoppm. This way, we have more accurate progress reports and timeouts. Fixes #557	2023-09-28 22:50:54 +03:00
Alex Pyrgiotis	4bb959f220	conversion: Add anchor points for streaming page data/metadata Introduce 4 new methods that can be overloaded by the Qubes isolation provider to stream page data/metadata back to the caller. For the time being, these methods do what they did before, i.e., write this info in files within the pixels directory.	2023-09-28 22:50:53 +03:00
Alex Pyrgiotis	6012cd1491	Improve EOF detection when reading command output Do not read a line from the command output and then check if we are at EOF, because it's possible that the writer immediately exited after writing the last line of output. Instead, switch the order of actions. This is a very serious bug that can lead to Dangerzone excluding the last page of the document. It should have bit us right from the start (see `aeeed411a0`), but it seems that the small period of time it takes the kernel to close the file descriptors was hiding this bug. Fixes #560	2023-09-28 22:50:53 +03:00
deeplow	0a6b33ebed	Qubes: detect qube failing to start (missing RAM) In Qubes OS it's often the case that the user doesn't have enough RAM to start the conversion. In this case it raises BrokenPipeError and exits with code 126. It didn't seem possible to distinguish this kind of failure from one where the user has misconfigured qrexec policies. NOTE: this approach is not ideal UX-wise. After the first document fails, the next one will also try and fail. Upon the first failure we should inform the user that they need to close some programs or qubes.	2023-09-28 11:08:50 +01:00
deeplow	63f03d5bcd	Add limit and test to max width and height of docs	2023-09-28 11:08:47 +01:00
deeplow	54b8ffbf96	Add page limit of 10000 Theoretically, the max number of pages would be 65535 (the largest 2-byte unsigned int). However, this limit is much higher than what practical documents have, and larger documents can lead to unforeseen problems, for example RAM limitations. We thus opted to use a lower limit of 10K. The limit must be enforced client-side, given that the server is distrusted. However, we also check it in the server, just as a fail-early mechanism.	2023-09-28 11:01:14 +01:00
Alex Pyrgiotis	30196ff35b	errors: Add error for interrupted conversions Add an error for interrupted conversions, in order to better differentiate this scenario from other ValueErrors that may be raised throughout the code's lifetime.	2023-09-26 17:35:26 +03:00
Alex Pyrgiotis	e64d1da61f	qubes: Pass OCR parameters properly Pass OCR parameters to conversion functions as arguments, instead of setting environment variables. Fixes #455	2023-09-20 18:04:40 +03:00
Alex Pyrgiotis	20157bef58	Fix typo	2023-09-20 17:45:44 +03:00
Alex Pyrgiotis	c547ffc3b4	conversion: Factor out calculate_timeout Factor out the logic behind the calculate_timeout() method, used in Dangerzone conversions, so that isolation providers can call it directly.	2023-09-20 17:14:24 +03:00
deeplow	94f569cdf5	Add error code for unexpected errors in conversion	2023-09-19 15:52:47 +01:00
deeplow	8e4f04a52e	Shift conversion exit codes by 128 Distinguish them from podman errors or errors in other called binaries by shifting the error codes by 128.	2023-09-19 15:34:00 +01:00
deeplow	b4c3e07d36	Remove attacker-controlled error messages Creates exceptions in the server code to be shared with the client via an identifying exit code. These exceptions are then reconstructed in the client. Refs #456 but does not completely fix it. Unexpected exceptions and progress descriptions are still passed in Containers.	2023-09-19 15:33:20 +01:00
Moon Sungjoon	214ce9720d	Enable HWP conversion on MacOS M1 This PR reverts the patch that disables HWP / HWPX conversion on MacOS M1. It does not fix conversion on Qubes OS (#494). Previously, HWP / HWPX conversion didn't work on MacOS M1 systems (#498) because LibreOffice wasn't built with Java support on Alpine Linux for ARM (aarch64). Fortunately, the Alpine team has enabled Java support on aarch64 [1], so we can enable it again for ARM architectures. Fixes #498 [1]: `74d443f479`	2023-09-06 13:10:18 +03:00
deeplow	fa215063ee	Add logging for second container	2023-08-22 16:11:38 +01:00
deeplow	75369cf621	Adapt code so it works for reporting script The reporting script now parses JUnitXML instead of a series of ".container_log" files. The script is in the changed submodule. Additionally, it makes failed tests actually fail so that this is recorded in the JUnitXML report.	2023-08-22 16:11:36 +01:00
deeplow	eb16285790	Replace container output command prefix ">>>" In the JUnitXML this prefix would look ugly ("&gt;&gt;&gt;") because ">" has to be escaped in XML.	2023-08-22 16:11:35 +01:00
deeplow	48b2e7bc3c	Log command to debug log for traceback purposes Log commands so we can trace back which errors / outputs are from each command.	2023-08-22 16:11:34 +01:00
deeplow	95cef8cf0a	Containers: capture conversion logs Store the conversion log to a file (captured-output.txt) in the container and when in development mode, have its output displayed on the terminal output.	2023-08-22 16:11:26 +01:00
deeplow	874b8865e2	Qubes: strategy for capturing conversion logs Use qrexec stdout to send conversion data (pixels) and stderr to send conversion progress at the end of the conversion. This happens regardless of whether the conversion is in developer mode. It's the client that decides whether to read the debug data from stderr; it only does so if developer mode is enabled.	2023-08-22 16:11:20 +01:00
Alex Pyrgiotis	e3a8a651f1	Disable HWP / HWPX conversion on MacOS M1 / Qubes The HWP / HWPX conversion feature does not work on the following platforms: * MacOS with Apple Silicon CPU * Native Qubes OS For this reason, we need to: 1. Disable it on the GUI side, by not allowing the user to select these files. 2. Throw an error on the isolation provider side, in case the user directly attempts to convert the file (either through CLI or via "Open With"). Refs #494 Refs #498	2023-08-05 16:50:49 +01:00
Alex Pyrgiotis	bc83341d2a	conversion: Detect when LibreOffice silently fails Sometimes LibreOffice returns with status code 0 but in reality it has failed: it doesn't create a file, and Dangerzone does not detect this. What happens next is that the following command fails and throws an unrelated error. Detect that LibreOffice failed by checking whether the output file exists after the PDF conversion.	2023-08-05 16:50:47 +01:00
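
Commits f3032a7142 and 54b8ffbf96 both concern integers that cross the isolation boundary. The following is a minimal sketch, assuming the page count is sent as a 2-byte big-endian unsigned integer (as the 2-byte limit mentioned in 54b8ffbf96 suggests). The helper names `int_to_bytes`, `bytes_to_int` and `read_num_pages` are hypothetical, not Dangerzone's actual functions, and rejecting a zero page count is an added assumption.

```python
# Hypothetical helpers, not Dangerzone's real API.
MAX_PAGES = 10000  # limit chosen in commit 54b8ffbf96


def int_to_bytes(num: int, length: int = 2) -> bytes:
    """Encode an unsigned int in big-endian order.

    byteorder is passed explicitly: it only became optional
    (defaulting to "big") in Python 3.11.
    """
    return num.to_bytes(length, "big", signed=False)


def bytes_to_int(data: bytes) -> int:
    """Decode a big-endian unsigned int."""
    return int.from_bytes(data, "big", signed=False)


def read_num_pages(data: bytes) -> int:
    """Decode a 2-byte page count and validate it on the (trusted) client."""
    if len(data) != 2:
        raise ValueError("expected exactly 2 bytes for the page count")
    num_pages = bytes_to_int(data)
    # Assumption: an empty document is rejected as well.
    if num_pages == 0 or num_pages > MAX_PAGES:
        raise ValueError(f"invalid page count: {num_pages}")
    return num_pages
```

Because the check lives in the client, a malicious server can at most trigger a `ValueError`. A 2-byte field cannot represent more than 65535 pages either way, since `int_to_bytes(65536)` raises `OverflowError`.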
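
Commits b4c3e07d36, 8e4f04a52e and 94f569cdf5 describe a scheme where the server only reports an exit code, shifted by 128, and the client rebuilds the exception from its own table. The sketch below shows that idea under stated assumptions: the exception class names, their numeric codes and the helper functions are all hypothetical.

```python
from typing import Dict, Type

EXIT_CODE_OFFSET = 128  # shift introduced in commit 8e4f04a52e


class ConversionException(Exception):
    # Hypothetical code for "unexpected error" (cf. commit 94f569cdf5).
    error_code = 1
    error_message = "Unexpected error during conversion"


class DocFormatUnsupported(ConversionException):  # hypothetical
    error_code = 2
    error_message = "The document format is not supported"


class MaxPagesException(ConversionException):  # hypothetical
    error_code = 3
    error_message = "The document has too many pages"


# Client-side lookup table: error codes map to trusted exception classes.
_EXCEPTIONS: Dict[int, Type[ConversionException]] = {
    cls.error_code: cls
    for cls in (ConversionException, DocFormatUnsupported, MaxPagesException)
}


def exit_code_for(exc: ConversionException) -> int:
    """Server side: turn an exception into a process exit code."""
    return EXIT_CODE_OFFSET + exc.error_code


def exception_from_exit_code(exit_code: int) -> Exception:
    """Client side: rebuild the exception from the trusted table.

    Only the number crosses the isolation boundary, so the message
    shown to the user never comes from the untrusted server.
    """
    if exit_code < EXIT_CODE_OFFSET:
        # Not a converter error, e.g. 126 from qrexec-client-vm.
        return RuntimeError(f"conversion process failed with exit code {exit_code}")
    # Unknown shifted codes fall back to the generic conversion error.
    cls = _EXCEPTIONS.get(exit_code - EXIT_CODE_OFFSET, ConversionException)
    return cls(cls.error_message)
```

For example, `exit_code_for(MaxPagesException())` is `131`, and `exception_from_exit_code(131)` gives back a `MaxPagesException` carrying the client's own message.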
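
Commit 6012cd1491 swaps the order of "read a line" and "check for EOF". The following minimal sketch uses `asyncio.StreamReader` and assumes the original loop looked roughly like `read_lines_buggy`. Both function names are hypothetical, not Dangerzone's code. Feeding all output plus EOF at once mimics a writer that exits right after its last line.

```python
import asyncio
from typing import List


async def read_lines_buggy(reader: asyncio.StreamReader) -> List[bytes]:
    """Read a line first, then check for EOF (the order the commit removes)."""
    lines = []
    while True:
        line = await reader.readline()
        # If the writer already exited, EOF is set as soon as the last
        # line is consumed, so that last line is silently dropped.
        if reader.at_eof():
            break
        lines.append(line)
    return lines


async def read_lines_fixed(reader: asyncio.StreamReader) -> List[bytes]:
    """Check for EOF first, then read, so no buffered line is lost."""
    lines = []
    while not reader.at_eof():
        line = await reader.readline()
        if line:  # readline() returns b"" once EOF is reached
            lines.append(line)
    return lines


async def demo() -> None:
    output = b"page 1\npage 2\n"
    for read in (read_lines_buggy, read_lines_fixed):
        reader = asyncio.StreamReader()
        reader.feed_data(output)
        reader.feed_eof()  # simulate a writer that exited immediately
        print(read.__name__, await read(reader))


if __name__ == "__main__":
    asyncio.run(demo())
```

Running `demo()` shows the buggy reader returning only the first line, while the fixed reader returns both. Simply stopping when `readline()` returns `b""` would work too. The sketch keeps an explicit EOF check only to mirror the "switch the order of actions" wording.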
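
The size growth described in commit 576cbd3382 follows from PDF page sizes being measured in points (1/72 inch). The worked example below assumes a US Letter page (8.5 in wide) purely for illustration. The helper names are hypothetical.

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 pt = 1/72 inch


def rendered_pixels(page_inches: float, dpi: int) -> int:
    """Pixels produced when rasterizing a page edge at the given DPI."""
    return round(page_inches * dpi)


def page_points(pixels: int, dpi: int) -> float:
    """PDF page size in points when the image is placed at the given DPI."""
    return pixels / dpi * POINTS_PER_INCH


width_px = rendered_pixels(8.5, 150)        # Letter width rendered at 150 DPI
print(width_px)                             # 1275
print(page_points(width_px, 150))           # 612.0 pt = 8.5 in (DPIs match)
print(round(page_points(width_px, 70), 1))  # 1311.4 pt, about 2.14x wider
```

The ratio 1311.4 / 612 is 150 / 70: the rendering DPI divided by the reassembly DPI. The pixel data, and therefore roughly the file size, stays the same. Only the physical dimensions change.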
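
Commit bc83341d2a stops trusting LibreOffice's exit status alone and checks for the output file instead. The sketch below shows that check in general form. The exact `soffice` invocation is not in the commit message, so the command is a parameter. `LibreofficeFailure` and `convert_with_output_check` are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import List


class LibreofficeFailure(Exception):  # hypothetical name
    """The converter exited with status 0 but produced no output file."""


def convert_with_output_check(cmd: List[str], expected_output: Path) -> Path:
    """Run a conversion command and verify that it really produced a file.

    A zero exit status is not trusted on its own, because LibreOffice
    can exit with 0 without writing anything.
    """
    subprocess.run(cmd, check=True)  # non-zero exit codes still raise
    if not expected_output.exists():
        # Fail here, with a clear error, instead of in the next command.
        raise LibreofficeFailure(f"no output file at {expected_output}")
    return expected_output
```

With this check, a command that exits 0 but writes nothing (for instance `[sys.executable, "-c", "pass"]`) raises `LibreofficeFailure` immediately. Without it, a later, unrelated step would fail with a misleading error.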


62 commits