Commit graph

40 commits

Author SHA1 Message Date
Alex Pyrgiotis
42646877d7
Switch base image to Debian Stable
Switch base image from Alpine Linux to Debian Stable, in order to reduce
our image footprint, improve our security posture, and build our
container image reproducibly.

Fixes #1046
Refs #1047
2025-01-14 11:57:37 +02:00
Alex Pyrgiotis
3f86e7b465
Make PyMuPDF always log to stderr
PyMUPDF logs to stdout by default, which is problematic because we use
the stdout of the conversion process to read the pixel stream of a
document.

Make PyMuPDF always log to stderr, by setting the following environment
variables: PYMUPDF_MESSAGE and PYMUPDF_LOG.

Fixes #877
2024-08-09 14:32:19 +03:00
Alexis Métaireau
cbbd6afcc1
chore: remove unused code
This commit removes code that's not being used, it can be exceptions
with the `as e` where the exception itself is not used, the same with
`with` statements, and some other parts where there were duplicated
code.
2024-06-05 14:19:31 +02:00
Alexis Métaireau
5aa4863b52
chore(imports): remove useless imports
As detected by [ruff](https://github.com/astral-sh/ruff)

Related to #254, although it doesn't provide the command to lint the
codebase itself.
2024-06-05 14:19:30 +02:00
Alex Pyrgiotis
1e1d9274f0
Handle complaints about shebangs during RPM build
When building the Dangerzone RPMs, we were seeing the following shebang
warnings:

    + /usr/lib/rpm/redhat/brp-mangle-shebangs
    mangling shebang in /usr/lib/python3.12/site-packages/dangerzone/conversion/doc_to_pixels.py from /usr/bin/env python3 to #!/usr/bin/python3
    mangling shebang in /usr/lib/python3.12/site-packages/dangerzone/conversion/common.py from /usr/bin/env python3 to #!/usr/bin/python3
    mangling shebang in /usr/lib/python3.12/site-packages/dangerzone/conversion/pixels_to_pdf.py from /usr/bin/env python3 to #!/usr/bin/python3
    mangling shebang in /etc/qubes-rpc/dz.ConvertDev from /usr/bin/env python3 to #!/usr/bin/python3
    mangling shebang in /etc/qubes-rpc/dz.Convert from /bin/sh to #!/usr/bin/sh

These warnings are benign in nature, but coupled with #727, they could
lead to incorrect file permissions.

Remove shebangs from the following files, since they are not executed
directly, but are imported instead:

    dangerzone/conversion/common.py
    dangerzone/conversion/doc_to_pixels.py
    dangerzone/conversion/pixels_to_pdf.py

Also, accept the suggestions by Fedora (/bin/sh -> /usr/bin/sh,
/usr/bin/env python3 -> /usr/bin/python3) for the following files:

    qubes/dz.Convert
    qubes/dz.ConvertDev

Refs #727
2024-05-28 18:06:34 +03:00
deeplow
8a32d80762
Remove leftover progress variable in pixels_to_pdf
Since the progress information is now inferred on host based on the
number of pages obtained, progress-tracking variables should be removed
from the server.
2024-02-06 20:11:52 +00:00
deeplow
550786adfe
Remove untrusted progress parsing (stderr instead)
Now that only the second container can send JSON-encoded progress
information, we can the untrusted JSON parsing. The parse_progress was
also renamed to `parse_progress_trusted` to ensure future developers
don't mistake this as a safe method.

The old methods for sending untrusted JSON were repurposed to send the
progress instead to stderr for troubleshooting in development mode.

Fixes #456
2024-02-06 19:42:40 +00:00
deeplow
0a099540c8
Stream pages in containers: merge isolation providers
Merge Qubes and Containers isolation providers core code into the class
parent IsolationProviders abstract class.

This is done by streaming pages in containers for exclusively in first
conversion process. The commit is rather large due to the multiple
interdependencies of the code, making it difficult to split into various
commits.

The main conversion method (_convert) now in the superclass simply calls
two methods:
  - doc_to_pixels()
  - pixels_to_pdf()

Critically, doc_to_pixels is implemented in the superclass, diverging
only in a specialized method called "start_doc_to_pixels_proc()". This
method obtains the process responsible that communicates with the
isolation provider (container / disp VM) via `podman/docker` and qrexec
on Containers and Qubes respectively.

Known regressions:
  - progress reports stopped working on containers

Fixes #443
2024-02-06 19:42:33 +00:00
deeplow
331b6514e8
Containers: remove debug messages (via files)
Remove container_log messages ahead of debug info being sent over
standard streams.
2024-02-06 18:54:39 +00:00
deeplow
cd99122385
Adds file formats: epub svg bmp pnm bpm ppm
Partially fix for #660. Missing some files due to limitations [1]:
- PSD - only available from PyMuPDF>=1.23.0 (qubes-fedora is lower)
- TXT - only available from PyMuPDF>=1.23.7 (qubes-fedora is lower)
- JXR - PyMuPDF was refusing to due to missing codec [1]
- JPX - Generated test file was rejected by PyMuPDF [2]
- FB2 - Most often cannot be detected by mime type alone [3]
- CBZ - (idem)
- XPS - (idem)
- MOBI - (idem)
- PAM - General version of other file format already included, so I
  decided not to include this extension [0]

New test files were generated locally:
 - epub - generated with calibre's convert-ebook from another
   sample file
 - svg - generated with inkscape from a mix of a default template
   (hexagons) and a logo's PNG file
 - bmp, pnm, bpm, ppm - generated with ImageMagick's 'convert' from
   tests/test_docs/sample-png.png

[0]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1914681487
[1]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1916803201
[2]: https://github.com/freedomofpress/dangerzone/issues/660#issuecomment-1916870347
[3]: https://github.com/freedomofpress/dangerzone/issues/688
2024-01-31 19:58:48 +00:00
deeplow
4e720aa6e2
Replace 'None' conversion type with "PyMuPDF"
Replaced for clarity over the fact that this conversion is in fact
handled by PyMuPDF.
2024-01-31 19:58:36 +00:00
deeplow
80db7bb02e
Remove pre-pymupdf exceptions and detect pymupdf ones 2024-01-03 12:58:35 +00:00
deeplow
b75417bbec
Remove all server-side timeouts from doc to pixels
Now we're using client-side timeouts so the server side-ones are not
needed. Implemented following the suggestion from @apyrgio [1].

[1]: https://github.com/freedomofpress/dangerzone/pull/622#discussion_r1413906514
2024-01-03 12:58:35 +00:00
deeplow
576cbd3382
Fix DPI mismatch between doc2pixels and pixels2pdf
The original document was larger in dimensions than the original one due
to a mismatch in DPI settings. When converting documents to pixels we
were setting the DPI to 150 pixels per inch. Then when converting back
into a PDF we were using 70 DPI. This difference would result in an
overall larger document in dimensions (though not necessarily in file
size).

Fixes #626
2024-01-03 12:58:34 +00:00
deeplow
e5dbe25abb
Replace 'convert' with PyMuPDF for images
PyMuPDF can also convert images of the types we already support so we
don't need ImageMagick's 'convert'.
2024-01-03 12:58:34 +00:00
deeplow
ba17016643
Doc_to_pixels: remove unneeded timeout
Timeout can no longer be used since we're not calling a subprocess. We
could still implement it, but it's more worthy to reply in
yet-to-implement client-side timeouts (in containers).
2024-01-03 12:40:45 +00:00
deeplow
317deadbe4
Replace pdfinfo logic (get # pages) with PyMuPDF 2024-01-03 12:40:45 +00:00
deeplow
327ab8791f
Replace pdftoppm logic with PyMuPDF (native python)
Use PyMuPDF (AGPL-licensed) within the container conversion to replace
the pdf conversion to RGB. This massively simplifies the code since
PyMuPDF is a native python library.
2024-01-03 12:40:45 +00:00
Moon Sungjoon
63aea4cb45
Enable HWP conversion on MacOS (Apple silicon CPU)
This PR reverts the patch that disables HWP / HWPX conversion on MacOS M1.
It does not fix conversion on Qubes OS (#494).

Previously, HWP / HWPX conversion didn't work on MacOS (Apple silicon CPU) (#498)
because libreoffice wasn't built with Java support on Alpine Linux for ARM (aarch64).

Gratefully, the Alpine team has enabled Java support on the aarch64
system [1], so we can enable it again for ARM architectures.
And this patch is included in Alpine 3.19

This commit was included in #541 and reverted on #562 due to a stability issue.

Fixes #498

[1]: 74d443f479
2023-12-13 12:57:22 +02:00
Alex Pyrgiotis
2016965c84
Revert "Enable HWP conversion on MacOS M1"
This reverts commit 214ce9720d. The
rationale is that we want to wait until the LibreOffice package that
allows HWP conversion in Alpine Linux lands in `alpine:latest`.

For more info, read
https://github.com/freedomofpress/dangerzone/issues/498#issuecomment-1739894100
2023-10-02 14:22:47 +03:00
deeplow
7daeccdfea
Prevent PDF from overwriting num_pages in Qubes
This should only affect the alpha version of Qubes OS (in containers
it only allows the attacker to control the timeout). In short, an
attacker could have PDF metadata that would show before "Pages:" in
the `pdfinfo` command output and this would essentially override the
number of pages measured in the server. This could enable the attacker
to shorten the number of pages of a document for example.

Fixes #565
2023-10-02 12:18:12 +01:00
Alex Pyrgiotis
ccf4132ea0
conversion: Add sanity check for page count
Add a sanity check at the end of the conversion from doc to pixels, to
ensure that the resulting document will have the same number of pages as
the original one.

Refs #560
2023-09-28 22:50:54 +03:00
Alex Pyrgiotis
4bb959f220
conversion: Add anchor points for streaming page data/metadata
Introduce 4 new methods that can be overloaded by the Qubes isolation
provider to stream page data/metadata back to the caller. For the time
being, these methods do what they did before, i.e., write this info in
files within the pixels directory.
2023-09-28 22:50:53 +03:00
deeplow
54b8ffbf96
Add page limit of 10000
Theoretically the max pages would be 65536 (2byte unsigned int.
However this limit is much higher than practical documents have
and larger ones can lead to unforseen problems, for example RAM
limitations.

We thus opted to use a lower limit of 10K. The limit must be
detected client-side, given that the server is distrusted. However
we also check it in the server, just as a fail-early mechanism.
2023-09-28 11:01:14 +01:00
deeplow
94f569cdf5
Add error code for unexpected errors in conversion 2023-09-19 15:52:47 +01:00
deeplow
b4c3e07d36
Remove attacker-controlled error messages
Creates exceptions in the server code to be shared with the client via an
identifying exit code. These exceptions are then reconstructed in the
client.

Refs #456 but does not completely fix it. Unexpected exceptions and
progress descriptions are still passed in Containers.
2023-09-19 15:33:20 +01:00
Moon Sungjoon
214ce9720d
Enable HWP conversion on MacOS M1
This PR reverts the patch that disables HWP / HWPX conversion on MacOS
M1. It does not fix conversion on Qubes OS (#494)

Previously, HWP / HWPX conversion didn't work on MacOS M1 systems (#498)
because libreoffice wasn't built with Java support on Alpine Linux for
ARM (aarch64).

Gratefully, the Alpine team has enabled Java support on the aarch64
system [1], so we can enable it again for ARM architectures.

Fixes #498

[1]: 74d443f479
2023-09-06 13:10:18 +03:00
deeplow
75369cf621
Adapt code so it works for reporting script
Reporting script now parses JunitXML instead of a series of
".container_log" files. The script in in changed submodule.

Additionally it makes failed tests actually fail so that this is
recorded in the JunitXML report.
2023-08-22 16:11:36 +01:00
deeplow
95cef8cf0a
Containers: capture conversion logs
Store the conversion log to a file (captured-output.txt) in the
container and when in development mode, have its output displayed on the
terminal output.
2023-08-22 16:11:26 +01:00
deeplow
874b8865e2
Qubes: strategy for capturing conversion logs
Use qrexec stdout to send conversion data (pixels) and stderr to send
conversion progress at the end of the conversion. This happens
regardless of whether or not the conversion is in developer mode or not.

It's the client that decides if it reads the debug data from stderr or
not. In this case, it only reads it if developer mode is enabled.
2023-08-22 16:11:20 +01:00
Alex Pyrgiotis
e3a8a651f1
Disable HWP / HWPX conversion on MacOS M1 / Qubes
The HWP / HWPX conversion feature does not work on the following
platforms:

* MacOS with Apple Silicon CPU
* Native Qubes OS

For this reason, we need to:

1. Disable it on the GUI side, by not allowing the user to select these
   files.
2. Throw an error on the isolation provider side, in case the user
   directly attempts to convert the file (either through CLI or via
   "Open With").

Refs #494
Refs #498
2023-08-05 16:50:49 +01:00
Alex Pyrgiotis
bc83341d2a
conversion: Detect when LibreOffice silently fails
Sometimes, LibreOffice returns with status code 0, but in reality, it
fails. It doesn't create a file, and Dangerzone does not detect this.
What happens next is that it fails in the next command, and throws an
unrelated error.

Detect that LibreOffice fails, by checking if the output file exists,
after the PDF conversion.
2023-08-05 16:50:47 +01:00
Alex Pyrgiotis
6736fb0153
Factor out MIME type detection
Factor out the MIME type detection logic, so that we can use it both in
Qubes and containers.
2023-08-05 16:50:35 +01:00
Moon Sungjoon
fa22e96af7
Clean up HWP/HWPX MIME types
Use the MIME types actually used by the `file` command, which was
recently changed for the detection of the HWPX format [1].

application/hwp+zip -> application/x-hwp+zip

But the HWPX format includes a 'mimetype' file, which contains the
MIME type string "application/hwp+zip", so that was left so because
it may be possible to detect it as "application/hwp+zip".

[1]: ceef7ead3a
2023-08-01 14:35:28 +01:00
Moon Sungjoon
a453c890a0
Fix dynamic loading of LibreOffice extensions
HWPX MIME type is recognized as 'application/zip' with current version of file command (file-5.44).
It will be recognized as 'application/hwp+zip' when new version of file is released.

For a temporary fix, when MIME type of file is 'application/zip',
check the file type again (without the MIME option).
And then check if it's 'Zip data (MIME type "application/hwp+zip"?)' or not.
2023-08-01 14:28:36 +01:00
deeplow
d16961bed6
Security: Dynamically load libreoffice extension (PoC)
Only load the LibreOffice extension for opening hwp/hwpx when it is
actually needed. Adding an extension to libreoffice may allow for it to
run arbitrary code. This makes it trust more scalable by trusting
LibreOffice extensions only for the filetypes which they target.

Reasoning
---------

Assuming a malicious `.oxt` extension this means that the extension has
arbitrary code execution in the container. While this is not an
existential threat in itself, we should not expose every Dangerzone user
to it. This is achieved by dynamically loading the extension at runtime
only when needed.

This ensures that a compromised extension will in its least malicious
form be able to modify the visual content of any hancom office files but
not *every file*. In the more malicious version, if the code execution
manages to do a container escape, this will only affect users that have
converted a Hancom office file.
2023-08-01 14:28:34 +01:00
Moon Sungjoon
3e895adbab
Add hwp hwpx support
hwp/hwpx has several custom MIME types

.hwp:
 - application/x-hwp
 - application/haansofthwp
 - application/vnd.hancom.hwp

.hwpx:
 - application/haansofthwpx
 - application/vnd.hancom.hwpx,
 - application/hwp+zip

Fixes #243
2023-08-01 14:27:18 +01:00
Alex Pyrgiotis
cfdaec23c5
Support multiple Python libraries for libmagic
It seems that there are at least two Python libraries with libmagic
support:

* PyPI: python-magic (https://pypi.org/project/python-magic/)
  On Fedora it's `python3-magic`
* PyPI: filemagic (https://pypi.org/project/filemagic/)
  On Fedora it's `python3-file-magic`

The first package corresponds to the `py3-magic` package on Alpine
Linux, and it's the one we install in the container. The second package
uses a different API, and it's the only one we can use on Qubes.

To make matters worse, we:

* Can't install the first package on Fedora, because it installs the
  second under the hood:
  https://bugzilla.redhat.com/show_bug.cgi?id=1899279
* Can't install the second package on Alpine Linux (untested), due to
  Musl being used instead of libC:
  https://stackoverflow.com/a/53936722

Ultimately, we need to support both, by trying the first API, and on
failure using the other API.
2023-06-21 11:45:00 +03:00
deeplow
a0d1a68302
Use /tmp/dangerzone for Qubes compatibility
For using in containers, creating a /dangerzone directory is fine but it
is more standard to do this in /tmp.
2023-06-21 11:44:53 +03:00
deeplow
814d533c3b
Restructure container code
The files in `container/` no longer make sense to have that name since
the "document to pixels" part will run in Qubes OS in its own virtual
machine.

To adapt to this, this PR does the following:
- Moves all the files in `container` to `dangerzone/conversion`
- Splits the old `container/dangerzone.py` into its two components
  `dangerzone/conversion/{doc_to_pixels,pixels_to_pdf}.py` with a
  `common.py` file for shared functions
- Moves the Dockerfile to the project root and adapts it to the new
  container code location
- Updates the CircleCI config to properly cache Docker images.
- Updates our install scripts to properly build Docker images.
- Adds the new conversion module to the container image, so that it can
  be imported as a package.
- Adapts the container isolation provider to use the new way of calling
  the code.

NOTE: We have made zero changes to the conversion code in this commit,
except for necessary imports in order to factor out some common parts.
Any changes necessary for Qubes integration follow in the subsequent
commits.
2023-06-21 11:44:47 +03:00
Renamed from container/dangerzone.py (Browse further)