dangerzone/dangerzone
Naglis Jonaitis d632908a44
Fix printing of filenames with surrogate escapes
On Unix systems a filename can be a sequence of bytes that is not valid
UTF-8. Python uses[1] surrogate escapes to decode such filenames to
Unicode: each byte that cannot be decoded is replaced by a lone
surrogate, and upon encoding the surrogate is converted back to the
original byte.
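
As a minimal sketch of that round trip, assuming a hypothetical
filename and an explicit UTF-8 codec (Python applies the same
`surrogateescape` handler to filenames on Unix):

```python
# Hypothetical filename as raw bytes; 0xff is never valid in UTF-8.
raw = b"report-\xff.pdf"

# Decoding with "surrogateescape" maps the undecodable byte 0xff
# to the lone surrogate U+DCFF instead of raising an error.
name = raw.decode("utf-8", errors="surrogateescape")
print(ascii(name))  # ascii() escapes the surrogate, so this is safe to print

# Encoding with the same handler restores the original byte.
restored = name.encode("utf-8", errors="surrogateescape")
print(restored == raw)
```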

From `click` docs[2]:

> Invalid bytes or surrogate escapes will raise an error when written
> to a stream with `errors="strict"`. This will typically happen with
> `stdout` when the locale is something like `en_GB.UTF-8`.

To fix that, we use `util.replace_control_chars()` before printing the
filenames to `stdout` so that surrogate escapes are replaced by �.
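
The failure and the effect of the replacement can be illustrated
roughly as follows. `replace_surrogates()` is a hypothetical helper
written for this example and is not the project's
`replace_control_chars()` implementation; the filename is likewise
made up.

```python
import unicodedata


def replace_surrogates(text: str, replacement: str = "\ufffd") -> str:
    """Replace every surrogate code point (category "Cs") in text."""
    return "".join(
        replacement if unicodedata.category(ch) == "Cs" else ch
        for ch in text
    )


# Hypothetical filename containing a byte that is not valid UTF-8.
name = b"report-\xff.pdf".decode("utf-8", errors="surrogateescape")

# A stream with errors="strict" encodes like this and fails.
try:
    name.encode("utf-8")  # the default error handler is "strict"
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# After replacement the string encodes cleanly and can be printed.
safe = replace_surrogates(name)
encoded = safe.encode("utf-8")
```

The trade-off is that the printed name no longer round-trips to the
original bytes, so the sanitized string is only suitable for display.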

Fixes #768
2024-04-25 14:11:25 +03:00
conversion conversion: Do not let PyMuPDF print to stdout 2024-03-13 21:03:15 +02:00
gui Properly add new file extensions 2024-02-20 16:02:38 +02:00
isolation_provider isolation_provider: Always terminate spawned process 2024-04-24 14:39:15 +03:00
__init__.py Remove separate dangerzone-container entry point, make CLI work with it, and refactor container code to be more DRY 2021-08-04 16:21:00 -07:00
args.py Update typing hints for Mypy 1.1.1 2023-03-27 15:19:43 +03:00
cli.py Fix printing of filenames with surrogate escapes 2024-04-25 14:11:25 +03:00
document.py Sanitize filenames before logging them 2023-08-01 14:43:48 +03:00
errors.py Prevent adding duplicate documents 2022-11-30 12:49:18 +00:00
logic.py Add logic to handle documents removal 2023-07-25 15:00:12 +01:00
settings.py Move settings.json into constant 2024-04-01 18:18:41 +03:00
util.py Relax the restrictions of util.replace_control_chars 2024-04-25 14:11:16 +03:00