The `util.replace_control_chars()` function was overly strict, and
would replace every non-ASCII character with "_". This included both
control characters, as well as normal characters in a non-English
alphabet.
Relax these restrictions by checking each character and deciding if it's
a Unicode control character, using the `unicodedata` Python package.
With this change, emojis and non-English letters are now allowed.
Add a pytest fixture that crafts a filename with Unicode characters that
are not considered common for this use. By default, this fixture uses
an invalid Unicode character as well, but we strip it in case of macOS
(APFS) since filenames must be UTF-8 encoded.
[1]: https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations
Theoretically the max pages would be 65536 (2byte unsigned int.
However this limit is much higher than practical documents have
and larger ones can lead to unforseen problems, for example RAM
limitations.
We thus opted to use a lower limit of 10K. The limit must be
detected client-side, given that the server is distrusted. However
we also check it in the server, just as a fail-early mechanism.
Adds a large pool of document that can and should be used prior to a
release to understand effects of the new release over a real-world
scenario.
Documents are stored in an external git LFS repo under
`tests/test_docs_large` and currently it's about 11K documents gathered
from multiple PDF readers and office suite's test sets.
Documentation on how to run the tests is under
`docs/developer/TESTING.md`
Add extra files and base64 encode externally contributed docs. This
prevents the accidental opening of such documents, since they couldn't
be rebuit by the Dangerzone developers to ensure their safety.
Now that sample_doc was renamed to sample_pdf it could cause some
confusion the fact that that the TestBase class had an attribute called
sample_doc which referenced the sample PDF.
By removing this attribute and passing the fixture instead we are
following a more pytest-native approach of passing arguments explicitly.
Adds tests for macOS and Windows with the dummy converter. Tests won't
actually perform the conversion. But it should be enough for us to test
the remainder of the codebase.
Fixes#229
Checking if files were writeable created files in the process. In the
case where someone adds a list of N files to dangerzone but exits before
converting, they would be left with N 0-byte files for the -safe
version. Now they don't.
Fixes#214
Let the Document class suggest the default filename for the safe PDF,
based on the provided input filename, appended with the extension
`-safe.pdf`.
Previously, this logic was copy-pasted throughout the code, which made
it difficult to maintain.
Run Mypy static checks against our tests. This brings them inline with
the rest of the codebase, and we have an extra level of certainty that
the tests (and unit tests in particular) will not significantly diverge
from the code they are testing.