dangerzone

mirror of https://github.com/freedomofpress/dangerzone.git synced 2025-04-28 18:02:38 +02:00

Alex Pyrgiotis a0d6f0d719 container: Grab trained OCR models from GitHub	2023-05-23 16:27:40 +03:00
dangerzone.py	container: Run LibreOffice in safe mode	2023-03-28 14:47:07 +03:00
Dockerfile	container: Grab trained OCR models from GitHub	2023-05-23 16:27:40 +03:00

container: Grab trained OCR models from GitHub

Grab Tesseract's trained models from GitHub, instead of from the Alpine
Linux repos. Over the past few months, the models in the Alpine Linux
repos did not remain stable, leading to CI issues.

Since the models are already pre-trained and available through
Tesseract's repo on GitHub, we can use the release tarball that they
offer to install them in the container image, which is basically what
the upstream packages are doing as well.

In order to make sure that we have no regressions, at the time of this
commit we ensured that the hashes of the models offered through the
Alpine Linux repos and the models offered from the GitHub release are
the same. Also, in order to detect future regressions or foul play, we
check the downloaded models against a known checksum. Given that these
models change every few years, updating the checksum should not be an
issue.

Fix #357
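
The checksum check described in the commit message above can be sketched roughly as follows. This is only an illustration in Python, not the code from the Dockerfile: the function names `sha256_of_file` and `verify_checksum` are hypothetical, and SHA-256 is an assumption, since the commit message does not name the hash algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file does not match the pinned checksum.

    `expected_sha256` stands for a value pinned in the build (hypothetical here).
    """
    actual = sha256_of_file(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(
            f"checksum mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
```

A build step would call `verify_checksum` on the downloaded tarball before unpacking it, so that a changed or tampered download fails the build instead of silently ending up in the image. When the upstream models do change, only the pinned value needs updating.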
