dangerzone

Author	SHA1	Message	Date
Alex Pyrgiotis	a0d6f0d719	container: Grab trained OCR models from GitHub Grab Tesseract's trained models from GitHub, instead of from the Alpine Linux repos. Over the past few months, the models in the Alpine Linux repos did not remain stable, leading to CI issues. Since the models are already pre-trained and available through Tesseract's repo on GitHub, we can use the release tarball that they offer to install them in the container image, which is basically what the upstream packages are doing as well. In order to make sure that we have no regressions, at the time of this commit we ensured that the hashes of the models offered through the Alpine Linux repos and the models offered from the GitHub release are the same. Also, in order to detect future regressions or foul play, we check the downloaded models against a known checksum. Given that these models change every few years, updating the checksum should not be an issue. Fix #357	2023-05-23 16:27:40 +03:00
Alex Pyrgiotis	3d822e1aa3	container: Install a renamed package The tesseract-ocr-data-ell package for the Greek language has been renamed to tesseract-ocr-data-grc. Use the new name in our Dockerfile.	2023-05-17 20:29:13 +03:00
deeplow	dbd0450542	Add poppler-data package due to missing fonts Some documents were reporting the following error when running them over pdftoppm: Syntax Error: Missing language pack for 'Adobe-Japan1' mapping This did not necessarily make the document fail but it could be that some fonts were not properly rendered due to the missing package.	2023-02-21 18:39:14 +00:00
Alex Pyrgiotis	24975fabd5	container: Reinstate OpenJDK 8 dependency Commit `d7be28ec2a` assumed that OpenJDK was required for the PDFtk package, which is no longer installed in the Dangerzone image, and thus was removed. Turns out that while LibreOffice does not depend on OpenJDK, it may produce corrupted PDFs if installed without it, and will not abort the operation. Reinstate OpenJDK to fix the issue of corrupted PDFs. Fixes #315	2023-02-07 18:52:49 +02:00
deeplow	2da973232b	Remove sudo: no longer needed Fixes #232	2023-01-23 14:13:56 +00:00
deeplow	d7be28ec2a	Remove openjdk-8 as a dependency. default-jre and java dependencies had been added initially [1] because of libreoffice-java-common, which is no longer present. Then, when the image was changed from ubuntu to alpine [2], default-jre was replaced with openjdk-8. If java is still a dependency for libreoffice, then it should be pulled automatically. [1] `9ecdb9e995` [2] `650ae6eee1`	2023-01-23 14:13:48 +00:00
deeplow	d28aa5a25b	Remove PDFtk dependency (replace w/ pdftoppm) PDFtk actually isn't needed. It was being used for breaking a PDF into pages but this is something that can be replaced by the already present 'pdftoppm'. Furthermore, by removing this dependency we contribute to reproducible builds and overall supply chain security because it was obtained from gitlab with no signature verification or version pinning. The replacement 'pdftoppm' enabled us to take a shortcut: - before: PDF -> PDF pages -> PNG images -> RGB images - after: PDF -> PPM images -> RGB images And this last conversion step is trivial since the RGB format we were using is just a PPM file without the metadata in its header.	2023-01-23 14:00:57 +00:00
deeplow	21a9a6c98c	running dangerzone without root in container There was previously a user created in the container but it was not used via the dockerfile RUN directive (as pointed out by gmarmstrong[1]). Fixes #169 [1]: https://github.com/freedomofpress/dangerzone/issues/169#issue-1268399245	2022-08-22 08:43:58 +01:00
Micah Lee	8052220034	Get rid of wrapper scripts in the container	2021-11-29 15:39:24 -08:00
Micah Lee	2de2b6dca5	Rename dangerzone-converter to container	2021-11-29 15:30:21 -08:00
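
Commit d28aa5a25b notes that the PPM -> RGB step is trivial because the RGB format is a PPM file without its header metadata. A minimal sketch of that idea follows, assuming binary PPM (P6) input with 8-bit channels; the function name `ppm_to_rgb` is hypothetical and this is not the code from the repository.

```python
def ppm_to_rgb(data: bytes) -> tuple[int, int, bytes]:
    """Split a binary PPM (P6) file into (width, height, raw RGB bytes).

    Hypothetical helper for illustration; assumes 8-bit channels (maxval 255).
    """
    if data[:2] != b"P6":
        raise ValueError("not a binary PPM (P6) file")

    pos = 2
    fields = []
    # The header holds three decimal fields after the magic number:
    # width, height and the maximum channel value.
    while len(fields) < 3:
        # Skip whitespace and '#' comment lines between header tokens.
        while pos < len(data) and (
            data[pos:pos + 1].isspace() or data[pos:pos + 1] == b"#"
        ):
            if data[pos:pos + 1] == b"#":
                while pos < len(data) and data[pos:pos + 1] != b"\n":
                    pos += 1
            else:
                pos += 1
        start = pos
        while pos < len(data) and data[pos:pos + 1].isdigit():
            pos += 1
        if start == pos:
            raise ValueError("malformed PPM header")
        fields.append(int(data[start:pos]))

    width, height, maxval = fields
    if maxval != 255:
        raise ValueError("only 8-bit PPM files are handled in this sketch")

    # Exactly one whitespace byte separates the header from the pixel data.
    pixels = data[pos + 1:]
    if len(pixels) != width * height * 3:
        raise ValueError("pixel data does not match the header dimensions")
    return width, height, pixels
```

Everything after the header is already the raw RGB payload, so no per-pixel conversion is needed; the width and height only have to be carried alongside it.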

Renamed from dangerzone-converter/Dockerfile

10 commits
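
Commit a0d6f0d719 describes checking the downloaded Tesseract models against a known checksum. The commit message does not say which hash algorithm or tooling is used, so the following is only a sketch of the general technique, assuming SHA-256; the function name `verify_sha256` and its parameters are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Return True if the file at `path` has the expected SHA-256 digest.

    Hypothetical helper; SHA-256 is an assumption, not taken from the commit.
    """
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large tarballs do not have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(digest.hexdigest(), expected_hex.lower())
```

A build step would typically abort when this returns False, so that a changed or tampered download never ends up in the image.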