dangerzone

mirror of https://github.com/freedomofpress/dangerzone.git synced 2025-04-28 18:02:38 +02:00

Author	SHA1	Message	Date
Alex Pyrgiotis	cbca9110ca	Switch to tessdata-fast Tesseract model Switch to the tessdata-fast Tesseract model, instead of the tessdata one. The tessdata-fast Tesseract model is much smaller, and a bit faster than the other one. Also, it's the model that Debian/Fedora ship by default. Closes #545	2023-09-25 12:48:05 +03:00
Alex Pyrgiotis	5bd609781d	Remove Kurdish (Arabic) language Remove the Kurdish (Arabic) language ("kur_ara") from the list of languages that we offer for OCR, since it's not included in the installed languages. Interestingly, it is not present in the Alpine Linux repos either, so this was probably an omission in the first place.	2023-05-24 13:43:29 +03:00
Alex Pyrgiotis	35e439f9e8	Restore the OCR languages Restore the OCR languages to the state they were in at `66d3c40163`, with some minor changes. We can now do so because we download all the trained models, not just the ones that Alpine Linux offers.	2023-05-24 13:43:29 +03:00
deeplow	58332fdd6e	tesseract: add new languages and others Tagalog was replaced with Filipino [1] in newer tesseract versions, so it doesn't make sense for us to use the new name and map it to the old "tgl" name (Tagalog) under the hood. Language names obtained from tesseract's man page [2]. [1]: `58f7a72f00` [2]: https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc	2023-03-16 14:23:30 +00:00
deeplow	d8d83ff036	Remove languages not supported When the OCR languages list was originally introduced (commit `b527776`), the container was running on Ubuntu 18.04 [1]. Later it changed to Alpine Linux, which unfortunately offers fewer languages than Ubuntu. This commit removes those languages. Fixes #355 [1]: `b527776e28 (diff-ec032b25a6c2af24eaf4128c85090c5ce0dcbab64e64eace10be9f4e4683a71bR1)`	2023-03-16 14:23:28 +00:00
deeplow	66d3c40163	Sort OCR languages by tesseract arg name Make it easier to compare the list of languages with the output of `tesseract --list-langs`.	2023-03-16 14:23:25 +00:00
deeplow	2d6826afa9	move ocr_languages from global_common to share/ ocr_languages can be treated as just a JSON file instead of being in global_common. This way it is easier to maintain and makes global_common cleaner.	2022-09-15 10:40:34 +01:00

7 commits