Alex Pyrgiotis
35e439f9e8
Restore the OCR languages
...
Restore the OCR languages to the state they were in at 66d3c40163, with some minor changes. We
can now do so because we download all the trained models, not just the
ones that Alpine Linux offers.
2023-05-24 13:43:29 +03:00
deeplow
58332fdd6e
tesseract: add new languages and others
...
Tagalog was replaced with Filipino [1] in newer tesseract versions, so it
doesn't make sense for us to use the new name and map it to the old
"tgl" name (Tagalog) under the hood.
Language names obtained from tesseract's man page [2].
[1]: 58f7a72f00
[2]: https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc
2023-03-16 14:23:30 +00:00
deeplow
d8d83ff036
Remove languages not supported
...
When the OCR languages list was originally introduced (commit b527776),
the container was running on Ubuntu 18.04 [1]. Later it changed to
Alpine Linux, which unfortunately offers fewer languages than Ubuntu.
This commit removes the languages that are no longer available. Fixes #355
[1]: b527776e28 (diff-ec032b25a6c2af24eaf4128c85090c5ce0dcbab64e64eace10be9f4e4683a71bR1)
2023-03-16 14:23:28 +00:00
deeplow
66d3c40163
Sort OCR languages by tesseract arg name
...
Make it easier to compare the list of languages with the output of
`tesseract --list-langs`.
2023-03-16 14:23:25 +00:00
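The comparison that commit 66d3c40163 aims to simplify can be sketched roughly as follows. This is only an illustration, not the project's code: the helper names (`sort_by_code`, `parse_list_langs`, `missing_codes`) and the sample data are hypothetical, and the parser assumes that `tesseract --list-langs` prints one header line followed by one language code per line.

```python
def sort_by_code(languages: dict[str, str]) -> dict[str, str]:
    """Return a copy of the name -> tesseract code mapping, ordered by code."""
    return dict(sorted(languages.items(), key=lambda item: item[1]))


def parse_list_langs(output: str) -> list[str]:
    """Extract language codes from `tesseract --list-langs` style output.

    Assumption: the first line is a header that starts with
    "List of available languages"; every other non-empty line is a code.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return [
        line for line in lines
        if not line.startswith("List of available languages")
    ]


def missing_codes(languages: dict[str, str], installed: list[str]) -> list[str]:
    """List the codes that we advertise but that tesseract does not report."""
    installed_set = set(installed)
    return [code for code in languages.values() if code not in installed_set]


# Hypothetical sample data, for illustration only.
SAMPLE_LANGUAGES = {
    "English": "eng",
    "Afrikaans": "afr",
    "Greek": "ell",
    "Filipino": "fil",
}

SAMPLE_OUTPUT = """List of available languages in "/usr/share/tessdata/" (3):
afr
ell
eng
"""

if __name__ == "__main__":
    ordered = sort_by_code(SAMPLE_LANGUAGES)
    installed = parse_list_langs(SAMPLE_OUTPUT)
    print("advertised:", list(ordered.values()))
    print("installed: ", installed)
    print("missing:   ", missing_codes(ordered, installed))
```

With both lists ordered by code, a mismatch such as a language that is advertised but not installed is easy to spot by eye or with a plain diff.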
deeplow
2d6826afa9
move ocr_languages from global_common to share/
...
ocr_languages can be treated as a plain JSON file instead of living
in global_common. This makes it easier to maintain and keeps
global_common cleaner.
2022-09-15 10:40:34 +01:00
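A minimal sketch of what treating ocr_languages as a JSON data file could look like is shown below. The function names, the file path in the usage comment, and the assumed file shape (a flat object mapping display names to tesseract codes) are all assumptions made for illustration; the log above does not show the actual file contents.

```python
import json
from pathlib import Path


def parse_ocr_languages(text: str) -> dict[str, str]:
    """Parse JSON text into a name -> tesseract code mapping.

    Assumption: the file holds a flat JSON object whose keys and
    values are both strings, e.g. {"English": "eng"}.
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for name, code in data.items():
        if not isinstance(name, str) or not isinstance(code, str):
            raise ValueError(f"invalid entry: {name!r}: {code!r}")
    return data


def load_ocr_languages(path: Path) -> dict[str, str]:
    """Read the language mapping from a JSON file on disk."""
    return parse_ocr_languages(path.read_text(encoding="utf-8"))


# Usage (the path is hypothetical):
#   languages = load_ocr_languages(Path("share") / "ocr-languages.json")
```

Keeping the list in a data file means that adding or removing a language is a one-line change that does not touch application code.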