deeplow
|
58332fdd6e
|
tesseract: add new languages and others
Tagalog was replaced with Filipino [1] in newer tesseract versions, so it
doesn't make sense for us to use the new name and map it to the old
"tgl" name (Tagalog) under the hood.
Language names obtained from tesseract's man page [2].
[1]: 58f7a72f00
[2]: https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc
|
2023-03-16 14:23:30 +00:00 |
|
deeplow
|
d8d83ff036
|
Remove languages not supported
When the OCR languages list was originally introduced (commit b527776),
the container was running Ubuntu 18.04 [1]. Later it changed to
Alpine Linux, which unfortunately ships fewer languages than Ubuntu.
This commit removes the languages that are unavailable on Alpine. Fixes #355
[1]: b527776e28 (diff-ec032b25a6c2af24eaf4128c85090c5ce0dcbab64e64eace10be9f4e4683a71bR1)
|
2023-03-16 14:23:28 +00:00 |
|
deeplow
|
66d3c40163
|
Sort OCR languages by tesseract arg name
Make it easier to compare the list of languages with the output of
`tesseract --list-langs`.
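
The comparison described above could be scripted roughly as follows. This is a minimal sketch, not the project's code: it assumes the language list is a mapping from display name to tesseract code (for example `{"English": "eng"}`), that `tesseract --list-langs` prints a one-line header followed by one code per line on stdout, and the helper names `parse_list_langs`, `diff_languages`, and `installed_languages` are hypothetical.

```python
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Return the sorted language codes found in `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Assumption: the first non-empty line is a header such as
    # 'List of available languages (3):', so it is skipped.
    return sorted(lines[1:])


def diff_languages(
    ocr_languages: dict[str, str], installed: list[str]
) -> tuple[list[str], list[str]]:
    """Return (codes listed but not installed, codes installed but not listed)."""
    listed = set(ocr_languages.values())
    available = set(installed)
    return sorted(listed - available), sorted(available - listed)


def installed_languages() -> list[str]:
    """Query the local tesseract binary (assumes it is on PATH)."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_list_langs(result.stdout)
```

Calling `diff_languages(ocr_languages, installed_languages())` inside the container would then show which listed codes are missing there and which installed codes are not yet listed.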
|
2023-03-16 14:23:25 +00:00 |
|
deeplow
|
2d6826afa9
|
move ocr_languages from global_common to share/
ocr_languages can be treated as just a JSON file instead of living
in global_common. This makes it easier to maintain and keeps
global_common cleaner.
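
Reading the list from a standalone JSON file might look like the sketch below. The file path, the name-to-code layout, and the `load_ocr_languages` helper are assumptions made for illustration, not details taken from this commit.

```python
import json
from pathlib import Path


def load_ocr_languages(path: Path) -> dict[str, str]:
    """Load a display-name -> tesseract-code mapping from a JSON file."""
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the file holds one flat JSON object of string pairs.
    if not isinstance(data, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return {str(name): str(code) for name, code in data.items()}
```

Keeping the data in a plain file like this means the language list can be edited or sorted without touching Python code.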
|
2022-09-15 10:40:34 +01:00 |
|