What were you trying to do?
Hi,
I recently upgraded to OCRmyPDF 17.2, and it appears to ignore the language parameter completely and default to English when running Tesseract.
Here's my Python code:
ocrmypdf.ocr(inpfile, outputfile, language=["tam"], force_ocr=True)
This was for a file in Tamil.
Here's the relevant line from the Debug output:
DEBUG ocrmypdf.subprocess - 2 Running: ['tesseract', '-l', 'eng', '--oem', '1' ...
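For anyone trying to reproduce this, here is a minimal sketch of how the language Tesseract actually received can be pulled out of a debug line like the one above. The helper name `logged_languages` is hypothetical and not part of OCRmyPDF; it only assumes the argument list is logged in the form shown.

```python
import re

# Matches the value that follows '-l' in a logged argument list such as
# "['tesseract', '-l', 'eng', '--oem', '1' ..."
_LANG_ARG = re.compile(r"'-l',\s*'([^']+)'")


def logged_languages(log_line):
    """Return the languages passed to Tesseract via -l, or [] if none is found."""
    match = _LANG_ARG.search(log_line)
    # Tesseract joins multiple languages with '+', e.g. 'tam+eng'.
    return match.group(1).split("+") if match else []


line = "DEBUG ocrmypdf.subprocess - 2 Running: ['tesseract', '-l', 'eng', '--oem', '1' ..."
print(logged_languages(line))  # ['eng']
```

With language=["tam"] the expected result would be ['tam'], so ['eng'] here confirms the parameter never reached Tesseract.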
Needless to say, the language pack for Tamil has been correctly installed (so this isn't the issue).
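One way to double-check the language pack from Python is to ask Tesseract for its language list. This is a rough sketch: the function names are hypothetical, it assumes a `tesseract` binary on PATH, and the sample text below is illustrative rather than output from my machine.

```python
import subprocess


def parse_list_langs(output):
    """Parse the text printed by `tesseract --list-langs` into language codes."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    # Drop the header line ("List of available languages ...").
    return [ln for ln in lines if not ln.startswith("List of")]


def installed_languages():
    """Ask the tesseract binary on PATH which languages it can see."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: some older releases print the list to stderr, so fall back to it.
    return parse_list_langs(result.stdout or result.stderr)


# Illustrative sample only; the real path and count depend on the installation.
sample = 'List of available languages in "/usr/share/tessdata/" (3):\neng\nosd\ntam\n'
print("tam" in parse_list_langs(sample))  # True
```

If "tam" shows up in installed_languages(), the missing language pack can be ruled out as the cause.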
I was able to work around the issue by reverting to version 16.13, which works without problems.
It would be great if this bug could be fixed in the next update.
Thank you!
Best,
Where are you installing/running from?
PyPI (pip, poetry, pipx, etc.)
OCRmyPDF version
17.2
What operating system are you working on?
Linux
Operating system details and version
No response
Simple sanity checks
Relevant log output
DEBUG ocrmypdf.subprocess - 2 Running: ['tesseract', '-l', 'eng', '--oem', '1',... etc]