
[Bug]: OCRmyPDF API 17.1 and 17.2 seem to ignore the language parameter #1640

@gibreelferishta

Description

What were you trying to do?

Hi,
I recently upgraded to OCRmyPDF 17.2, and it appears to ignore the language parameter entirely, defaulting to English when running Tesseract.
Here's my Python code:
```python
import ocrmypdf

ocrmypdf.ocr(inpfile, outputfile, language=["tam"], force_ocr=True)
```

This was for a file in Tamil.

Here's the relevant line from the debug output:

```
DEBUG ocrmypdf.subprocess - 2 Running: ['tesseract', '-l', 'eng', '--oem', '1' ...
```
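To confirm which language actually reaches Tesseract, one option is to pull the `-l` argument out of the `ocrmypdf.subprocess` debug line. The helper below is a hypothetical sketch (`tesseract_langs_from_log` is an illustrative name, not part of OCRmyPDF). It assumes the command is logged as a Python-style list, as in the line above, and that multiple languages are joined with `+`, which is how Tesseract accepts them. Only the `-l` portion is needed, so the truncated tail of the log line does not matter.

```python
import re

# Matches the value that follows '-l' in a logged Python-style argument list,
# e.g. "['tesseract', '-l', 'eng', '--oem', '1' ..." captures "eng".
_LANG_ARG = re.compile(r"'-l',\s*'([^']+)'")


def tesseract_langs_from_log(line: str) -> list[str]:
    """Return the languages passed to tesseract via -l, or [] if none found."""
    match = _LANG_ARG.search(line)
    if match is None:
        return []
    # Tesseract takes several languages as one '+'-joined string, e.g. "tam+eng".
    return match.group(1).split("+")


log_line = "DEBUG ocrmypdf.subprocess - 2 Running: ['tesseract', '-l', 'eng', '--oem', '1' ..."
requested = ["tam"]
used = tesseract_langs_from_log(log_line)
if used != requested:
    print(f"requested {requested}, but tesseract ran with {used}")
```

Against the log line from this report, the check flags the mismatch between the requested `tam` and the `eng` that was actually passed.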

The Tamil language pack is correctly installed, so that is not the cause.
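For anyone reproducing this, `tesseract --list-langs` shows whether a pack is visible to Tesseract. Below is a minimal sketch of the same check from Python. The helper names are hypothetical, and it assumes the `tesseract` found on `PATH` is the same binary OCRmyPDF runs.

```python
import shutil
import subprocess


def parse_list_langs(output: str) -> set[str]:
    """Parse `tesseract --list-langs` output into a set of language codes.

    Tesseract prints a header line ("List of available languages ...")
    followed by one language code per line; the header is skipped here.
    """
    lines = (ln.strip() for ln in output.splitlines())
    return {ln for ln in lines if ln and not ln.startswith("List of")}


def installed_tesseract_langs() -> set[str]:
    """Run the tesseract binary on PATH and return its installed languages."""
    exe = shutil.which("tesseract")
    if exe is None:
        raise FileNotFoundError("tesseract is not on PATH")
    result = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=True
    )
    return parse_list_langs(result.stdout)
```

With the pack installed, `"tam" in installed_tesseract_langs()` should be true. Parsing is kept separate from the subprocess call so it can be checked against captured output.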

As a workaround, I reverted to version 16.13, which works without problems.
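Until a fix lands, a script could warn when it is running on one of the versions reported here. This is a hypothetical sketch: the helper names are illustrative, and the affected set contains only the versions named in this report (17.1 and 17.2). Other 17.x releases are simply untested in this report, not known to be fine.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions this report says ignore `language`; 16.13 was reported to work.
REPORTED_AFFECTED = {(17, 1), (17, 2)}


def major_minor(ver: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '17.2.0'."""
    parts = ver.split(".")
    return int(parts[0]), int(parts[1])


def is_reported_affected(ver: str) -> bool:
    """True if ver falls in a major.minor release named in this report."""
    return major_minor(ver) in REPORTED_AFFECTED


def check_installed_ocrmypdf() -> None:
    """Print a warning if the installed ocrmypdf is a reported version."""
    try:
        installed = version("ocrmypdf")
    except PackageNotFoundError:
        print("ocrmypdf is not installed")
        return
    if is_reported_affected(installed):
        print(f"ocrmypdf {installed} is reported to ignore the language parameter")
```

Comparing only major and minor keeps the check independent of patch releases, which this report does not distinguish.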

It would be great if this bug could be fixed in the next update.
Thank you!
Best,

Where are you installing/running from?

PyPI (pip, poetry, pipx, etc.)

OCRmyPDF version

17.2

What operating system are you working on?

Linux

Operating system details and version

No response

Simple sanity checks

  • Operating system is currently supported by its vendor (not end of life)
  • Python version is compatible with OCRmyPDF
  • This issue is not about a specific input file

Relevant log output

```
DEBUG ocrmypdf.subprocess -    2  Running: ['tesseract', '-l', 'eng', '--oem', '1',... etc]
```
