[Bug]: FileNotFoundError: [Errno 2] No such file or directory: '/home/x/temp/ocrmypdf.io.w7a0zro8/000003_ocr_hocr.hocr' #1650

@rubyFeedback


I use a custom temp directory on my Linux box.

The output leading up to the error shown in the title was:

Scanning contents ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 3/3 0:00:00
Starting processing with 3 workers concurrently ocr.py:107
3 [tesseract] read_params_file: Can't open hocr tesseract.py:311
3 [tesseract] read_params_file: Can't open txt tesseract.py:311
1 [tesseract] read_params_file: Can't open hocr tesseract.py:311
1 [tesseract] read_params_file: Can't open txt tesseract.py:311
2 [tesseract] read_params_file: Can't open hocr tesseract.py:311
2 [tesseract] read_params_file: Can't open txt tesseract.py:311
OCR ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 3/3 0:00:00
Parsing 3 pages with HocrParser _graft.py:342
An exception occurred while executing the pipeline _common.py:318
Traceback (most recent call last):
  File "/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/_common.py", line 273, in cli_exception_handler
    return fn(options, plugin_manager)
  File "/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/ocr.py", line 193, in _run_pipeline
    optimize_messages = exec_concurrent(context, executor)
  File "/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/ocr.py", line 151, in exec_concurrent
    pdf = ocrgraft.finalize()
  File "/usr/lib/python3.14/site-packages/ocrmypdf/_graft.py", line 328, in finalize
    parsed_pages = self._parse_hocr_pages()
  File "/usr/lib/python3.14/site-packages/ocrmypdf/_graft.py", line 350, in _parse_hocr_pages
    if page_info.hocr_path.stat().st_size == 0:
       ~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3.14/pathlib/__init__.py", line 654, in stat
    return os.stat(self, follow_symlinks=follow_symlinks)
           ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I believe ocrmypdf internally expects to create a file or directory
here but never does. This may be a logic error. Perhaps it could not
find another required file, as indicated by:

Can't open hocr

But in that case it is a bit strange that it expects a file that will never
exist in the temp directory specified above. So perhaps it should bail
earlier?
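
To illustrate what "bailing earlier" could look like, here is a minimal sketch. It is not ocrmypdf's code: `check_hocr_outputs` and `MissingOcrOutputError` are hypothetical names made up for this example. The idea is to verify that every expected per-page output file exists before anything calls `stat()` on it, and to name the missing files in the error.

```python
from pathlib import Path


class MissingOcrOutputError(RuntimeError):
    """Hypothetical error type for this sketch; not part of ocrmypdf."""


def check_hocr_outputs(hocr_paths):
    """Fail early if any expected hOCR file was never written.

    Returns the paths that exist but are empty, which a caller could
    treat as pages without recognized text.
    """
    paths = [Path(p) for p in hocr_paths]

    # Collect every missing file so the message lists all of them at once.
    missing = [p for p in paths if not p.is_file()]
    if missing:
        names = ", ".join(str(p) for p in missing)
        raise MissingOcrOutputError(
            f"Expected OCR output file(s) were not created: {names}. "
            "Check the earlier Tesseract messages for the cause."
        )

    # Only now is it safe to stat the files.
    return [p for p in paths if p.stat().st_size == 0]
```

With a check like this, the run would stop with a message naming the missing `.hocr` file instead of a bare `FileNotFoundError` from `pathlib`.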

Ideally the error message would be clearer. What is hocr? I assume
it has to do with Tesseract, right? It would be nice if ocrmypdf
could be more explicit here.
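
For context, hOCR is an HTML-based format that Tesseract can write for OCR results. As far as I can tell, `read_params_file: Can't open hocr` means Tesseract could not open a config file named `hocr`, and such config files normally live in a `configs` directory inside the tessdata directory. Whether that is the cause here is an assumption. The sketch below only checks for those files; `missing_tesseract_configs` is a hypothetical helper, and the tessdata path must be supplied by the caller because its location varies between installations.

```python
from pathlib import Path


def missing_tesseract_configs(tessdata_dir, names=("hocr", "txt")):
    """Return the config names that are absent from <tessdata_dir>/configs.

    Assumption: Tesseract config files such as 'hocr' and 'txt' are stored
    as plain files in a 'configs' subdirectory of the tessdata directory.
    """
    configs_dir = Path(tessdata_dir) / "configs"
    return [name for name in names if not (configs_dir / name).is_file()]
```

Calling it with the tessdata directory of the local Tesseract installation returns an empty list when both files are present. A non-empty result would be consistent with the `Can't open hocr` and `Can't open txt` lines in the log above.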
