I use a custom temp dir on my Linux box.
The output before the error shown in the title was:
Scanning contents ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 3/3 0:00:00
Starting processing with 3 workers concurrently ocr.py:107
3 [tesseract] read_params_file: Can't open hocr tesseract.py:311
3 [tesseract] read_params_file: Can't open txt tesseract.py:311
1 [tesseract] read_params_file: Can't open hocr tesseract.py:311
1 [tesseract] read_params_file: Can't open txt tesseract.py:311
2 [tesseract] read_params_file: Can't open hocr tesseract.py:311
2 [tesseract] read_params_file: Can't open txt tesseract.py:311
OCR ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 3/3 0:00:00
Parsing 3 pages with HocrParser _graft.py:342
An exception occurred while executing the pipeline _common.py:318
Traceback (most recent call last):
File "/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/_common.py", line 273, in cli_exception_handler
return fn(options, plugin_manager)
File "/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/ocr.py", line 193, in _run_pipeline
optimize_messages = exec_concurrent(context, executor)
File "/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/ocr.py", line 151, in exec_concurrent
pdf = ocrgraft.finalize()
File "/usr/lib/python3.14/site-packages/ocrmypdf/_graft.py", line 328, in finalize
parsed_pages = self._parse_hocr_pages()
File "/usr/lib/python3.14/site-packages/ocrmypdf/_graft.py", line 350, in _parse_hocr_pages
if page_info.hocr_path.stat().st_size == 0:
~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/usr/lib/python3.14/pathlib/__init__.py", line 654, in stat
return os.stat(self, follow_symlinks=follow_symlinks)
~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I believe that internally it tries to create a file or directory but
never does. This may be a logic error. Perhaps it could not find
another required file, as the "read_params_file: Can't open hocr" and
"Can't open txt" messages in the output above seem to indicate.
But in that case it is a bit strange that it still expects a file in the
temp directory that will never be there. So perhaps it should bail
earlier?
Ideally the error message would be better. What is hocr? I assume
it has to do with tesseract, right? But it would be nice if ocrmypdf
could be more explicit here.
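
To illustrate the kind of early check I have in mind, here is a minimal sketch. It assumes the error in the title is a FileNotFoundError raised by the `hocr_path.stat()` call shown in the traceback. The names `MissingOcrOutputError` and `require_hocr_output` are hypothetical and are not part of ocrmypdf; this is not how ocrmypdf is actually implemented.

```python
from pathlib import Path


class MissingOcrOutputError(RuntimeError):
    """Hypothetical error type for illustration; not part of ocrmypdf."""


def require_hocr_output(hocr_path: Path, page_number: int) -> bool:
    """Return True if the hOCR file has content, False if it is empty.

    Raises MissingOcrOutputError with an explanatory message when the file
    was never created, instead of letting stat() fail with a bare
    FileNotFoundError deep inside the pipeline.
    """
    try:
        size = hocr_path.stat().st_size
    except FileNotFoundError as exc:
        raise MissingOcrOutputError(
            f"Page {page_number}: Tesseract did not produce the expected "
            f"hOCR file at {hocr_path}. Check the Tesseract messages above "
            "and whether the temporary directory is writable."
        ) from exc
    return size > 0
```

With a check like this, the run would stop with a message that names the missing file and points back at the earlier Tesseract warnings, rather than ending in a raw traceback.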