[Bug]: FileNotFoundError: [Errno 2] No such file or directory: '/home/x/temp/ocrmypdf.io.w7a0zro8/000003_ocr_hocr.hocr'

I use a custom temp directory on my Linux machine.

The output before the error shown in the title was:

Scanning contents    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 3/3 0:00:00
Starting processing with 3 workers concurrently                                                                          ocr.py:107
    3 [tesseract] read_params_file: Can't open hocr                                                                tesseract.py:311
    3 [tesseract] read_params_file: Can't open txt                                                                 tesseract.py:311
    1 [tesseract] read_params_file: Can't open hocr                                                                tesseract.py:311
    1 [tesseract] read_params_file: Can't open txt                                                                 tesseract.py:311
    2 [tesseract] read_params_file: Can't open hocr                                                                tesseract.py:311
    2 [tesseract] read_params_file: Can't open txt                                                                 tesseract.py:311
OCR                  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 3/3 0:00:00
Parsing 3 pages with HocrParser                                                                                       _graft.py:342
An exception occurred while executing the pipeline                                                                   _common.py:318
Traceback (most recent call last):                                                                                                 
  File "/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/_common.py", line 273, in cli_exception_handler                      
    return fn(options, plugin_manager)                                                                                             
  File "/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/ocr.py", line 193, in _run_pipeline                                  
    optimize_messages = exec_concurrent(context, executor)                                                                         
  File "/usr/lib/python3.14/site-packages/ocrmypdf/_pipelines/ocr.py", line 151, in exec_concurrent                                
    pdf = ocrgraft.finalize()                                                                                                      
  File "/usr/lib/python3.14/site-packages/ocrmypdf/_graft.py", line 328, in finalize                                               
    parsed_pages = self._parse_hocr_pages()                                                                                        
  File "/usr/lib/python3.14/site-packages/ocrmypdf/_graft.py", line 350, in _parse_hocr_pages                                      
    if page_info.hocr_path.stat().st_size == 0:                                                                                    
       ~~~~~~~~~~~~~~~~~~~~~~~~^^                                                                                                  
  File "/usr/lib/python3.14/pathlib/__init__.py", line 654, in stat                                                                
    return os.stat(self, follow_symlinks=follow_symlinks)                                                                          
           ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^    

I believe that internally it expects a file or directory to be created,
but that never happens. This may be a logic error. Perhaps it could not
find another required file, as indicated by:

    Can't open hocr

But in that case it is a bit strange that it expects a file that will never
exist in the temp directory specified above. So perhaps it should bail
earlier?
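
To illustrate what I mean by bailing earlier, here is a minimal sketch of a
guard that could run before the hOCR file is stat'ed. The names
check_hocr_output and MissingOcrOutputError are hypothetical and are not part
of ocrmypdf; I am only assuming that the pipeline knows the expected path and
page number at that point.

```python
from pathlib import Path


class MissingOcrOutputError(RuntimeError):
    """Hypothetical error type for this sketch; not an ocrmypdf class."""


def check_hocr_output(hocr_path: Path, page_number: int) -> None:
    """Raise a descriptive error if the expected hOCR file is missing."""
    if not hocr_path.is_file():
        raise MissingOcrOutputError(
            f"Page {page_number}: expected hOCR output at {hocr_path}, "
            "but the OCR engine did not create it. Check the preceding "
            "[tesseract] messages for the likely cause."
        )
```

With something like this, the failure would point at the OCR step rather than
surfacing as a bare FileNotFoundError from pathlib.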

Ideally the error message would be better. What is hocr? I assume
it has to do with tesseract, right? But it would be nice if ocrmypdf
could be more explicit here.
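
As far as I understand, hOCR is an HTML-based format for OCR results, and
tesseract selects output formats through config files named hocr and txt in
the configs subdirectory of its tessdata directory. If that is right, the
"Can't open hocr" messages suggest those config files are missing. A rough
sketch of a check for that follows; the function name is hypothetical, and
the fallback path /usr/share/tessdata is only an assumption about my
distribution's layout.

```python
import os
from pathlib import Path


def missing_tesseract_configs(tessdata_dir: Path, names=("hocr", "txt")) -> list[str]:
    """Return the config names that are absent from <tessdata_dir>/configs."""
    configs_dir = tessdata_dir / "configs"
    return [name for name in names if not (configs_dir / name).is_file()]


if __name__ == "__main__":
    # TESSDATA_PREFIX is the environment variable tesseract consults;
    # the fallback path is an assumption and may differ per system.
    tessdata = Path(os.environ.get("TESSDATA_PREFIX", "/usr/share/tessdata"))
    missing = missing_tesseract_configs(tessdata)
    if missing:
        print(f"Missing tesseract config files in {tessdata / 'configs'}: {missing}")
    else:
        print("Tesseract config files hocr and txt are present.")
```

A check along these lines would let ocrmypdf name the missing config file
instead of failing later on the absent .hocr output.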

[Bug]: FileNotFoundError: [Errno 2] No such file or directory: '/home/x/temp/ocrmypdf.io.w7a0zro8/000003_ocr_hocr.hocr' #1650

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Bug]: FileNotFoundError: [Errno 2] No such file or directory: '/home/x/temp/ocrmypdf.io.w7a0zro8/000003_ocr_hocr.hocr' #1650

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions