[Question]: difference between command line and ocrmypdf.ocr() API

### Describe the bug

I have a file that is a PDF consisting of an SVG of dummy text (attached). If I run from the command line:
`python -m ocrmypdf dummy.pdf dummyocr.pdf`
The file dummyocr.pdf is generated as expected with actual text that can be selected.

If I run the following code in a Python file, as described in the docs:

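```python
import ocrmypdf
from ocrmypdf import OcrOptions

if __name__ == '__main__':
    # Only the input and output paths are set; every other option
    # is left at its default value.
    options = OcrOptions(
        input_file="dummy.pdf",
        output_file="dummyocr2.pdf",
    )

    ocrmypdf.ocr(options)
```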

The output file dummyocr2.pdf still appears to contain an SVG and the text is NOT selectable.


Here is the output from the command line:

```
Scanning contents    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
    1 [tesseract] lots of diacritics - possibly poor OCR                                                                                    tesseract.py:295
OCR                  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Parsing 1 pages with HocrParser                                                                                                                _graft.py:334
Postprocessing...                                                                                                                                 ocr.py:156
[WinError 2] The system cannot find the file specified                                                                                        _windows.py:87
Auto mode: no verapdf available and input is not PDF/A, outputting PDF                                                                     _pipeline.py:1078
Linearizing          ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 100/100 0:00:00
Recompressing JPEGs  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
Deflating JPEGs      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
JBIG2                ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
[WinError 2] The system cannot find the file specified                                                                                        _windows.py:87
[WinError 2] The system cannot find the file specified                                                                                        _windows.py:87
Image optimization ratio: 1.00 savings: 0.0%                                                                                               _pipeline.py:1175
Total file size ratio: 0.93 savings: -7.7%                                                                                                 _pipeline.py:1178
Output file is a PDF (auto mode)            
```

And here is the output from the API call:
```
Scanning contents    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
OCR                  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
[WinError 2] The system cannot find the file specified
Linearizing          ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 100/100 0:00:00
Recompressing JPEGs  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
Deflating JPEGs      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
JBIG2                ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
```
Notably, the command-line run logs "Parsing 1 pages with HocrParser", whereas the API call does not. Do I need to specify different parameters to trigger that step? Or is there another reason the two do not behave the same?
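To rule out that the API run simply hides its log messages, logging could be switched on before calling the API. This is only a minimal sketch: it assumes OCRmyPDF logs through standard `logging` loggers named under `ocrmypdf`, and `enable_ocrmypdf_logging` is a hypothetical helper, not part of the library. It would not explain the missing text layer by itself, but it should show whether the HocrParser step runs at all.

```python
import logging


def enable_ocrmypdf_logging(level: int = logging.INFO) -> logging.Logger:
    """Attach a stderr handler to the 'ocrmypdf' logger (logger name is assumed)."""
    logger = logging.getLogger("ocrmypdf")
    logger.setLevel(level)
    # Avoid stacking duplicate handlers if this is called more than once.
    if not any(getattr(h, "_added_by_sketch", False) for h in logger.handlers):
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(levelname)s %(name)s:%(lineno)d %(message)s")
        )
        handler._added_by_sketch = True
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    enable_ocrmypdf_logging()
    # ...then build the options and call ocrmypdf.ocr(options) as in the script above.
```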

I have tried various combinations of options in the API call, such as `force_ocr=True`, `redo_ocr=True` (not together with `force_ocr`), `languages=['eng',]`, and `output_type="pdf"`, but cannot reproduce what the command line does.
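As a stopgap, the command-line invocation that does work can be driven from Python as a child process. This is a sketch of that workaround, not a fix for the API behaviour; the helper names and the output name `dummyocr3.pdf` are made up for illustration.

```python
import subprocess
import sys


def build_cli_command(input_pdf: str, output_pdf: str) -> list:
    """Build the same invocation that works from the shell."""
    # sys.executable keeps the call inside the current Python environment.
    return [sys.executable, "-m", "ocrmypdf", input_pdf, output_pdf]


def run_cli(input_pdf: str, output_pdf: str) -> int:
    """Run OCRmyPDF as a child process and return its exit code."""
    completed = subprocess.run(build_cli_command(input_pdf, output_pdf), check=False)
    return completed.returncode


if __name__ == "__main__":
    raise SystemExit(run_cli("dummy.pdf", "dummyocr3.pdf"))
```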

Note: I believe the error "The system cannot find the file specified" appears because Ghostscript is NOT installed.
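A quick way to check that belief is to look for a Ghostscript executable on `PATH`. The sketch below assumes the usual executable names (`gswin64c`, `gswin32c`, `gs`); OCRmyPDF may locate Ghostscript by other means on Windows, so an empty result here is a hint rather than proof.

```python
import shutil

# Executable names commonly used by Ghostscript builds (an assumption;
# the name on a given machine can differ).
CANDIDATES = ("gswin64c", "gswin32c", "gs")


def find_ghostscript() -> dict:
    """Map each candidate name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in CANDIDATES}


if __name__ == "__main__":
    for name, path in find_ghostscript().items():
        print(f"{name}: {path or 'not found on PATH'}")
```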

Any help appreciated. Thank you!



### Steps to reproduce

```plain text
See description
```

### Files

[dummy.pdf](https://github.com/user-attachments/files/25327581/dummy.pdf)
[dummyocr.pdf](https://github.com/user-attachments/files/25327582/dummyocr.pdf)
[dummyocr2.pdf](https://github.com/user-attachments/files/25327580/dummyocr2.pdf)

### How did you download and install the software?

PyPI (pip, poetry, pipx, etc.)

### OCRmyPDF version

17.2.0

### Relevant log output

```plain text
see output in description
```

[Question]: difference between command line and ocrmypdf.ocr() API #1636
