Simple sanity checks
Third party app name and version
ocrmypdf 16.12.0, Pillow (in venv), Python 3.13
Describe the bug
Environment: Paperless-NGX native install, Debian 13, Python venv at /opt/paperless/.venv
When processing scanned PDFs (~10 MP, typical smartphone/scanner output), ocrmypdf raises a DecompressionBombError with an unexpectedly low limit:
    DecompressionBombError: Image size (10385522 pixels) exceeds limit of 600 pixels, could be decompression bomb DOS attack.
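The reported limit of 600 is 2 * PIL.Image.MAX_IMAGE_PIXELS, because Pillow raises DecompressionBombError only when an image exceeds twice the configured limit (above the limit itself it only warns). MAX_IMAGE_PIXELS is therefore being set to 300, which is clearly unintentional.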
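The relationship between the logged limit and MAX_IMAGE_PIXELS can be sketched without importing Pillow. This is a simplified model of Pillow's size check, not its actual code; the helper names bomb_check and implied_max_image_pixels are hypothetical and exist only for illustration.

```python
import re
from typing import Optional


def bomb_check(pixels: int, max_image_pixels: Optional[int]) -> str:
    """Simplified model of Pillow's size check (hypothetical helper)."""
    if max_image_pixels is None:
        return "ok"  # limit disabled
    if pixels > 2 * max_image_pixels:
        return "error"  # Pillow raises DecompressionBombError in this case
    if pixels > max_image_pixels:
        return "warning"  # Pillow emits DecompressionBombWarning in this case
    return "ok"


def implied_max_image_pixels(message: str) -> Optional[int]:
    """Recover MAX_IMAGE_PIXELS from the 'exceeds limit of N pixels' text."""
    match = re.search(r"exceeds limit of (\d+) pixels", message)
    # The error reports twice the configured limit, so halve it.
    return int(match.group(1)) // 2 if match else None


msg = (
    "Image size (10385522 pixels) exceeds limit of 600 pixels, "
    "could be decompression bomb DOS attack."
)
print(implied_max_image_pixels(msg))
print(bomb_check(10_385_522, 300))
print(bomb_check(10_385_522, 89_478_485))
```

Applied to the message from this report, the sketch yields a configured limit of 300 and classifies the 10 MP scan as an error, whereas the same scan passes under Pillow's stock default.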
Root cause
In ocrmypdf/_validation.py:
    def check_options_pillow(options: Namespace) -> None:
        PIL.Image.MAX_IMAGE_PIXELS = int(options.max_image_mpixels * 1_000_000)
        if PIL.Image.MAX_IMAGE_PIXELS == 0:
            PIL.Image.MAX_IMAGE_PIXELS = None
In this setup, the value of options.max_image_mpixels results in int(... * 1_000_000) = 300, which corresponds to an option value of roughly 0.0003 megapixels and is far below any real-world image size. The == 0 guard never triggers because the computed value is 300, not 0. As a result, every non-trivial image fails with a DecompressionBombError.
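To make the arithmetic concrete, here is a minimal sketch that mirrors the quoted function but returns the computed limit instead of assigning it to PIL.Image.MAX_IMAGE_PIXELS, so it runs without Pillow or ocrmypdf installed. The function name resolve_max_image_pixels is hypothetical, and the sample values (including 250.0) are arbitrary illustrations, not claims about what ocrmypdf or Paperless-ngx actually pass.

```python
from argparse import Namespace
from typing import Optional


def resolve_max_image_pixels(options: Namespace) -> Optional[int]:
    """Same arithmetic as the quoted check, without touching PIL (hypothetical)."""
    limit = int(options.max_image_mpixels * 1_000_000)
    # Only an exact 0 disables the limit; any small positive value is kept.
    return None if limit == 0 else limit


# Arbitrary sample inputs: disabled, the value implied by this report, a sane value.
for mpixels in (0, 0.0003, 250.0):
    opts = Namespace(max_image_mpixels=mpixels)
    print(mpixels, "->", resolve_max_image_pixels(opts))
```

Under this reading, something upstream would have to supply a value near 0.0003 (for example a pixel or megapixel figure passed in the wrong unit) for the limit to come out as 300; that is an inference from the numbers, not something confirmed from the ocrmypdf source.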
Expected behavior
The default should either preserve Pillow's own default (~89 MP) or be set to None (no limit). A limit of 300 pixels effectively breaks OCR for any real document.
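One way the expectation above could be expressed is a guard that falls back to Pillow's stock default when the computed limit is implausibly small. This is only a sketch of a possible fix, not ocrmypdf's code: safe_max_image_pixels and the floor_pixels threshold are assumptions made up for illustration.

```python
from typing import Optional

# Pillow's stock default for MAX_IMAGE_PIXELS (about 89 MP).
PILLOW_DEFAULT_MAX_PIXELS = 1024 * 1024 * 1024 // 4 // 3


def safe_max_image_pixels(
    max_image_mpixels: float,
    floor_pixels: int = 1_000_000,  # hypothetical sanity threshold (1 MP)
) -> Optional[int]:
    """Hypothetical guard: None for 0, Pillow's default for tiny values."""
    limit = int(max_image_mpixels * 1_000_000)
    if limit == 0:
        return None  # explicit "no limit", as in the quoted check
    if limit < floor_pixels:
        return PILLOW_DEFAULT_MAX_PIXELS  # refuse limits no real scan could pass
    return limit


print(safe_max_image_pixels(0.0003))
print(safe_max_image_pixels(250.0))
print(safe_max_image_pixels(0))
```

Whether a silent fallback or an explicit validation error is preferable is a design decision for the maintainers; the sketch only shows that a limit of 300 pixels can be detected cheaply.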
Workaround
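    # The * and . are escaped so sed matches them literally; an unescaped " *" is a quantifier and the pattern would never match.
    sed -i 's/PIL\.Image\.MAX_IMAGE_PIXELS = int(options\.max_image_mpixels \* 1_000_000)/PIL.Image.MAX_IMAGE_PIXELS = None/' \
      /path/to/.venv/lib/python3.13/site-packages/ocrmypdf/_validation.py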
Steps to reproduce
1. Import attached file into Paperless-ngx
2. Trigger OCR
3. Check log file
Files
No response
OCRmyPDF version
No response
Relevant log output