
[3rdparty]: paperless-ngx AttributeError: 'PdfInfo' object has no attribute '_pages' #1483

@remisharrock

Description

Simple sanity checks

  • This is an issue with an app that uses OCRmyPDF for OCR
  • I am using a recent version of the third party app
  • I will include a file that reproduces the issue

Third party app name and version

paperless-ngx 2.14.7

Describe the bug


[2025-02-20 23:51:47,676] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngx79qbunrp/DevisPellegrin_1_Plantation-1.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-2zyv5sx4/archive.pdf'), 'use_threads': True, 'jobs': 2, 'language': 'fra', 'output_type': 'pdfa', 'progress_bar': False, 'color_conversion_strategy': 'RGB', 'redo_ocr': True, 'clean': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-2zyv5sx4/sidecar.txt')}

[2025-02-20 23:51:47,816] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-2zyv5sx4

[2025-02-20 23:51:47,819] [ERROR] [paperless.consumer] Error occurred while consuming document DevisPellegrin_1_Plantation-1.pdf: AttributeError: 'PdfInfo' object has no attribute '_pages'

Attachment: DevisPellegrin_1_Plantation.pdf

Steps to reproduce

1. Import the attached file into Paperless-ngx.
2. Trigger OCR.
3. Check the log file.
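To check whether the error comes from OCRmyPDF itself rather than from paperless-ngx, the call shown in the DEBUG log line can be replayed directly through OCRmyPDF's Python API, `ocrmypdf.ocr()`. This is a minimal sketch, assuming OCRmyPDF 16.8.0 with the Tesseract `fra` language pack and unpaper (needed for `clean=True`) installed. The script name `reproduce_1483.py` and the helpers `build_ocr_kwargs()` and `reproduce()` are hypothetical, and the sidecar path is chosen arbitrarily next to the output file.

```python
"""Sketch: replay the OCRmyPDF call from the paperless-ngx log outside paperless."""
import sys
from pathlib import Path


def build_ocr_kwargs(sidecar: Path) -> dict:
    # Options copied from the "Calling OCRmyPDF with args" DEBUG line in the log.
    return {
        "use_threads": True,
        "jobs": 2,
        "language": "fra",
        "output_type": "pdfa",
        "progress_bar": False,
        "color_conversion_strategy": "RGB",
        "redo_ocr": True,
        "clean": True,  # requires unpaper to be installed
        "rotate_pages": True,
        "rotate_pages_threshold": 12.0,
        "sidecar": sidecar,
    }


def reproduce(input_pdf: Path, output_pdf: Path) -> None:
    # Imported lazily so build_ocr_kwargs() can be used without OCRmyPDF installed.
    import ocrmypdf

    # Hypothetical sidecar location: same name as the output, with a .txt suffix.
    kwargs = build_ocr_kwargs(output_pdf.with_suffix(".txt"))
    ocrmypdf.ocr(input_pdf, output_pdf, **kwargs)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        reproduce(Path(sys.argv[1]), Path(sys.argv[2]))
    else:
        print("usage: python reproduce_1483.py INPUT.pdf OUTPUT.pdf")
```

For example, `python reproduce_1483.py DevisPellegrin_1_Plantation.pdf archive.pdf`. If this raises the same AttributeError, the problem can likely be reproduced without paperless-ngx in the loop.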

Files

I have attached the PDF file above.

OCRmyPDF version

16.8.0 (as reported by ocrmypdf --version)
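The report does not show where inside OCRmyPDF the error is raised, so listing the versions of the PDF libraries OCRmyPDF depends on may also help triage. This is a minimal sketch using the standard library's `importlib.metadata`; the package list (`ocrmypdf`, `pikepdf`, `pdfminer.six`) is an assumption, and the script should be run in the same Python environment that paperless-ngx uses. The helper `collect_versions()` is hypothetical.

```python
"""Sketch: print versions of OCRmyPDF and related packages for a bug report."""
from importlib.metadata import PackageNotFoundError, version

# Distributions assumed relevant to PDF handling in OCRmyPDF; adjust as needed.
PACKAGES = ("ocrmypdf", "pikepdf", "pdfminer.six")


def collect_versions(packages=PACKAGES) -> dict:
    # Map each distribution name to its installed version, or "not installed".
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name} {ver}")
```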

Relevant log output

