
[3rdparty]: Paperless-ngx fails on consuming a file #1495

@GooRoo

Simple sanity checks

  • This is an issue with an app that uses OCRmyPDF for OCR
  • I am using a recent version of the third party app
  • I will include a file that reproduces the issuse

Third party app name and version

Paperless-ngx 2.14.7

Describe the bug

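Paperless-ngx fails to consume the attached PDF. According to the log, OCRmyPDF's Ghostscript PDF/A conversion step exits with a non-zero status.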

Steps to reproduce

1. Import attached file into Paperless-ngx.
2. OCR is automatically triggered.
3. The process fails with the errors shown in the log below.
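To check whether the failure also occurs outside Paperless-ngx, the steps above can be approximated by calling the OCRmyPDF command line directly. The following is a minimal sketch, not the way Paperless-ngx actually invokes OCRmyPDF: the helper names `build_ocrmypdf_argv` and `try_output_types` are hypothetical, and the choice of `pdfa-2` is an assumption based on the `-dPDFA=2` flag in the log. Paperless-ngx passes further options that are not shown in this report; if OCRmyPDF refuses the file because it already contains text, an option such as `--skip-text` may be needed.

```python
import subprocess
from pathlib import Path


def build_ocrmypdf_argv(src: Path, dst: Path, output_type: str = "pdfa-2") -> list[str]:
    """Build an OCRmyPDF command line for one input file (hypothetical helper)."""
    return ["ocrmypdf", "--output-type", output_type, str(src), str(dst)]


def try_output_types(src: Path, workdir: Path) -> dict[str, int]:
    """Run OCRmyPDF once with PDF/A output and once with plain PDF output.

    Returns the exit code per output type. If only "pdfa-2" fails, the problem
    is probably confined to the Ghostscript PDF/A conversion step.
    """
    results: dict[str, int] = {}
    for output_type in ("pdfa-2", "pdf"):
        dst = workdir / f"out-{output_type}.pdf"
        proc = subprocess.run(
            build_ocrmypdf_argv(src, dst, output_type),
            capture_output=True,
            text=True,
        )
        # OCRmyPDF writes its diagnostics to stderr; print them for the report.
        print(f"--- output type {output_type}, exit code {proc.returncode} ---")
        print(proc.stderr)
        results[output_type] = proc.returncode
    return results
```

For example, `try_output_types(Path("o451229v21_160992A98S_202401.pdf"), Path("/tmp"))` would run both variants against the attached file, assuming the `ocrmypdf` executable is on the `PATH`.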

Files

o451229v21_160992A98S_202401.pdf

OCRmyPDF version

No response

Relevant log output

[2025-03-17 23:01:37,509] [ERROR] [paperless.consumer] Error occurred while consuming document o451229v21_160992A98S_202401.pdf: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_exec/ghostscript.py", line 288, in generate_pdfa
    p = run_polling_stderr(
        ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/subprocess/__init__.py", line 114, in run_polling_stderr
    raise CalledProcessError(proc.returncode, args, output=None, stderr=stderr)
subprocess.CalledProcessError: Command '['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=RGB', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '/tmp/ocrmypdf.io.p49cqgey/pdfa.pdf', '-sstdout=%stderr', '/tmp/ocrmypdf.io.p49cqgey/pdfa.ps', '/tmp/ocrmypdf.io.p49cqgey/fix_docinfo.pdf']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 382, in parse
    ocrmypdf.ocr(**args)
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/api.py", line 380, in ocr
    return run_pipeline(options=options, plugin_manager=plugin_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 214, in run_pipeline
    return _run_pipeline(options, plugin_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 181, in _run_pipeline
    optimize_messages = exec_concurrent(context, executor)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 145, in exec_concurrent
    pdf, messages = postprocess(pdf, context, executor)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 453, in postprocess
    pdf_out = convert_to_pdfa(pdf_out, ps_stub_out, context)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 912, in convert_to_pdfa
    context.plugin_manager.hook.generate_pdfa(
  File "/usr/local/lib/python3.12/site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 139, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/builtin_plugins/ghostscript.py", line 131, in generate_pdfa
    ghostscript.generate_pdfa(
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_exec/ghostscript.py", line 301, in generate_pdfa
    raise SubprocessOutputError('Ghostscript PDF/A rendering failed') from e
ocrmypdf.exceptions.SubprocessOutputError: Ghostscript PDF/A rendering failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/asgiref/sync.py", line 327, in main_wrap
    raise exc_info[1]
  File "/usr/src/paperless/src/documents/consumer.py", line 477, in run
    document_parser.parse(self.working_copy, mime_type, self.filename)
  File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 405, in parse
    raise ParseError(
documents.parsers.ParseError: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
[2025-03-17 23:01:37,560] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: o451229v21_160992A98S_202401.pdf: Error occurred while consuming document o451229v21_160992A98S_202401.pdf: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_exec/ghostscript.py", line 288, in generate_pdfa
    p = run_polling_stderr(
        ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/subprocess/__init__.py", line 114, in run_polling_stderr
    raise CalledProcessError(proc.returncode, args, output=None, stderr=stderr)
subprocess.CalledProcessError: Command '['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=RGB', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '/tmp/ocrmypdf.io.p49cqgey/pdfa.pdf', '-sstdout=%stderr', '/tmp/ocrmypdf.io.p49cqgey/pdfa.ps', '/tmp/ocrmypdf.io.p49cqgey/fix_docinfo.pdf']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 382, in parse
    ocrmypdf.ocr(**args)
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/api.py", line 380, in ocr
    return run_pipeline(options=options, plugin_manager=plugin_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 214, in run_pipeline
    return _run_pipeline(options, plugin_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 181, in _run_pipeline
    optimize_messages = exec_concurrent(context, executor)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 145, in exec_concurrent
    pdf, messages = postprocess(pdf, context, executor)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 453, in postprocess
    pdf_out = convert_to_pdfa(pdf_out, ps_stub_out, context)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 912, in convert_to_pdfa
    context.plugin_manager.hook.generate_pdfa(
  File "/usr/local/lib/python3.12/site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 139, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/builtin_plugins/ghostscript.py", line 131, in generate_pdfa
    ghostscript.generate_pdfa(
  File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_exec/ghostscript.py", line 301, in generate_pdfa
    raise SubprocessOutputError('Ghostscript PDF/A rendering failed') from e
ocrmypdf.exceptions.SubprocessOutputError: Ghostscript PDF/A rendering failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/asgiref/sync.py", line 327, in main_wrap
    raise exc_info[1]
  File "/usr/src/paperless/src/documents/consumer.py", line 477, in run
    document_parser.parse(self.working_copy, mime_type, self.filename)
  File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 405, in parse
    raise ParseError(
documents.parsers.ParseError: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/paperless/src/documents/tasks.py", line 154, in consume_file
    msg = plugin.run()
          ^^^^^^^^^^^^
  File "/usr/src/paperless/src/documents/consumer.py", line 509, in run
    self._fail(
  File "/usr/src/paperless/src/documents/consumer.py", line 151, in _fail
    raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
documents.consumer.ConsumerError: o451229v21_160992A98S_202401.pdf: Error occurred while consuming document o451229v21_160992A98S_202401.pdf: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
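The log above only records that Ghostscript returned exit status 1, not what Ghostscript itself printed. One way to get at the underlying message is to pull the failing command out of the `CalledProcessError` line and run it again by hand. The sketch below does only the extraction step and uses the standard library alone; `extract_failed_command` is a hypothetical helper name, and it assumes the log line has the format shown above. Note that the `/tmp/ocrmypdf.io.…` files referenced by the command are temporary and will normally be gone after the failed run, so the paths would have to be replaced with files that still exist.

```python
import ast
import re
import shlex

# Matches the argv list that subprocess.CalledProcessError embeds in its message.
_CMD_RE = re.compile(r"Command '(\[.*\])' returned non-zero exit status (\d+)")


def extract_failed_command(log_line: str) -> tuple[list[str], int]:
    """Return (argv, exit_status) parsed from a CalledProcessError log line."""
    match = _CMD_RE.search(log_line)
    if match is None:
        raise ValueError("no CalledProcessError command found in line")
    # The list is printed as a Python literal, so literal_eval parses it safely.
    argv = ast.literal_eval(match.group(1))
    return argv, int(match.group(2))


def as_shell_command(argv: list[str]) -> str:
    """Render argv as a single shell-quoted line that can be pasted into a terminal."""
    return shlex.join(argv)
```

Passing the `subprocess.CalledProcessError: Command '[...]' returned non-zero exit status 1.` line from the log to `extract_failed_command` yields the `gs` argument list, which `as_shell_command` turns into a command line for a manual run.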
