Hello everyone,
because I like what Paperless-NGX can do so much, I have set up another instance of it in a separate LXC on my Proxmox server. The goal is to archive thousands of local newspapers that have so far been sitting on my Synology as PDF documents.
I went with a dedicated instance to keep the newspapers separate from my other documents. At the same time, Paperless-NGX helps me search this collection in a targeted way. If I want to quickly look up when an article about our street was published, I find it within moments. If I want to know when a person died, the corresponding obituary turns up very quickly. Simply brilliant!
But now I have a problem: a few individual PDF documents cannot be archived in Paperless-NGX. This error message appears:
2026-01-04_Zeitung_04-01-2026.pdf: Error occurred while consuming document 2026-01-04_Zeitung_04-01-2026.pdf: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
The log contains a whole lot of output about this document, and I am a bit out of my depth with it:
Error: /typecheck in --runpdf--
Operand stack:
--nostringval-- --nostringval-- 23
Execution stack:
%interp_exit .runexec2 --nostringval-- runpdf --nostringval-- 2 %stopped_push --nostringval-- runpdf runpdf false 1 %stopped_push 1949 1 3 %oparray_pop 1948 1 3 %oparray_pop 1933 1 3 %oparray_pop 1934 1 3 %oparray_pop runpdf runpdf runpdf 25 1 24 runpdf %for_pos_int_continue runpdf
Dictionary stack:
--dict:757/1123(ro)(G)-- --dict:0/20(G)-- --dict:87/200(L)-- --dict:7/10(L)--
Current allocation mode is local
GPL Ghostscript 10.05.1: Unrecoverable error, exit code 1
GPL Ghostscript 10.05.1: Page object was reserved for an Annotation destination, but no such page was drawn, annotation in output will be invalid.
[2026-01-04 13:15:50,400] [WARNING] [paperless.parsing.tesseract] Ghostscript PDF/A rendering failed, consider setting PAPERLESS_OCR_USER_ARGS: '{"continue_on_soft_render_error": true}'
[2026-01-04 13:15:50,400] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-g6le5gn8
[2026-01-04 13:15:50,410] [ERROR] [paperless.consumer] Error occurred while consuming document 2026-01-04_Zeitung_04-01-2026.pdf: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_exec/ghostscript.py", line 300, in generate_pdfa
p = run_polling_stderr(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/subprocess/__init__.py", line 114, in run_polling_stderr
raise CalledProcessError(proc.returncode, args, output=None, stderr=stderr)
subprocess.CalledProcessError: Command '['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=RGB', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dSubsetFonts=false', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '/tmp/ocrmypdf.io.i7y6n7ce/pdfa.pdf', '-sstdout=%stderr', '/tmp/ocrmypdf.io.i7y6n7ce/pdfa.ps', '/tmp/ocrmypdf.io.i7y6n7ce/fix_docinfo.pdf']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 384, in parse
ocrmypdf.ocr(**args)
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/api.py", line 414, in ocr
return run_pipeline(options=options, plugin_manager=plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 214, in run_pipeline
return _run_pipeline(options, plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 181, in _run_pipeline
optimize_messages = exec_concurrent(context, executor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 145, in exec_concurrent
pdf, messages = postprocess(pdf, context, executor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 453, in postprocess
pdf_out = convert_to_pdfa(pdf_out, ps_stub_out, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 912, in convert_to_pdfa
context.plugin_manager.hook.generate_pdfa(
File "/usr/local/lib/python3.12/site-packages/pluggy/_hooks.py", line 512, in __call__
return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pluggy/_manager.py", line 120, in _hookexec
return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 167, in _multicall
raise exception
File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 121, in _multicall
res = hook_impl.function(*args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/builtin_plugins/ghostscript.py", line 131, in generate_pdfa
ghostscript.generate_pdfa(
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_exec/ghostscript.py", line 313, in generate_pdfa
raise SubprocessOutputError('Ghostscript PDF/A rendering failed') from e
ocrmypdf.exceptions.SubprocessOutputError: Ghostscript PDF/A rendering failed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/asgiref/sync.py", line 298, in main_wrap
raise exc_info[1]
File "/usr/src/paperless/src/documents/consumer.py", line 405, in run
document_parser.parse(self.working_copy, mime_type, self.filename)
File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 407, in parse
raise ParseError(
documents.parsers.ParseError: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
[2026-01-04 13:15:50,529] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: 2026-01-04_Zeitung_04-01-2026.pdf: Error occurred while consuming document 2026-01-04_Zeitung_04-01-2026.pdf: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_exec/ghostscript.py", line 300, in generate_pdfa
p = run_polling_stderr(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/subprocess/__init__.py", line 114, in run_polling_stderr
raise CalledProcessError(proc.returncode, args, output=None, stderr=stderr)
subprocess.CalledProcessError: Command '['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=RGB', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dSubsetFonts=false', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '/tmp/ocrmypdf.io.i7y6n7ce/pdfa.pdf', '-sstdout=%stderr', '/tmp/ocrmypdf.io.i7y6n7ce/pdfa.ps', '/tmp/ocrmypdf.io.i7y6n7ce/fix_docinfo.pdf']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 384, in parse
ocrmypdf.ocr(**args)
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/api.py", line 414, in ocr
return run_pipeline(options=options, plugin_manager=plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 214, in run_pipeline
return _run_pipeline(options, plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 181, in _run_pipeline
optimize_messages = exec_concurrent(context, executor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 145, in exec_concurrent
pdf, messages = postprocess(pdf, context, executor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 453, in postprocess
pdf_out = convert_to_pdfa(pdf_out, ps_stub_out, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 912, in convert_to_pdfa
context.plugin_manager.hook.generate_pdfa(
File "/usr/local/lib/python3.12/site-packages/pluggy/_hooks.py", line 512, in __call__
return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pluggy/_manager.py", line 120, in _hookexec
return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 167, in _multicall
raise exception
File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 121, in _multicall
res = hook_impl.function(*args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/builtin_plugins/ghostscript.py", line 131, in generate_pdfa
ghostscript.generate_pdfa(
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_exec/ghostscript.py", line 313, in generate_pdfa
raise SubprocessOutputError('Ghostscript PDF/A rendering failed') from e
ocrmypdf.exceptions.SubprocessOutputError: Ghostscript PDF/A rendering failed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/asgiref/sync.py", line 298, in main_wrap
raise exc_info[1]
File "/usr/src/paperless/src/documents/consumer.py", line 405, in run
document_parser.parse(self.working_copy, mime_type, self.filename)
File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 407, in parse
raise ParseError(
documents.parsers.ParseError: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/src/paperless/src/documents/tasks.py", line 183, in consume_file
msg = plugin.run()
^^^^^^^^^^^^
File "/usr/src/paperless/src/documents/consumer.py", line 437, in run
self._fail(
File "/usr/src/paperless/src/documents/consumer.py", line 148, in _fail
raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
documents.consumer.ConsumerError: 2026-01-04_Zeitung_04-01-2026.pdf: Error occurred while consuming document 2026-01-04_Zeitung_04-01-2026.pdf: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
[2026-01-04 13:15:51,947] [INFO] [paperless.tasks] No automatic matching items, not training
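The only hint I can make out myself is the WARNING line, which suggests setting PAPERLESS_OCR_USER_ARGS. As far as I can tell, the value has to be valid JSON. Below is a minimal sketch of how the line could be generated, so that the quoting is not mistyped. The option name and value are copied from the warning; the helper name build_env_line is my own invention, and where the line belongs (an env file, the LXC configuration, with or without surrounding quotes) is an assumption that depends on the install. I also do not know whether this setting actually fixes the problem for these files.

```python
import json

# Option name and value are copied from the WARNING line in the log.
ocr_user_args = {"continue_on_soft_render_error": True}


def build_env_line(args: dict) -> str:
    """Return a KEY=VALUE line whose value is compact, valid JSON.

    Hypothetical helper for illustration only; it is not part of Paperless-NGX.
    """
    # json.dumps writes Python's True as JSON's lowercase true.
    value = json.dumps(args, separators=(",", ":"))
    return f"PAPERLESS_OCR_USER_ARGS={value}"


line = build_env_line(ocr_user_args)
print(line)
```

If the configuration file is read by a shell, the JSON value would probably need to be wrapped in single quotes, as shown in the warning itself.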
Does anyone here know Python and can help me get these newspapers imported as well? Otherwise I would have to keep leaving these Sunday editions in the PDF folder.
Pfiffikus,
who is well aware that this is only a luxury problem