Problems with consumption / a general question

Hello everyone
For some reason my Paperless installation (installed exactly as specified in the Masterclass, including Tika/Gotenberg) does not always consume PDFs. Even when the PDF has no recognizable signature and is not too large (<10 MB), an error message appears in the file tasks that I cannot make sense of (a red bar at the bottom right of the start page, saying either that the file is a duplicate or that the PDF could not be consumed). Setting consumption despite duplicates to true in the .env brings no improvement: PAPERLESS_CONSUME_IGNORE_DUPLICATES=True
In yesterday's specific case it cannot have been a duplicate, because it was a PDF I had only just received.

The error occurs when the document is uploaded to Paperless-NGX via drag & drop or directly via the upload function. With the network scanner this has never happened to me so far, and there the files always have the same name. I also tried converting the PDFs to PDF/A first and then uploading them. That did not help either.

Is there a configuration in Paperless-NGX that remedies this problem? Otherwise you lose confidence that uploaded documents really end up searchable in Paperless-NGX, and on every upload you have to check that the PDFs are consumed one after another. That becomes tedious when you want to upload 20 at once.
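
The manual check described above could in principle be scripted against the file tasks. The following is only a minimal sketch, not an official recipe: it assumes that the installation exposes the task list at `/api/tasks/` with token authentication, and that each task carries the fields `task_file_name`, `status` and `result`. The names `fetch_tasks` and `failed_tasks` are made up for this illustration, and the field names should be checked against the installed version.

```python
import json
import urllib.request


def failed_tasks(tasks):
    """Return (file name, result text) for every task whose status is FAILURE.

    `tasks` is a list of dicts; the keys used here are assumptions about
    the shape of the task list and may differ between versions.
    """
    return [
        (task.get("task_file_name"), task.get("result"))
        for task in tasks
        if task.get("status") == "FAILURE"
    ]


def fetch_tasks(base_url, token):
    """Fetch the task list as parsed JSON.

    Assumption: the endpoint is <base_url>/api/tasks/ and accepts an
    'Authorization: Token <token>' header.
    """
    request = urllib.request.Request(
        f"{base_url}/api/tasks/",
        headers={
            "Authorization": f"Token {token}",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

After a bulk upload, something like `failed_tasks(fetch_tasks("http://nas:8000", "<token>"))` (host and token are placeholders) would list only the files that need attention, instead of watching all 20 uploads by hand.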

I am not sure whether, as a relative Paperless newcomer, I have expressed myself clearly. If not, just ask. And do you have this problem too?

An error message sometimes also appears with .docx files:
Could not add Lernziele_Deutsch_2_670fa173a285cf0c37512c80.docx: Lernziele_Deutsch_2_670fa173a285cf0c37512c80.docx: Error occurred while consuming document Lernziele_Deutsch_2_670fa173a285cf0c37512c80.docx: Error while converting document to PDF: Server error '503 Service Unavailable' for url 'http://gotenberg:3000/forms/libreoffice/convert' For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/503

Hi!

Can you show the logs that describe the problem?
You will find them in the left sidebar under Logs.

The .docx issue is a different error (Gotenberg). Let's not mix the two, otherwise the thread gets confusing. Feel free to open a second topic for it.

Hello Stefan

A quick update. I was not very patient and pulled the version 13 database from the backup and then updated it to 15. Everything then seemed to work, and I assumed the cause had been the update to SQL 16. As a result I have no logs from running version 16. Now, however, I have the same problem with the scanner as well.

Just now I wanted to scan a 22-page document (paperless: latest, redis:7, postgres:15; .env and .yml from the Masterclass) with the scanner, which was still doing its job yesterday (postgres:15). In the file tasks you can see it start working, and the consumption then appears under "Completed". When I click "Open document" directly in the file tasks, "Not found 404" appears. The document does not show up in the inbox, and afterwards it is gone from the file tasks as well. My Synology RS822+ currently runs with 2 GB RAM and without an SSD cache. I do not know whether that also has an influence.

Thanks for your help.

The logs look as follows:

[2024-10-21 16:26:56,011] [ERROR] [ocrmypdf._exec.ghostscript] file that it does not conform to Adobe's published PDF
[2024-10-21 16:26:56,011] [ERROR] [ocrmypdf._exec.ghostscript] specification.
[2024-10-21 16:27:00,592] [INFO] [ocrmypdf.optimize] Image optimization did not improve the file - optimizations will not be used
[2024-10-21 16:27:02,972] [INFO] [ocrmypdf._pipeline] Image optimization ratio: 0.98 savings: -1.8%
[2024-10-21 16:27:02,973] [INFO] [ocrmypdf._pipeline] Total file size ratio: 0.99 savings: -0.5%
[2024-10-21 16:27:03,026] [INFO] [ocrmypdf._pipelines._common] Output file is a PDF/A-2B (as expected)
[2024-10-21 16:27:08,402] [DEBUG] [paperless.parsing.tesseract] Incomplete sidecar file: discarding.
[2024-10-21 16:27:12,431] [INFO] [paperless.parsing.tesseract] pdftotext exited 0
[2024-10-21 16:27:12,525] [DEBUG] [paperless.consumer] Generating thumbnail for 0458-806-9621-C_ZBA_05_01.pdf...
[2024-10-21 16:27:12,568] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-tj13k0y_/archive.pdf[0] /tmp/paperless/paperless-tj13k0y_/convert.webp
[2024-10-21 16:27:14,982] [INFO] [paperless.parsing] convert exited 0
[2024-10-21 16:27:39,612] [DEBUG] [paperless.consumer] Saving record to database
[2024-10-21 16:27:39,626] [DEBUG] [paperless.consumer] Creation date from parse_date: 2024-04-30 00:00:00+02:00
[2024-10-21 16:27:47,448] [INFO] [paperless.handlers] Assigning document type Benutzerhandbuch to 2024-04-30 0458-806-9621-C_ZBA_05_01
[2024-10-21 16:27:51,574] [INFO] [paperless.matching] Document did not match Workflow: Benutzerdefinierte Felder in Rechnungen
[2024-10-21 16:27:51,585] [DEBUG] [paperless.matching] ('Document doc type Benutzerhandbuch does not match Rechnung',)
[2024-10-21 16:27:51,599] [INFO] [paperless.matching] Document did not match Workflow: Steuerrelevant in Quittung
[2024-10-21 16:27:51,600] [DEBUG] [paperless.matching] ('Document doc type Benutzerhandbuch does not match Quittung',)
[2024-10-21 16:27:51,618] [INFO] [paperless.matching] Document did not match Workflow: Betrag in Unterhaltskosten
[2024-10-21 16:27:51,619] [DEBUG] [paperless.matching] ('Document tags <QuerySet [<Tag: Inbox>]> do not include <QuerySet [<Tag: Unterhaltskosten>]>',)
[2024-10-21 16:27:51,631] [INFO] [paperless.matching] Document did not match Workflow: Helsana Tag Gesundheit und Versicherung zuweisen
[2024-10-21 16:27:51,632] [DEBUG] [paperless.matching] ('Document correspondent None does not match Helsana Versicherungen AG',)
[2024-10-21 16:27:51,910] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngx2l6m_nr1/0458-806-9621-C_ZBA_05_01.pdf
[2024-10-21 16:27:52,743] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-tj13k0y_
[2024-10-21 16:27:52,746] [INFO] [paperless.consumer] Document 2024-04-30 0458-806-9621-C_ZBA_05_01 consumption finished
[2024-10-21 16:27:53,121] [INFO] [paperless.tasks] ConsumeTaskPlugin completed with: Success. New document id 242 created
[2024-10-21 16:28:51,154] [INFO] [paperless.matching] Document did not match Workflow: Benutzerdefinierte Felder in Rechnungen
[2024-10-21 16:28:51,160] [DEBUG] [paperless.matching] ('Document doc type Benutzerhandbuch does not match Rechnung',)
[2024-10-21 16:28:51,172] [INFO] [paperless.matching] Document did not match Workflow: Steuerrelevant in Quittung
[2024-10-21 16:28:51,173] [DEBUG] [paperless.matching] ('Document doc type Benutzerhandbuch does not match Quittung',)
[2024-10-21 16:28:51,192] [INFO] [paperless.matching] Document did not match Workflow: Betrag in Unterhaltskosten
[2024-10-21 16:28:51,193] [DEBUG] [paperless.matching] ('Document tags <QuerySet [<Tag: Garten>, <Tag: Werkzeuge>]> do not include <QuerySet [<Tag: Unterhaltskosten>]>',)
[2024-10-21 16:28:51,207] [INFO] [paperless.matching] Document did not match Workflow: Helsana Tag Gesundheit und Versicherung zuweisen
[2024-10-21 16:28:51,208] [DEBUG] [paperless.matching] ('Document correspondent Stihl does not match Helsana Versicherungen AG',)
[2024-10-21 16:35:25,522] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/Scannen.pdf to the task queue.
[2024-10-21 16:35:26,665] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-10-21 16:35:26,665] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin
[2024-10-21 16:35:26,666] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR
[2024-10-21 16:35:26,871] [DEBUG] [paperless.barcodes] PDF has 22 pages
[2024-10-21 16:35:26,872] [DEBUG] [paperless.barcodes] Processing page 0
[2024-10-21 16:35:28,326] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/e3fda87c-b039-44a7-adf4-27c018b6ddcf-01.ppm
[2024-10-21 16:35:29,731] [DEBUG] [paperless.barcodes] Processing page 1
[2024-10-21 16:35:30,367] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/bbe88c0c-cd99-4ce4-9291-cf2b2875de48-02.ppm
[2024-10-21 16:35:30,668] [DEBUG] [paperless.barcodes] Processing page 2
[2024-10-21 16:35:34,895] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/c1332cff-9b9a-40d5-adc3-3b5412d10b63-03.ppm
[2024-10-21 16:35:35,537] [DEBUG] [paperless.barcodes] Processing page 3
[2024-10-21 16:35:37,225] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/7999a0d8-8df0-4daf-8a73-652d81866db1-04.ppm
[2024-10-21 16:35:37,600] [DEBUG] [paperless.barcodes] Processing page 4
[2024-10-21 16:35:38,579] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/fe65d3fa-f0cc-421f-84d8-68f582fb822b-05.ppm
[2024-10-21 16:35:38,918] [DEBUG] [paperless.barcodes] Processing page 5
[2024-10-21 16:35:39,617] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/47e98eed-8aa2-4804-a25c-5c681c678ea1-06.ppm
[2024-10-21 16:35:40,296] [DEBUG] [paperless.barcodes] Processing page 6
[2024-10-21 16:35:40,985] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/f42cd5a5-261e-4924-a917-e11ffed97016-07.ppm
[2024-10-21 16:35:41,532] [DEBUG] [paperless.barcodes] Processing page 7
[2024-10-21 16:35:42,076] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/33d28c3d-1fc0-4b41-b75f-d6e59b471499-08.ppm
[2024-10-21 16:35:42,488] [DEBUG] [paperless.barcodes] Processing page 8
[2024-10-21 16:35:43,051] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/fe2dab1e-09c4-48d9-8e49-d92dff34d0fc-09.ppm
[2024-10-21 16:35:43,459] [DEBUG] [paperless.barcodes] Processing page 9
[2024-10-21 16:35:43,948] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/f85e8a14-8f61-42ec-bde9-909d60521787-10.ppm
[2024-10-21 16:35:44,305] [DEBUG] [paperless.barcodes] Processing page 10
[2024-10-21 16:35:44,797] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/56a8571c-cf65-4efd-a30d-93763a29e3e1-11.ppm
[2024-10-21 16:35:45,176] [DEBUG] [paperless.barcodes] Processing page 11
[2024-10-21 16:35:45,659] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/cfdf204d-78e2-439c-a7c3-f3efe3cc1274-12.ppm
[2024-10-21 16:35:46,003] [DEBUG] [paperless.barcodes] Processing page 12
[2024-10-21 16:35:46,503] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/ce3723b3-fa1e-4ee6-ae9b-4c5c2e3a9e98-13.ppm
[2024-10-21 16:35:46,919] [DEBUG] [paperless.barcodes] Processing page 13
[2024-10-21 16:35:47,405] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/927c9987-d13b-4281-bd6f-252120f07e94-14.ppm
[2024-10-21 16:35:47,746] [DEBUG] [paperless.barcodes] Processing page 14
[2024-10-21 16:35:48,236] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/76cbeacf-6b62-4fa0-b6b7-8b57ed540273-15.ppm
[2024-10-21 16:35:48,617] [DEBUG] [paperless.barcodes] Processing page 15
[2024-10-21 16:35:49,109] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/8b4961ae-40a7-48df-a423-62ddf47c73f6-16.ppm
[2024-10-21 16:35:49,453] [DEBUG] [paperless.barcodes] Processing page 16
[2024-10-21 16:35:49,950] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/0a6ffd91-1d06-458c-af74-2326f7bbb5e5-17.ppm
[2024-10-21 16:35:50,365] [DEBUG] [paperless.barcodes] Processing page 17
[2024-10-21 16:35:50,850] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/1385eadd-cb8f-4577-aba0-27bb0bbeff94-18.ppm
[2024-10-21 16:35:51,188] [DEBUG] [paperless.barcodes] Processing page 18
[2024-10-21 16:35:51,676] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/5e2ed7ba-ac6b-46b6-81af-5a9b32c90e39-19.ppm
[2024-10-21 16:35:52,068] [DEBUG] [paperless.barcodes] Processing page 19
[2024-10-21 16:35:52,551] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/3b920fb1-8418-40df-818a-1d7a2cbcf75f-20.ppm
[2024-10-21 16:35:52,934] [DEBUG] [paperless.barcodes] Processing page 20
[2024-10-21 16:35:53,495] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/2f8637d4-2a96-4d5e-8f6f-44bfdfdc64ea-21.ppm
[2024-10-21 16:35:53,827] [DEBUG] [paperless.barcodes] Processing page 21
[2024-10-21 16:35:54,318] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmp6qydq626/barcode_4fzs8c0/446aec80-adcd-4a0a-923f-d377c090777b-22.ppm
[2024-10-21 16:35:54,774] [INFO] [paperless.tasks] BarcodePlugin completed with no message
[2024-10-21 16:35:54,809] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-10-21 16:35:55,041] [INFO] [paperless.matching] Document did not match Workflow: Benutzerdefinierte Felder in Rechnungen
[2024-10-21 16:35:55,042] [DEBUG] [paperless.matching] No matching triggers with type 1 found
[2024-10-21 16:35:55,044] [INFO] [paperless.matching] Document did not match Workflow: Steuerrelevant in Quittung
[2024-10-21 16:35:55,044] [DEBUG] [paperless.matching] No matching triggers with type 1 found
[2024-10-21 16:35:55,046] [INFO] [paperless.matching] Document did not match Workflow: Betrag in Unterhaltskosten
[2024-10-21 16:35:55,047] [DEBUG] [paperless.matching] No matching triggers with type 1 found
[2024-10-21 16:35:55,049] [INFO] [paperless.matching] Document did not match Workflow: Helsana Tag Gesundheit und Versicherung zuweisen
[2024-10-21 16:35:55,049] [DEBUG] [paperless.matching] No matching triggers with type 1 found
[2024-10-21 16:35:55,049] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with:
[2024-10-21 16:35:55,050] [DEBUG] [paperless.tasks] Executing plugin ConsumeTaskPlugin
[2024-10-21 16:35:55,263] [INFO] [paperless.consumer] Consuming Scannen.pdf
[2024-10-21 16:35:55,342] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2024-10-21 16:35:55,396] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2024-10-21 16:35:55,400] [DEBUG] [paperless.consumer] Parsing Scannen.pdf...
[2024-10-21 16:35:55,429] [INFO] [paperless.parsing.tesseract] pdftotext exited 0
[2024-10-21 16:35:57,463] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngx3357wguq/Scannen.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-d7n7o7bk/archive.pdf'), 'use_threads': True, 'jobs': 8, 'language': 'deu', 'output_type': 'pdfa', 'progress_bar': False, 'color_conversion_strategy': 'RGB', 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-d7n7o7bk/sidecar.txt')}
[2024-10-21 16:35:59,701] [INFO] [ocrmypdf._pipelines.ocr] Start processing 8 pages concurrently
[2024-10-21 16:36:35,197] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.73 - rotation appears correct
[2024-10-21 16:36:35,232] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 15.17 - rotation appears correct
[2024-10-21 16:36:35,209] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.30 - rotation appears correct
[2024-10-21 16:36:35,232] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 13.10 - rotation appears correct
[2024-10-21 16:36:35,197] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 14.22 - rotation appears correct
[2024-10-21 16:36:35,233] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.57 - rotation appears correct
[2024-10-21 16:36:35,249] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 13.91 - rotation appears correct
[2024-10-21 16:36:35,219] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.95 - rotation appears correct
[2024-10-21 16:40:21,018] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.31 - rotation appears correct
[2024-10-21 16:40:21,007] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.79 - rotation appears correct
[2024-10-21 16:40:21,069] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.23 - rotation appears correct
[2024-10-21 16:40:21,282] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 13.42 - rotation appears correct
[2024-10-21 16:40:21,282] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 11.00 - no change
[2024-10-21 16:40:21,283] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 11.26 - no change
[2024-10-21 16:40:21,306] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.82 - rotation appears correct
[2024-10-21 16:40:21,898] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 11.71 - no change
[2024-10-21 16:41:59,464] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 10.58 - no change
[2024-10-21 16:41:59,612] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 13.16 - rotation appears correct
[2024-10-21 16:41:59,628] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 11.82 - no change
[2024-10-21 16:41:59,703] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.24 - rotation appears correct
[2024-10-21 16:41:59,703] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.73 - rotation appears correct
[2024-10-21 16:41:59,829] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 14.45 - rotation appears correct
[2024-10-21 16:42:59,733] [INFO] [ocrmypdf._pipelines.ocr] Postprocessing...
[2024-10-21 16:43:33,488] [INFO] [ocrmypdf._pipeline] Image optimization ratio: 1.13 savings: 11.3%
[2024-10-21 16:43:33,489] [INFO] [ocrmypdf._pipeline] Total file size ratio: 1.16 savings: 14.1%
[2024-10-21 16:43:33,618] [INFO] [ocrmypdf._pipelines._common] Output file is a PDF/A-2B (as expected)
[2024-10-21 16:43:51,185] [DEBUG] [paperless.parsing.tesseract] Using text from sidecar file
[2024-10-21 16:43:51,318] [DEBUG] [paperless.consumer] Generating thumbnail for Scannen.pdf...
[2024-10-21 16:43:52,212] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-d7n7o7bk/archive.pdf[0] /tmp/paperless/paperless-d7n7o7bk/convert.webp
[2024-10-21 16:43:55,033] [INFO] [paperless.parsing] convert exited 0
[2024-10-21 16:44:06,728] [DEBUG] [paperless.consumer] Saving record to database
[2024-10-21 16:44:06,728] [DEBUG] [paperless.consumer] Creation date from parse_date: 2024-10-01 00:00:00+02:00
[2024-10-21 16:44:08,902] [INFO] [paperless.handlers] Assigning correspondent Helsana Versicherungen AG to 2024-10-01 Scannen
[2024-10-21 16:44:08,991] [INFO] [paperless.handlers] Assigning document type Korrespondenz to 2024-10-01 Helsana Versicherungen AG Scannen
[2024-10-21 16:44:09,094] [INFO] [paperless.handlers] Tagging "2024-10-01 Helsana Versicherungen AG Scannen" with "Versicherung"
[2024-10-21 16:44:11,736] [INFO] [paperless.matching] Document did not match Workflow: Benutzerdefinierte Felder in Rechnungen
[2024-10-21 16:44:11,736] [DEBUG] [paperless.matching] ('Document doc type Korrespondenz does not match Rechnung',)
[2024-10-21 16:44:11,747] [INFO] [paperless.matching] Document did not match Workflow: Steuerrelevant in Quittung
[2024-10-21 16:44:11,748] [DEBUG] [paperless.matching] ('Document doc type Korrespondenz does not match Quittung',)
[2024-10-21 16:44:11,763] [INFO] [paperless.matching] Document did not match Workflow: Betrag in Unterhaltskosten
[2024-10-21 16:44:11,763] [DEBUG] [paperless.matching] ('Document tags <QuerySet [<Tag: Inbox>, <Tag: Versicherung>]> do not include <QuerySet [<Tag: Unterhaltskosten>]>',)
[2024-10-21 16:44:11,773] [INFO] [paperless.matching] Document matched WorkflowTrigger 9 from Workflow: Helsana Tag Gesundheit und Versicherung zuweisen
[2024-10-21 16:44:11,774] [INFO] [paperless.handlers] Applying WorkflowAction 4 from Workflow: Helsana Tag Gesundheit und Versicherung zuweisen
[2024-10-21 16:44:11,813] [INFO] [paperless.handlers] Applying WorkflowAction 5 from Workflow: Helsana Tag Gesundheit und Versicherung zuweisen
[2024-10-21 16:44:12,181] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngx3357wguq/Scannen.pdf
[2024-10-21 16:44:12,286] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-d7n7o7bk
[2024-10-21 16:44:12,289] [INFO] [paperless.consumer] Document 2024-10-01 Helsana Versicherungen AG Scannen consumption finished
[2024-10-21 16:44:12,296] [INFO] [paperless.tasks] ConsumeTaskPlugin completed with: Success. New document id 243 created
[2024-10-21 16:47:02,137] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/Scannen.pdf to the task queue.
[2024-10-21 16:47:03,052] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-10-21 16:47:03,053] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin
[2024-10-21 16:47:03,053] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR
[2024-10-21 16:47:03,058] [DEBUG] [paperless.barcodes] PDF has 22 pages
[2024-10-21 16:47:03,059] [DEBUG] [paperless.barcodes] Processing page 0
[2024-10-21 16:47:03,996] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/57a875b2-a010-4afb-b8ea-111d8c1a6f1b-01.ppm
[2024-10-21 16:47:04,458] [DEBUG] [paperless.barcodes] Processing page 1
[2024-10-21 16:47:04,944] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/004d083a-1db0-4ab5-8c94-e9758c58cd6f-02.ppm
[2024-10-21 16:47:05,226] [DEBUG] [paperless.barcodes] Processing page 2
[2024-10-21 16:47:05,718] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/f1796de2-65fa-4e87-9097-a6b47c342481-03.ppm
[2024-10-21 16:47:06,100] [DEBUG] [paperless.barcodes] Processing page 3
[2024-10-21 16:47:06,591] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/a9c73d26-a92c-4f1e-9561-9e7f78c80269-04.ppm
[2024-10-21 16:47:06,984] [DEBUG] [paperless.barcodes] Processing page 4
[2024-10-21 16:47:07,473] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/7a2f8cff-9c21-48ff-bb7e-57ea3dcf0a94-05.ppm
[2024-10-21 16:47:07,832] [DEBUG] [paperless.barcodes] Processing page 5
[2024-10-21 16:47:08,331] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/2a27af77-34d9-445c-bbaa-3b6c0388f3bc-06.ppm
[2024-10-21 16:47:08,784] [DEBUG] [paperless.barcodes] Processing page 6
[2024-10-21 16:47:09,269] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/d00ce784-8cae-47e1-a61d-e6e80243cb48-07.ppm
[2024-10-21 16:47:09,609] [DEBUG] [paperless.barcodes] Processing page 7
[2024-10-21 16:47:10,096] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/1aae2caa-ccd9-48a6-be53-c9e87198b3a2-08.ppm
[2024-10-21 16:47:10,493] [DEBUG] [paperless.barcodes] Processing page 8
[2024-10-21 16:47:10,977] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/7ab8b6de-0d3c-474e-bd67-503b21e96b8d-09.ppm
[2024-10-21 16:47:11,333] [DEBUG] [paperless.barcodes] Processing page 9
[2024-10-21 16:47:11,837] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/d188a5de-e1c6-4b69-8783-89d057e2c53d-10.ppm
[2024-10-21 16:47:12,211] [DEBUG] [paperless.barcodes] Processing page 10
[2024-10-21 16:47:12,700] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/95d03700-b021-4e7e-ac18-c6812c298651-11.ppm
[2024-10-21 16:47:13,089] [DEBUG] [paperless.barcodes] Processing page 11
[2024-10-21 16:47:13,580] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/63fb426a-39e8-40f7-9055-c784b6695c9e-12.ppm
[2024-10-21 16:47:13,937] [DEBUG] [paperless.barcodes] Processing page 12
[2024-10-21 16:47:14,434] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/39e170f6-a415-44ca-be71-e5fa9e9867e2-13.ppm
[2024-10-21 16:47:14,862] [DEBUG] [paperless.barcodes] Processing page 13
[2024-10-21 16:47:15,384] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/e27a57e1-f87b-4618-9acd-03532860f428-14.ppm
[2024-10-21 16:47:15,735] [DEBUG] [paperless.barcodes] Processing page 14
[2024-10-21 16:47:16,218] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/61f97bd3-b783-4b5d-a15a-116879bf7035-15.ppm
[2024-10-21 16:47:16,610] [DEBUG] [paperless.barcodes] Processing page 15
[2024-10-21 16:47:17,092] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/ad15f013-4945-45e0-a45a-40ee1fc3f870-16.ppm
[2024-10-21 16:47:17,453] [DEBUG] [paperless.barcodes] Processing page 16
[2024-10-21 16:47:17,952] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/b53c8f43-f35a-463f-8770-54aa7074d871-17.ppm
[2024-10-21 16:47:18,431] [DEBUG] [paperless.barcodes] Processing page 17
[2024-10-21 16:47:18,916] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/88d06fca-905f-4acc-9e73-9fa4f24d3c95-18.ppm
[2024-10-21 16:47:19,266] [DEBUG] [paperless.barcodes] Processing page 18
[2024-10-21 16:47:19,759] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/c3bfb0b9-4012-4970-9ea5-497e72a628d8-19.ppm
[2024-10-21 16:47:20,143] [DEBUG] [paperless.barcodes] Processing page 19
[2024-10-21 16:47:20,633] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/bd9a03fa-4a7c-4e0a-aebe-625881bdd895-20.ppm
[2024-10-21 16:47:21,019] [DEBUG] [paperless.barcodes] Processing page 20
[2024-10-21 16:47:21,504] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/bfe9bc09-1efe-4882-9d5e-a1ba2964d834-21.ppm
[2024-10-21 16:47:21,845] [DEBUG] [paperless.barcodes] Processing page 21
[2024-10-21 16:47:22,331] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpkl5o66sg/barcodec5tlam6i/5bdc315e-418e-4fdb-bc02-fcbe378aaf17-22.ppm
[2024-10-21 16:47:22,693] [INFO] [paperless.tasks] BarcodePlugin completed with no message
[2024-10-21 16:47:22,694] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-10-21 16:47:22,827] [INFO] [paperless.matching] Document did not match Workflow: Benutzerdefinierte Felder in Rechnungen
[2024-10-21 16:47:22,827] [DEBUG] [paperless.matching] No matching triggers with type 1 found
[2024-10-21 16:47:22,829] [INFO] [paperless.matching] Document did not match Workflow: Steuerrelevant in Quittung
[2024-10-21 16:47:22,830] [DEBUG] [paperless.matching] No matching triggers with type 1 found
[2024-10-21 16:47:22,832] [INFO] [paperless.matching] Document did not match Workflow: Betrag in Unterhaltskosten
[2024-10-21 16:47:22,832] [DEBUG] [paperless.matching] No matching triggers with type 1 found
[2024-10-21 16:47:22,834] [INFO] [paperless.matching] Document did not match Workflow: Helsana Tag Gesundheit und Versicherung zuweisen
[2024-10-21 16:47:22,835] [DEBUG] [paperless.matching] No matching triggers with type 1 found
[2024-10-21 16:47:22,835] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with:
[2024-10-21 16:47:22,836] [DEBUG] [paperless.tasks] Executing plugin ConsumeTaskPlugin
[2024-10-21 16:47:22,895] [INFO] [paperless.consumer] Consuming Scannen.pdf
[2024-10-21 16:47:22,928] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2024-10-21 16:47:22,959] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2024-10-21 16:47:22,963] [DEBUG] [paperless.consumer] Parsing Scannen.pdf...
[2024-10-21 16:47:22,974] [INFO] [paperless.parsing.tesseract] pdftotext exited 0
[2024-10-21 16:47:23,760] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngxmoprfx88/Scannen.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-opgsy1_6/archive.pdf'), 'use_threads': True, 'jobs': 8, 'language': 'deu', 'output_type': 'pdfa', 'progress_bar': False, 'color_conversion_strategy': 'RGB', 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-opgsy1_6/sidecar.txt')}
[2024-10-21 16:47:26,788] [INFO] [ocrmypdf._pipelines.ocr] Start processing 8 pages concurrently
[2024-10-21 16:47:36,550] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 14.89 - rotation appears correct
[2024-10-21 16:47:36,589] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 11.74 - no change
[2024-10-21 16:47:36,589] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 13.98 - rotation appears correct
[2024-10-21 16:47:36,589] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 11.57 - no change
[2024-10-21 16:47:36,589] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 14.61 - rotation appears correct
[2024-10-21 16:47:36,589] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.36 - rotation appears correct
[2024-10-21 16:47:36,589] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 13.00 - rotation appears correct
[2024-10-21 16:47:36,715] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 14.42 - rotation appears correct
[2024-10-21 16:49:20,698] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 13.71 - rotation appears correct
[2024-10-21 16:49:20,897] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 13.23 - rotation appears correct
[2024-10-21 16:49:20,709] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 13.04 - rotation appears correct
[2024-10-21 16:49:20,875] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.23 - rotation appears correct
[2024-10-21 16:49:21,216] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.37 - rotation appears correct
[2024-10-21 16:49:21,365] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 13.95 - rotation appears correct
[2024-10-21 16:49:21,366] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 13.36 - rotation appears correct
[2024-10-21 16:49:27,815] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 10.41 - no change
[2024-10-21 16:51:15,207] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.22 - rotation appears correct
[2024-10-21 16:51:16,335] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 10.75 - no change
[2024-10-21 16:51:16,475] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.01 - rotation appears correct
[2024-10-21 16:51:18,092] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.82 - rotation appears correct
[2024-10-21 16:51:18,146] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.63 - rotation appears correct
[2024-10-21 16:51:18,386] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 12.14 - rotation appears correct
[2024-10-21 16:52:13,078] [INFO] [ocrmypdf._pipelines.ocr] Postprocessing...
[2024-10-21 16:52:46,391] [INFO] [ocrmypdf._pipeline] Image optimization ratio: 1.09 savings: 8.4%
[2024-10-21 16:52:46,392] [INFO] [ocrmypdf._pipeline] Total file size ratio: 1.12 savings: 11.0%
[2024-10-21 16:52:46,451] [INFO] [ocrmypdf._pipelines._common] Output file is a PDF/A-2B (as expected)
[2024-10-21 16:53:06,796] [DEBUG] [paperless.parsing.tesseract] Using text from sidecar file
[2024-10-21 16:53:06,815] [DEBUG] [paperless.consumer] Generating thumbnail for Scannen.pdf...
[2024-10-21 16:53:07,166] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-opgsy1_6/archive.pdf[0] /tmp/paperless/paperless-opgsy1_6/convert.webp
[2024-10-21 16:53:10,286] [INFO] [paperless.parsing] convert exited 0
[2024-10-21 16:53:16,629] [DEBUG] [paperless.consumer] Saving record to database
[2024-10-21 16:53:16,630] [DEBUG] [paperless.consumer] Creation date from parse_date: 2024-10-01 00:00:00+02:00
[2024-10-21 16:53:18,683] [INFO] [paperless.handlers] Assigning correspondent Helsana Versicherungen AG to 2024-10-01 Scannen
[2024-10-21 16:53:18,813] [INFO] [paperless.handlers] Assigning document type Korrespondenz to 2024-10-01 Helsana Versicherungen AG Scannen
[2024-10-21 16:53:18,917] [INFO] [paperless.handlers] Tagging "2024-10-01 Helsana Versicherungen AG Scannen" with "Versicherung"
[2024-10-21 16:53:19,431] [INFO] [paperless.matching] Document did not match Workflow: Benutzerdefinierte Felder in Rechnungen
[2024-10-21 16:53:19,431] [DEBUG] [paperless.matching] ('Document doc type Korrespondenz does not match Rechnung',)
[2024-10-21 16:53:19,441] [INFO] [paperless.matching] Document did not match Workflow: Steuerrelevant in Quittung
[2024-10-21 16:53:19,442] [DEBUG] [paperless.matching] ('Document doc type Korrespondenz does not match Quittung',)
[2024-10-21 16:53:19,456] [INFO] [paperless.matching] Document did not match Workflow: Betrag in Unterhaltskosten
[2024-10-21 16:53:19,456] [DEBUG] [paperless.matching] ('Document tags <QuerySet [<Tag: Inbox>, <Tag: Versicherung>]> do not include <QuerySet [<Tag: Unterhaltskosten>]>',)
[2024-10-21 16:53:19,466] [INFO] [paperless.matching] Document matched WorkflowTrigger 9 from Workflow: Helsana Tag Gesundheit und Versicherung zuweisen
[2024-10-21 16:53:19,466] [INFO] [paperless.handlers] Applying WorkflowAction 4 from Workflow: Helsana Tag Gesundheit und Versicherung zuweisen
[2024-10-21 16:53:19,473] [INFO] [paperless.handlers] Applying WorkflowAction 5 from Workflow: Helsana Tag Gesundheit und Versicherung zuweisen
[2024-10-21 16:53:20,035] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngxmoprfx88/Scannen.pdf
[2024-10-21 16:53:20,304] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-opgsy1_6
[2024-10-21 16:53:20,313] [INFO] [paperless.consumer] Document 2024-10-01 Helsana Versicherungen AG Scannen consumption finished
[2024-10-21 16:53:20,343] [INFO] [paperless.tasks] ConsumeTaskPlugin completed with: Success. New document id 244 created
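
A log of this length is easier to read once it is condensed to one line per consumed file. The following is a minimal sketch that assumes the line layout seen above (`[timestamp] [LEVEL] [logger] message`) and pairs each "Consuming ..." line with the next "New document id ... created" line. The function name `summarize_consumption` is made up for this illustration, and the pairing is deliberately naive: it assumes that files are consumed one after another, as they are in this excerpt.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: [2024-10-21 16:35:55,263] [INFO] [logger] message
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\] "
    r"\[(?P<level>\w+)\] \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)
DOC_ID_RE = re.compile(r"New document id (\d+) created")


def summarize_consumption(lines):
    """Return (file name, document id, duration in seconds) per consumed file."""
    runs = []
    current = None  # (file name, start time) of the run in progress
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or truncated lines
        stamp = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
        message = match["msg"]
        if match["logger"] == "paperless.consumer" and message.startswith("Consuming "):
            current = (message[len("Consuming "):], stamp)
            continue
        found = DOC_ID_RE.search(message)
        if found and current is not None:
            name, started = current
            seconds = int((stamp - started).total_seconds())
            runs.append((name, int(found.group(1)), seconds))
            current = None
    return runs
```

Applied to the excerpt above, this would report Scannen.pdf twice, once as document 243 and once as document 244, which makes it easy to see that the same file name was consumed in two separate runs and how long each run took.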