Problems with multi-page input via scanner

Yes, Michael, that would be the expected outcome. Unfortunately, the solution has not materialized after all…

Why?

I wrote at the start that this repeated ingestion happens when documents are delivered to Paperless via the scanner.

And also that there are no problems at all when I add one (or several) PDFs manually by dragging and dropping them directly into Paperless.

The suggestion to have the scanner store the scanned documents in a different location and hand them over to Paperless via rsync once they are complete struck me as a very appealing solution.

I won't go into the problem of getting the internal rsync task to work. We worked around it by giving the job to an rsync tool on the Windows side that we use regularly anyway.

That works flawlessly (and my remark about »already been able to solve« referred only to this)!

But:
The original error with the redundant duplicate documents in Paperless is still there.

Two tests (I had handed over three final reports of about 140 pages each) again resulted in every document being created about 140 times… :face_with_raised_eyebrow:

Handing files over to Paperless manually, by contrast, works flawlessly.

I am now at my wits' end and don't want to burden you any further (for now) with my evidently exotic problem.

Peter

Hey Peter,

All good…
I'm just curious about it myself, and @Stefan may have another idea.

Please send us the Paperless log for that period.
It usually shows fairly quickly any error messages that might be related to the PDF or something similar.

How were these documents created?

@prh

A burden? Nonsense. We help each other and all learn in the process.

Where do you scan the document to so that you can hand it over to Paperless manually?

Are there several PDF files in the consume folder for a multi-page scan? Or just one file that is not processed correctly?

Hi,

the scanner writes directly to the »scaninput« folder (set up according to the instructions of the course here).

Apart from that, I manually copied a multi-page PDF into the folder. I'll post a log file again shortly (= paperless.log or celery.log?)

Peter

Hi,

the PDFs are either created directly by the scanner or via the export functions of various (mostly) office programs.

However, the observed effect depends only on the document length, not on the source that produced the file.

I'm attaching a log file here:

[2024-02-27 06:43:14,807] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-02-27 06:43:19,695] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf to the task queue.
[2024-02-27 06:43:19,881] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-02-27 06:43:20,156] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-02-27 06:43:20,156] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin
[2024-02-27 06:43:20,157] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR
[2024-02-27 06:43:24,876] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf to the task queue.
[2024-02-27 06:43:25,312] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-02-27 06:43:25,493] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-02-27 06:43:25,493] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin
[2024-02-27 06:43:25,494] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR
[2024-02-27 06:43:30,813] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf to the task queue.
[2024-02-27 06:43:30,920] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-02-27 06:43:36,419] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf to the task queue.
[2024-02-27 06:43:36,544] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-02-27 06:43:43,704] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf to the task queue.
[2024-02-27 06:43:43,826] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-02-27 06:43:57,627] [INFO] [paperless.tasks] BarcodePlugin completed with no message
[2024-02-27 06:43:57,735] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-02-27 06:43:57,767] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with no message
[2024-02-27 06:43:58,227] [INFO] [paperless.consumer] Consuming Erstfassung Abschlussbericht_RR_prh.pdf
[2024-02-27 06:43:58,420] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2024-02-27 06:43:58,452] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2024-02-27 06:43:58,456] [DEBUG] [paperless.consumer] Parsing Erstfassung Abschlussbericht_RR_prh.pdf...
[2024-02-27 06:44:01,158] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf to the task queue.
[2024-02-27 06:44:03,143] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-02-27 06:44:05,411] [INFO] [paperless.tasks] BarcodePlugin completed with no message
[2024-02-27 06:44:05,463] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngxjzb88tsp/Erstfassung Abschlussbericht_RR_prh.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-6fy2erwt/archive.pdf'), 'use_threads': True, 'jobs': 2, 'language': 'deu', 'output_type': 'pdf', 'progress_bar': False, 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-6fy2erwt/sidecar.txt'), 'invalidate_digital_signatures': True}
[2024-02-27 06:44:05,514] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-02-27 06:44:05,518] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with no message
[2024-02-27 06:44:05,541] [INFO] [paperless.consumer] Consuming Erstfassung Abschlussbericht_RR_prh.pdf
[2024-02-27 06:44:05,550] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2024-02-27 06:44:05,561] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2024-02-27 06:44:05,565] [DEBUG] [paperless.consumer] Parsing Erstfassung Abschlussbericht_RR_prh.pdf...
[2024-02-27 06:44:08,255] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngxnx4mqx_2/Erstfassung Abschlussbericht_RR_prh.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-a3v7um6j/archive.pdf'), 'use_threads': True, 'jobs': 2, 'language': 'deu', 'output_type': 'pdf', 'progress_bar': False, 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-a3v7um6j/sidecar.txt'), 'invalidate_digital_signatures': True}
[2024-02-27 06:44:11,721] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf to the task queue.
[2024-02-27 06:44:11,910] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-02-27 06:44:16,771] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf to the task queue.
[2024-02-27 06:44:16,900] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-02-27 06:44:21,229] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf to the task queue.
[2024-02-27 06:44:21,324] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-02-27 06:44:21,684] [DEBUG] [paperless.parsing.tesseract] Incomplete sidecar file: discarding.
[2024-02-27 06:44:22,296] [DEBUG] [paperless.parsing.tesseract] Incomplete sidecar file: discarding.
[2024-02-27 06:44:24,116] [DEBUG] [paperless.consumer] Generating thumbnail for Erstfassung Abschlussbericht_RR_prh.pdf...
[2024-02-27 06:44:24,120] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-6fy2erwt/archive.pdf[0] /tmp/paperless/paperless-6fy2erwt/convert.webp
[2024-02-27 06:44:24,722] [DEBUG] [paperless.consumer] Generating thumbnail for Erstfassung Abschlussbericht_RR_prh.pdf...
[2024-02-27 06:44:24,729] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-a3v7um6j/archive.pdf[0] /tmp/paperless/paperless-a3v7um6j/convert.webp
[2024-02-27 06:44:26,438] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf to the task queue.
[2024-02-27 06:44:26,531] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-02-27 06:44:28,493] [DEBUG] [paperless.consumer] Saving record to database
[2024-02-27 06:44:28,495] [DEBUG] [paperless.consumer] Creation date from parse_date: 2015-10-01 00:00:00+02:00
[2024-02-27 06:44:28,495] [DEBUG] [paperless.consumer] Saving record to database
[2024-02-27 06:44:28,501] [DEBUG] [paperless.consumer] Creation date from parse_date: 2015-10-01 00:00:00+02:00
[2024-02-27 06:44:30,204] [INFO] [paperless.handlers] Assigning correspondent .TEXTE to 2015-10-01 Erstfassung Abschlussbericht_RR_prh
[2024-02-27 06:44:30,443] [INFO] [paperless.handlers] Assigning document type TEXT to 2015-10-01 .TEXTE Erstfassung Abschlussbericht_RR_prh
[2024-02-27 06:44:30,831] [INFO] [paperless.handlers] Assigning storage path BaS to 2015-10-01 .TEXTE Erstfassung Abschlussbericht_RR_prh
[2024-02-27 06:44:31,564] [DEBUG] [paperless.filehandling] Document has storage_path 1 ({created_year}/{created_month}/BaS/{correspondent}-{title}) set
[2024-02-27 06:44:31,607] [DEBUG] [paperless.filehandling] Document has storage_path 1 ({created_year}/{created_month}/BaS/{correspondent}-{title}) set
[2024-02-27 06:44:31,615] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngxnx4mqx_2/Erstfassung Abschlussbericht_RR_prh.pdf
[2024-02-27 06:44:31,646] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-a3v7um6j
[2024-02-27 06:44:31,649] [INFO] [paperless.consumer] Document 2015-10-01 .TEXTE Erstfassung Abschlussbericht_RR_prh consumption finished
[2024-02-27 06:44:31,652] [ERROR] [paperless.consumer] The following error occurred while storing document Erstfassung Abschlussbericht_RR_prh.pdf after parsing: duplicate key value violates unique constraint "documents_document_checksum_75209391_uniq"
DETAIL:  Key (checksum)=(a239d7bebbcd69632390f73aa4ccefcc) already exists.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "documents_document_checksum_75209391_uniq"
DETAIL:  Key (checksum)=(a239d7bebbcd69632390f73aa4ccefcc) already exists.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/asgiref/sync.py", line 349, in main_wrap
    raise exc_info[1]
  File "/usr/src/paperless/src/documents/consumer.py", line 579, in try_consume_file
    document = self._store(text=text, date=date, mime_type=mime_type)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/paperless/src/documents/consumer.py", line 744, in _store
    document = Document.objects.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 658, in create
    obj.save(force_insert=True, using=self.db)
  File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 814, in save
    self.save_base(
  File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 877, in save_base
    updated = self._save_table(
              ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 1020, in _save_table
    results = self._do_insert(
              ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 1061, in _do_insert
    return manager._insert(
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 1805, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1822, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File "/usr/local/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django.db.utils.IntegrityError: duplicate key value violates unique constraint "documents_document_checksum_75209391_uniq"
DETAIL:  Key (checksum)=(a239d7bebbcd69632390f73aa4ccefcc) already exists.

[2024-02-27 06:44:31,833] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-6fy2erwt
[2024-02-27 06:44:31,874] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-02-27 06:44:32,517] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-02-27 06:44:32,518] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin
[2024-02-27 06:44:32,518] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR
[2024-02-27 06:44:32,554] [WARNING] [paperless.barcodes] File is likely password protected, not checking for barcodes: Unable to get page count.
I/O Error: Couldn't open file '/usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf': No such file or directory.

[2024-02-27 06:44:32,555] [INFO] [paperless.tasks] BarcodePlugin completed with no message
[2024-02-27 06:44:32,555] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-02-27 06:44:32,559] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with no message
[2024-02-27 06:44:32,571] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf: File not found.
[2024-02-27 06:44:32,866] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-02-27 06:44:32,866] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin
[2024-02-27 06:44:32,867] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR
[2024-02-27 06:44:32,876] [WARNING] [paperless.barcodes] File is likely password protected, not checking for barcodes: Unable to get page count.
I/O Error: Couldn't open file '/usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf': No such file or directory.

[2024-02-27 06:44:32,876] [INFO] [paperless.tasks] BarcodePlugin completed with no message
[2024-02-27 06:44:32,877] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-02-27 06:44:32,880] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with no message
[2024-02-27 06:44:32,891] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf: File not found.
[2024-02-27 06:44:33,316] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-02-27 06:44:33,317] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin
[2024-02-27 06:44:33,317] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR
[2024-02-27 06:44:33,327] [WARNING] [paperless.barcodes] File is likely password protected, not checking for barcodes: Unable to get page count.
I/O Error: Couldn't open file '/usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf': No such file or directory.

[2024-02-27 06:44:33,328] [INFO] [paperless.tasks] BarcodePlugin completed with no message
[2024-02-27 06:44:33,329] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-02-27 06:44:33,332] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with no message
[2024-02-27 06:44:33,347] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf: File not found.
[2024-02-27 06:44:33,799] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-02-27 06:44:33,799] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin
[2024-02-27 06:44:33,800] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR
[2024-02-27 06:44:33,809] [WARNING] [paperless.barcodes] File is likely password protected, not checking for barcodes: Unable to get page count.
I/O Error: Couldn't open file '/usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf': No such file or directory.

[2024-02-27 06:44:33,809] [INFO] [paperless.tasks] BarcodePlugin completed with no message
[2024-02-27 06:44:33,810] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-02-27 06:44:33,813] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with no message
[2024-02-27 06:44:33,824] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf: File not found.
[2024-02-27 06:44:34,474] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-02-27 06:44:34,474] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin
[2024-02-27 06:44:34,475] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR
[2024-02-27 06:44:34,493] [WARNING] [paperless.barcodes] File is likely password protected, not checking for barcodes: Unable to get page count.
I/O Error: Couldn't open file '/usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf': No such file or directory.

[2024-02-27 06:44:34,493] [INFO] [paperless.tasks] BarcodePlugin completed with no message
[2024-02-27 06:44:34,494] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-02-27 06:44:34,497] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with no message
[2024-02-27 06:44:34,522] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf: File not found.
[2024-02-27 06:44:35,796] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-02-27 06:44:35,796] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin
[2024-02-27 06:44:35,797] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR
[2024-02-27 06:44:35,807] [WARNING] [paperless.barcodes] File is likely password protected, not checking for barcodes: Unable to get page count.
I/O Error: Couldn't open file '/usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf': No such file or directory.

[2024-02-27 06:44:35,807] [INFO] [paperless.tasks] BarcodePlugin completed with no message
[2024-02-27 06:44:35,808] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-02-27 06:44:35,811] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with no message
[2024-02-27 06:44:35,823] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf: File not found.
[2024-02-27 06:44:35,915] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-02-27 06:44:35,915] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin
[2024-02-27 06:44:35,916] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR
[2024-02-27 06:44:35,925] [WARNING] [paperless.barcodes] File is likely password protected, not checking for barcodes: Unable to get page count.
I/O Error: Couldn't open file '/usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf': No such file or directory.

[2024-02-27 06:44:35,926] [INFO] [paperless.tasks] BarcodePlugin completed with no message
[2024-02-27 06:44:35,926] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-02-27 06:44:35,929] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with no message
[2024-02-27 06:44:35,941] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf: File not found.
[2024-02-27 06:44:36,162] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-02-27 06:44:37,107] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-02-27 06:44:37,107] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin
[2024-02-27 06:44:37,108] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR
[2024-02-27 06:44:37,117] [WARNING] [paperless.barcodes] File is likely password protected, not checking for barcodes: Unable to get page count.
I/O Error: Couldn't open file '/usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf': No such file or directory.

[2024-02-27 06:44:37,117] [INFO] [paperless.tasks] BarcodePlugin completed with no message
[2024-02-27 06:44:37,118] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-02-27 06:44:37,121] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with no message
[2024-02-27 06:44:37,132] [ERROR] [paperless.consumer] Cannot consume /usr/src/paperless/consume/Erstfassung Abschlussbericht_RR_prh.pdf: File not found.
[2024-02-27 06:44:40,488] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume

Peter
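In the excerpt above, the consumer logs "Adding … to the task queue" ten times for the same file within roughly a minute. For longer logs, a small script can do that counting. The following is only a minimal sketch: it assumes the log line format shown above, and the function name `count_queue_events` is made up for illustration.

```python
import re
from collections import Counter

# Matches the consumer's "Adding <path> to the task queue." lines as they
# appear in the log excerpt above (format assumed from that excerpt).
QUEUE_RE = re.compile(
    r"\[paperless\.management\.consumer\] Adding (?P<path>.+) to the task queue\.$"
)


def count_queue_events(lines):
    """Return a Counter mapping each file path to how often it was queued."""
    counts = Counter()
    for line in lines:
        match = QUEUE_RE.search(line.rstrip())
        if match:
            counts[match.group("path")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: the log file name is an assumption.
    with open("paperless.log", encoding="utf-8") as handle:
        for path, hits in count_queue_events(handle).most_common():
            print(f"{hits:4d}  {path}")
```

A count well above one for a single file suggests that the watcher fired several times for that file, which would match the duplicate-checksum and "File not found" errors later in the log.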

Hi Peter,

all I can see here is that it mentions password protection… are the documents password-protected or encrypted?

[WARNING] [paperless.barcodes] File is likely password protected, not checking for barcodes: Unable to get page count.

Do you also have a document that is not sensitive in terms of data protection, so that it can be tried on another installation?
I have two Pi 4s, a Synology DS918+ and a DIY NAS based on Debian available for testing.
I also have no problem signing a data protection agreement for you… I'm used to that from my customers in the military sector and the like, and I haven't sold any blueprints of the A400 or the Eurofighter yet :slight_smile:

Dear Michael,

none of the tested PDFs were password-protected!
I have just created a new PDF with ~225 pages. I can make it available to you, but I would prefer to send it to you by PM. Is that possible here in the forum?

Peter

So far I have only found a makeshift trick for sending Stefan a PM… I don't know of any other way.
You can send me the file either by email or via Dropbox, Google Drive, OneDrive or another transfer service.

How big is the PDF?
Emails are usually limited to about 25 MB.

Let me know which transfer method you prefer, and I'll give you the corresponding email address.

I'm happy to send you an email; the PDF itself is around 7 MB…

Thanks in advance!
:slight_smile:
Peter

To me, this all really sounds like the well-known problem with some scanners that write the PDF in "stages".

Can you show exactly how you configured the time delay?

And did you restart Paperless afterwards? It won't work without that.

@Stefan He also has the problem when moving the finished document

@prh File received and being tested

Thanks, @anon58924890, I sent you a »somewhat« larger PDF…

Peter

Interim status: import via SMB, copy → paste into /scaninput

DS918+:

[2024-02-27 16:37:35,546] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/BEMdok.pdf to the task queue.
[2024-02-27 16:37:37,181] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-02-27 16:37:37,182] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin
[2024-02-27 16:37:37,183] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR
[2024-02-27 16:40:05,037] [DEBUG] [paperless.barcodes] Barcode of type QRCODE found: http://service.bemdok.de/setup.exe
[2024-02-27 16:43:12,190] [DEBUG] [paperless.barcodes] Barcode of type QRCODE found: www.ipeco.de
[2024-02-27 16:43:12,801] [DEBUG] [paperless.barcodes] Barcode of type QRCODE found: www.bemdok.de
[2024-02-27 16:43:40,307] [INFO] [paperless.tasks] BarcodePlugin completed with no message
[2024-02-27 16:43:43,686] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-02-27 16:43:49,142] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with no message
[2024-02-27 16:43:56,727] [INFO] [paperless.consumer] Consuming BEMdok.pdf
[2024-02-27 16:43:57,387] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2024-02-27 16:43:58,625] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2024-02-27 16:43:58,665] [DEBUG] [paperless.consumer] Parsing BEMdok.pdf...
[2024-02-27 16:44:20,897] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngx15tzdvtb/BEMdok.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-ee5vi7ci/archive.pdf'), 'use_threads': True, 'jobs': 4, 'language': 'deu', 'output_type': 'pdfa', 'progress_bar': False, 'color_conversion_strategy': 'RGB', 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-ee5vi7ci/sidecar.txt')}
[2024-02-27 16:45:31,310] [DEBUG] [paperless.parsing.tesseract] Incomplete sidecar file: discarding.
[2024-02-27 16:45:41,849] [DEBUG] [paperless.consumer] Generating thumbnail for BEMdok.pdf...
[2024-02-27 16:45:41,879] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-ee5vi7ci/archive.pdf[0] /tmp/paperless/paperless-ee5vi7ci/convert.webp
[2024-02-27 16:46:07,141] [DEBUG] [paperless.consumer] Saving record to database
[2024-02-27 16:46:07,142] [DEBUG] [paperless.consumer] Creation date from parse_date: 2024-02-27 00:00:00+01:00
[2024-02-27 16:46:11,987] [INFO] [paperless.handlers] Assigning correspondent VR-Bank to 2024-02-27 BEMdok
[2024-02-27 16:46:12,857] [INFO] [paperless.handlers] Assigning document type Vertrag to 2024-02-27 VR-Bank BEMdok
[2024-02-27 16:46:14,879] [INFO] [paperless.handlers] Assigning storage path Rechnungen to 2024-02-27 VR-Bank BEMdok
[2024-02-27 16:46:17,707] [DEBUG] [paperless.filehandling] Document has storage_path 5 ({correspondent}/{created_year}/{document_type}-{created_month_name_short}) set
[2024-02-27 16:46:17,989] [DEBUG] [paperless.filehandling] Document has storage_path 5 ({correspondent}/{created_year}/{document_type}-{created_month_name_short}) set
[2024-02-27 16:46:18,001] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngx15tzdvtb/BEMdok.pdf
[2024-02-27 16:46:18,155] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-ee5vi7ci
[2024-02-27 16:46:18,158] [INFO] [paperless.consumer] Document 2024-02-27 VR-Bank BEMdok consumption finished

DIY-NAS:

[2024-02-27 16:49:49,893] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/BEMdok.pdf to the task queue.
[2024-02-27 16:49:50,270] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-02-27 16:49:51,486] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-02-27 16:49:51,487] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin
[2024-02-27 16:49:51,487] [DEBUG] [paperless.barcodes] Scanning for barcodes using PYZBAR
[2024-02-27 16:51:03,728] [DEBUG] [paperless.barcodes] Barcode of type QRCODE found: http://service.bemdok.de/setup.exe
[2024-02-27 16:51:21,698] [DEBUG] [paperless.barcodes] Barcode of type QRCODE found: www.ipeco.de
[2024-02-27 16:51:21,709] [DEBUG] [paperless.barcodes] Barcode of type QRCODE found: www.bemdok.de
[2024-02-27 16:51:23,286] [INFO] [paperless.tasks] BarcodePlugin completed with no message
[2024-02-27 16:51:23,895] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-02-27 16:51:23,929] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with no message
[2024-02-27 16:51:23,953] [INFO] [paperless.consumer] Consuming BEMdok.pdf
[2024-02-27 16:51:23,963] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2024-02-27 16:51:23,988] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2024-02-27 16:51:23,991] [DEBUG] [paperless.consumer] Parsing BEMdok.pdf...
[2024-02-27 16:51:25,555] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngx7yiucv2u/BEMdok.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-dey78mqk/archive.pdf'), 'use_threads': True, 'jobs': 4, 'language': 'deu', 'output_type': 'pdfa', 'progress_bar': False, 'color_conversion_strategy': 'RGB', 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-dey78mqk/sidecar.txt')}
[2024-02-27 16:51:54,583] [DEBUG] [paperless.parsing.tesseract] Incomplete sidecar file: discarding.
[2024-02-27 16:52:00,643] [DEBUG] [paperless.consumer] Generating thumbnail for BEMdok.pdf...
[2024-02-27 16:52:00,646] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-dey78mqk/archive.pdf[0] /tmp/paperless/paperless-dey78mqk/convert.webp
[2024-02-27 16:52:02,632] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-02-27 16:52:02,636] [DEBUG] [paperless.consumer] Saving record to database
[2024-02-27 16:52:02,636] [DEBUG] [paperless.consumer] Creation date from parse_date: 2024-02-27 00:00:00+01:00
[2024-02-27 16:52:03,578] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngx7yiucv2u/BEMdok.pdf
[2024-02-27 16:52:03,643] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-dey78mqk
[2024-02-27 16:52:03,644] [INFO] [paperless.consumer] Document 2024-02-27 BEMdok consumption finished

My two Pi 4Bs are still happily crunching away.

One of them gave up with:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/billiard/pool.py", line 1264, in mark_as_worker_lost
    raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 66.

So it does indeed seem to be a problem with the setup.

The second Pi ended up in nirvana at some point and could no longer be reached… maybe also because of the temperature ^^

Lesson learned… larger documents are not for small Pis :smiley:
It may be necessary to adjust the workers and cores so that this works too.
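As a rough illustration of what adjusting workers and cores could mean: Paperless-ngx has the settings `PAPERLESS_TASK_WORKERS` and `PAPERLESS_THREADS_PER_WORKER`, and a common rule of thumb is that their product should not exceed the number of CPU cores. The helper below is only a sketch of that rule. The function name is made up, it does not take RAM into account, and whether such a change would help with the SIGKILL above is untested.

```python
import os


def suggest_worker_settings(cpu_cores: int, task_workers: int = 1) -> dict[str, str]:
    """Pick workers and threads so that workers * threads never exceeds cpu_cores."""
    if cpu_cores < 1 or task_workers < 1:
        raise ValueError("cpu_cores and task_workers must be at least 1")
    workers = min(task_workers, cpu_cores)   # never more workers than cores
    threads = max(cpu_cores // workers, 1)   # share the cores among the workers
    return {
        "PAPERLESS_TASK_WORKERS": str(workers),
        "PAPERLESS_THREADS_PER_WORKER": str(threads),
    }


if __name__ == "__main__":
    # Print the values in the form used for environment variables.
    cores = os.cpu_count() or 1
    for key, value in suggest_worker_settings(cores).items():
        print(f"{key}={value}")
```

On a small board with little memory, it may be worth going below these values, for example one worker with one thread, since each OCR thread needs memory of its own.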

@prh Peter, which system are you using, and with what hardware? (How much RAM?)

@anon58924890
We use a Synology DS 224+ with 6 GB of RAM here.
OS: DSM 7.2.1-69057 Update 4
RAM: 6 GB

Does anyone know the maximum RAM of the 224+ and which CPU it has?
The 6 GB of RAM can't be the cause, since my 918+ has 8 GB.

@prh which other services in use are running on it?

Since yesterday afternoon/evening I have stored a few videos on the NAS.
But that cannot have any influence, since the error described had already occurred before that.

Thank you, Michael, and the other contributors. But I think the effort being put into a solution here is slowly becoming disproportionate!

I think I will be able to (or have to) live with it. The basic function of the document management is working so far. And the additional ingestion attempts will simply be ignored…

Peter

Hello Peter,
the effort put into a solution can NEVER be too high in a forum, especially since the problem can affect anyone. The forum is there to solve problems.
Regards
Mario