Documents are not being split

Hello, I imported a PDF containing several documents with ASN labels (from digitalisierung-mit-kopf) into Paperless-ngx. According to the log, the ASN numbers are detected. However, only a single document containing all pages is created.
Here are the settings in the env file:
PAPERLESS_OCR_MODE=skip

# Barcodes

PAPERLESS_CONSUMER_BARCODE_SCANNER=ZXING
PAPERLESS_CONSUMER_ENABLE_ASN_BARCODE=true
PAPERLESS_BARCODE_SPLIT_ISOLATED=false
PAPERLESS_BARCODE_SPLIT_PATTERNS=ASN\d{5}
PAPERLESS_CONSUMER_POLLING=5

Here is the log as well:
[2025-03-20 16:37:16,310] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/Scannen_neu.pdf to the task queue.

[2025-03-20 16:37:17,449] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin

[2025-03-20 16:37:17,451] [DEBUG] [paperless.tasks] Executing plugin BarcodePlugin

[2025-03-20 16:37:17,453] [DEBUG] [paperless.barcodes] Scanning for barcodes using ZXING

[2025-03-20 16:37:17,464] [DEBUG] [paperless.barcodes] PDF has 2 pages

[2025-03-20 16:37:17,465] [DEBUG] [paperless.barcodes] Processing page 0

[2025-03-20 16:37:22,302] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpygm94v4y/barcodee007z3xy/f6b17671-2896-4818-bbc7-9fd1ad5d9b1c-1.ppm

[2025-03-20 16:37:23,993] [DEBUG] [paperless.barcodes] Barcode of type BarcodeFormat.QRCode found: ASN02700

[2025-03-20 16:37:24,124] [DEBUG] [paperless.barcodes] Processing page 1

[2025-03-20 16:37:29,397] [DEBUG] [paperless.barcodes] Image is at /tmp/paperless/tmpygm94v4y/barcodee007z3xy/d5abc90d-1e41-479f-8cd4-016521596dcd-2.ppm

[2025-03-20 16:37:31,002] [DEBUG] [paperless.barcodes] Barcode of type BarcodeFormat.QRCode found: ASN02699

[2025-03-20 16:37:31,020] [DEBUG] [paperless.barcodes] Found ASN Barcode: ASN02700

[2025-03-20 16:37:31,021] [INFO] [paperless.barcodes] Found ASN in barcode: 2700

[2025-03-20 16:37:31,023] [INFO] [paperless.tasks] BarcodePlugin completed with no message

[2025-03-20 16:37:31,043] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin

[2025-03-20 16:37:31,067] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with:

[2025-03-20 16:37:31,070] [DEBUG] [paperless.tasks] Executing plugin ConsumeTaskPlugin

[2025-03-20 16:37:31,284] [INFO] [paperless.consumer] Consuming Scannen_neu.pdf

[2025-03-20 16:37:31,324] [DEBUG] [paperless.consumer] Detected mime type: application/pdf

[2025-03-20 16:37:31,373] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser

[2025-03-20 16:37:31,387] [DEBUG] [paperless.consumer] Parsing Scannen_neu.pdf…

[2025-03-20 16:37:31,452] [INFO] [paperless.parsing.tesseract] pdftotext exited 0

[2025-03-20 16:37:32,949] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngxwttyuj2g/Scannen_neu.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-bbu76kuo/archive.pdf'), 'use_threads': True, 'jobs': 4, 'language': 'deu', 'output_type': 'pdfa', 'progress_bar': False, 'color_conversion_strategy': 'RGB', 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-bbu76kuo/sidecar.txt')}

[2025-03-20 16:37:38,500] [INFO] [ocrmypdf._pipelines.ocr] Start processing 2 pages concurrently

[2025-03-20 16:37:49,675] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 23.22 - rotation appears correct

[2025-03-20 16:37:49,924] [INFO] [ocrmypdf._pipeline] page is facing ⇧, confidence 21.74 - rotation appears correct

[2025-03-20 16:39:07,934] [INFO] [ocrmypdf._pipelines.ocr] Postprocessing…

[2025-03-20 16:39:16,678] [INFO] [ocrmypdf._pipeline] Image optimization ratio: 1.48 savings: 32.3%

[2025-03-20 16:39:16,680] [INFO] [ocrmypdf._pipeline] Total file size ratio: 2.21 savings: 54.7%

[2025-03-20 16:39:16,694] [INFO] [ocrmypdf._pipelines._common] Output file is a PDF/A-2B (as expected)

[2025-03-20 16:39:19,071] [DEBUG] [paperless.parsing.tesseract] Using text from sidecar file

[2025-03-20 16:39:19,073] [DEBUG] [paperless.consumer] Generating thumbnail for Scannen_neu.pdf…

[2025-03-20 16:39:19,132] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-bbu76kuo/archive.pdf[0] /tmp/paperless/paperless-bbu76kuo/convert.webp

[2025-03-20 16:39:24,582] [INFO] [paperless.parsing] convert exited 0

[2025-03-20 16:39:29,615] [DEBUG] [paperless.consumer] Saving record to database

[2025-03-20 16:39:29,617] [DEBUG] [paperless.consumer] Creation date from parse_date: 2025-01-22 00:00:00+01:00

[2025-03-20 16:39:34,896] [INFO] [paperless.handlers] Assigning correspondent Hovawart BG-Holstein e.V. to 2025-01-22 Scannen_neu

[2025-03-20 16:39:34,955] [INFO] [paperless.handlers] Assigning document type Rechnung to 2025-01-22 Hovawart BG-Holstein e.V. Scannen_neu

[2025-03-20 16:39:36,382] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngxwttyuj2g/Scannen_neu.pdf

[2025-03-20 16:39:36,461] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-bbu76kuo

[2025-03-20 16:39:36,472] [INFO] [paperless.consumer] Document 2025-01-22 Hovawart BG-Holstein e.V. Scannen_neu consumption finished

[2025-03-20 16:39:36,491] [INFO] [paperless.tasks] ConsumeTaskPlugin completed with: Success. New document id 68 created
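
To make the symptom easier to see, the barcode lines of the log can be grouped by page. The following is only a rough sketch: the function name `barcodes_per_page` and the regular expressions are my own assumptions, based on the wording of the log lines above, and are not part of Paperless-ngx.

```python
import re

# Excerpt of the barcode-related lines from the log above.
LOG = """\
[2025-03-20 16:37:17,465] [DEBUG] [paperless.barcodes] Processing page 0
[2025-03-20 16:37:23,993] [DEBUG] [paperless.barcodes] Barcode of type BarcodeFormat.QRCode found: ASN02700
[2025-03-20 16:37:24,124] [DEBUG] [paperless.barcodes] Processing page 1
[2025-03-20 16:37:31,002] [DEBUG] [paperless.barcodes] Barcode of type BarcodeFormat.QRCode found: ASN02699
"""

# Patterns are assumptions derived from the log wording shown above.
PAGE_RE = re.compile(r"Processing page (\d+)")
BARCODE_RE = re.compile(r"Barcode of type \S+ found: (\S+)")


def barcodes_per_page(log_text):
    """Map each page index to the barcode values logged for that page."""
    result = {}
    current_page = None
    for line in log_text.splitlines():
        page_match = PAGE_RE.search(line)
        if page_match:
            current_page = int(page_match.group(1))
            result.setdefault(current_page, [])
            continue
        barcode_match = BARCODE_RE.search(line)
        if barcode_match and current_page is not None:
            result[current_page].append(barcode_match.group(1))
    return result


print(barcodes_per_page(LOG))
```

Both pages carry their own ASN code, yet the log only reports "Found ASN in barcode: 2700" and a single document id, so the codes are read but apparently not used as split points.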

I think I had the same problem. You are probably missing this parameter, which is required in addition to the ASN parameter for the file to be split:

PAPERLESS_CONSUMER_ENABLE_BARCODES=true

See whether anything changes once you add it. I only have these three lines in my config, and that is enough for me:

PAPERLESS_CONSUMER_BARCODE_SCANNER=ZXING
PAPERLESS_CONSUMER_ENABLE_ASN_BARCODE=true
PAPERLESS_CONSUMER_ENABLE_BARCODES=true
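
If you want to check an env file for this quickly, something like the following could work. It is a minimal sketch, not an official tool: the helper names `parse_env` and `missing_settings` are made up for illustration, and the assumption that exactly these two switches must be `true` comes from what works in my setup.

```python
# Settings that, per this thread, must both be enabled for ASN splitting.
REQUIRED = {
    "PAPERLESS_CONSUMER_ENABLE_BARCODES": "true",
    "PAPERLESS_CONSUMER_ENABLE_ASN_BARCODE": "true",
}


def parse_env(text):
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_settings(text):
    """Return the required keys that are absent or not set to 'true'."""
    settings = parse_env(text)
    return [
        key
        for key, expected in REQUIRED.items()
        if settings.get(key, "").lower() != expected
    ]


# Barcode-related lines taken from the env file in the question.
ORIGINAL_ENV = """\
PAPERLESS_CONSUMER_BARCODE_SCANNER=ZXING
PAPERLESS_CONSUMER_ENABLE_ASN_BARCODE=true
PAPERLESS_CONSUMER_POLLING=5
"""

print(missing_settings(ORIGINAL_ENV))
```

For the env file from the question this reports `PAPERLESS_CONSUMER_ENABLE_BARCODES` as missing. After changing the env file, the container has to be recreated for the new value to take effect.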

Hello shakebox,
great, thanks.
After I added "PAPERLESS_CONSUMER_ENABLE_BARCODES=true", the problem was solved.
