New here and already a question about the /consume directory

Hello everyone!

I'm new to the forum because I've just installed paperless-ngx.

I've just solved the first problem by installing into paperless rather than paperless-ngx, and already the next problem has come up.

About my setup:
I'm running paperless with Docker Compose on a Raspberry Pi 4.

I can add documents through the web interface.
Unfortunately, documents in the /consume directory are left untouched.

In docker-compose.yml I have the following volumes:

    volumes:
      - data:/usr/src/paperless/data
      - /home/scan/paperless:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - /home/scan/paperless/consume:/usr/src/paperless/consume

The directory /home/scan/paperless/consume is mounted from my NAS, which is where the files arrive.

Via FTP I can also see the files in /home/scan/paperless/consume on the Raspberry Pi.

What could be the reason that paperless apparently does not see these files?
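
Since the consume directory is described above as a NAS mount, one thing that may be worth checking is which filesystem type actually backs it. This is a minimal sketch, assuming a Linux host with GNU coreutils; `fs_type` is a hypothetical helper name and not part of paperless:

```shell
# fs_type is a hypothetical helper: prints the filesystem type behind a path.
# Assumes GNU stat (Linux); the type names printed are stat's own.
fs_type() {
    stat -f -c %T "$1"
}

# Path taken from the post above; fall back to / if it does not exist here.
dir=/home/scan/paperless/consume
[ -d "$dir" ] || dir=/
echo "$dir is on: $(fs_type "$dir")"
```

If this reports a network filesystem, change notifications via inotify may not fire for files that are written on the NAS side. The paperless-ngx documentation describes a `PAPERLESS_CONSUMER_POLLING` setting (an interval in seconds) for that situation; whether it applies to this setup is an assumption and is not confirmed anywhere in this thread.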

Many thanks to the forum :smiley:
Bernd

Without excerpts from paperless.log nobody can tell you exactly.

My guess is user permissions.

Why so complicated, with FTP and all?

I configured SMB via Webmin because doing it by config file is too tedious for me and I only run the Debian minimal installation.

I have two Pi 4Bs running here with Paperless-NGX installed directly, but with the Docker folder moved out to /docker…

Here is my paperless.log

[2024-04-12 16:37:57,416] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-04-12 17:05:00,220] [INFO] [paperless.tasks] No automatic matching items, not training
:
:
[2024-04-14 00:05:00,250] [INFO] [paperless.tasks] No automatic matching items, not training
[2024-04-14 00:30:00,179] [WARNING] [paperless.sanity_checker] Orphaned file in media dir: /usr/src/paperless/media/docker-compose.env
[2024-04-14 00:30:00,180] [WARNING] [paperless.sanity_checker] Orphaned file in media dir: /usr/src/paperless/media/.env
[2024-04-14 00:30:00,181] [WARNING] [paperless.sanity_checker] Orphaned file in media dir: /usr/src/paperless/media/docker-compose.yml
[2024-04-14 01:05:00,234] [INFO] [paperless.tasks] No automatic matching items, not training
:
:
[2024-04-14 10:05:00,234] [INFO] [paperless.tasks] No automatic matching items, not training
[2024-04-14 10:07:45,467] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-04-14 10:07:45,468] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-04-14 11:05:00,232] [INFO] [paperless.tasks] No automatic matching items, not training
:
:
[2024-04-16 10:05:00,222] [INFO] [paperless.tasks] No automatic matching items, not training
[2024-04-16 10:10:22,020] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
:
:
[2024-04-16 10:35:07,480] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-04-16 11:05:00,229] [INFO] [paperless.tasks] No automatic matching items, not training
:
[2024-04-16 13:05:00,232] [INFO] [paperless.tasks] No automatic matching items, not training
[2024-04-16 13:18:00,749] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-04-16 13:18:00,750] [DEBUG] [paperless.tasks] Skipping plugin BarcodePlugin
[2024-04-16 13:18:00,751] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-04-16 13:18:00,759] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with: 
[2024-04-16 13:18:00,784] [INFO] [paperless.consumer] Consuming TESTDATEI_PER_BROWSER.pdf
[2024-04-16 13:18:00,789] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2024-04-16 13:18:00,811] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2024-04-16 13:18:00,820] [DEBUG] [paperless.consumer] Parsing TESTDATEI_PER_BROWSER.pdf...
[2024-04-16 13:18:01,584] [INFO] [paperless.parsing.tesseract] pdftotext exited 0
[2024-04-16 13:18:14,214] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngx0qt6kuum/TESTDATEI_PER_BROWSER.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-1wk85bov/archive.pdf'), 'use_threads': True, 'jobs': 4, 'language': 'deu', 'output_type': 'pdfa', 'progress_bar': False, 'color_conversion_strategy': 'RGB', 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-1wk85bov/sidecar.txt')}
[2024-04-16 13:18:16,287] [INFO] [ocrmypdf._pipelines.ocr] Start processing 4 pages concurrently
[2024-04-16 13:18:16,291] [INFO] [ocrmypdf._pipeline] skipping all processing on this page
:
:
[2024-04-16 13:18:16,294] [INFO] [ocrmypdf._pipeline] skipping all processing on this page
[2024-04-16 13:18:16,360] [INFO] [ocrmypdf._pipelines.ocr] Postprocessing...
[2024-04-16 13:18:17,159] [ERROR] [ocrmypdf.optimize] xref 60: While extracting this image, an error occurred
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 323, in extract_images
    result = extract_fn(
             ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 162, in extract_image_jbig2
    ext = pim.extract_to(stream=f)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 709, in extract_to
    return self._extract_to_stream(stream=stream)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 649, in _extract_to_stream
    direct_extraction = self._extract_direct(stream=stream)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 541, in _extract_direct
    stream.write(self._generate_ccitt_header(data, icc=icc))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 776, in _generate_ccitt_header
    decode = self._decode_array
             ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 181, in _decode_array
    raise NotImplementedError(
NotImplementedError: Don't how to retrieve default /Decode array for image<pikepdf.PdfImage image mode=? size=496x154 at 0x7f95a376d0>
[2024-04-16 13:18:17,180] [ERROR] [ocrmypdf.optimize] xref 61: While extracting this image, an error occurred
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 323, in extract_images
    result = extract_fn(
             ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 162, in extract_image_jbig2
    ext = pim.extract_to(stream=f)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 709, in extract_to
    return self._extract_to_stream(stream=stream)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 649, in _extract_to_stream
    direct_extraction = self._extract_direct(stream=stream)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 541, in _extract_direct
    stream.write(self._generate_ccitt_header(data, icc=icc))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 776, in _generate_ccitt_header
    decode = self._decode_array
             ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 181, in _decode_array
    raise NotImplementedError(
NotImplementedError: Don't how to retrieve default /Decode array for image<pikepdf.PdfImage image mode=? size=496x154 at 0x7f95a36c50>
[2024-04-16 13:18:17,250] [INFO] [ocrmypdf._pipeline] Image optimization ratio: 1.00 savings: 0.2%
[2024-04-16 13:18:17,251] [INFO] [ocrmypdf._pipeline] Total file size ratio: 5.80 savings: 82.7%
[2024-04-16 13:18:17,263] [INFO] [ocrmypdf._pipelines._common] Output file is a PDF/A-2B (as expected)
[2024-04-16 13:18:17,359] [DEBUG] [paperless.parsing.tesseract] Incomplete sidecar file: discarding.
[2024-04-16 13:18:17,646] [INFO] [paperless.parsing.tesseract] pdftotext exited 0
[2024-04-16 13:18:17,690] [DEBUG] [paperless.consumer] Generating thumbnail for TESTDATEI_PER_BROWSER.pdf...
[2024-04-16 13:18:17,699] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-1wk85bov/archive.pdf[0] /tmp/paperless/paperless-1wk85bov/convert.webp
[2024-04-16 13:18:22,577] [INFO] [paperless.parsing] convert exited 0
[2024-04-16 13:18:24,269] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-04-16 13:18:24,277] [DEBUG] [paperless.consumer] Saving record to database
[2024-04-16 13:18:24,278] [DEBUG] [paperless.consumer] Creation date from parse_date: 2023-12-29 00:00:00+01:00
[2024-04-16 13:18:24,880] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngx0qt6kuum/TESTDATEI_PER_BROWSER.pdf
[2024-04-16 13:18:24,922] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-1wk85bov
[2024-04-16 13:18:24,924] [INFO] [paperless.consumer] Document 2023-12-29 TESTDATEI_PER_BROWSER consumption finished
[2024-04-16 13:19:09,389] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-04-16 13:19:09,390] [DEBUG] [paperless.tasks] Skipping plugin BarcodePlugin
[2024-04-16 13:19:09,391] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-04-16 13:19:09,396] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with: 
[2024-04-16 13:19:09,419] [INFO] [paperless.consumer] Consuming TESTDATEI2_PER_BROWSER.pdf
[2024-04-16 13:19:09,424] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2024-04-16 13:19:09,444] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2024-04-16 13:19:09,453] [DEBUG] [paperless.consumer] Parsing TESTDATEI2_PER_BROWSER.pdf...
[2024-04-16 13:19:09,484] [INFO] [paperless.parsing.tesseract] pdftotext exited 0
[2024-04-16 13:19:10,346] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngxrsaj4vvt/TESTDATEI2_PER_BROWSER.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-bh9l41ec/archive.pdf'), 'use_threads': True, 'jobs': 4, 'language': 'deu', 'output_type': 'pdfa', 'progress_bar': False, 'color_conversion_strategy': 'RGB', 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-bh9l41ec/sidecar.txt')}
[2024-04-16 13:19:10,711] [INFO] [ocrmypdf._pipelines.ocr] Start processing 2 pages concurrently
[2024-04-16 13:19:10,715] [INFO] [ocrmypdf._pipeline] skipping all processing on this page
[2024-04-16 13:19:10,716] [INFO] [ocrmypdf._pipeline] skipping all processing on this page
[2024-04-16 13:19:10,735] [INFO] [ocrmypdf._pipelines.ocr] Postprocessing...
[2024-04-16 13:19:11,132] [WARNING] [ocrmypdf._metadata] Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
[2024-04-16 13:19:11,261] [INFO] [ocrmypdf._pipeline] Image optimization ratio: 1.02 savings: 2.3%
[2024-04-16 13:19:11,262] [INFO] [ocrmypdf._pipeline] Total file size ratio: 1.29 savings: 22.5%
[2024-04-16 13:19:11,271] [INFO] [ocrmypdf._pipelines._common] Output file is a PDF/A-2B (as expected)
[2024-04-16 13:19:11,328] [DEBUG] [paperless.parsing.tesseract] Incomplete sidecar file: discarding.
[2024-04-16 13:19:11,456] [INFO] [paperless.parsing.tesseract] pdftotext exited 0
[2024-04-16 13:19:11,461] [DEBUG] [paperless.consumer] Generating thumbnail for TESTDATEI2_PER_BROWSER.pdf...
[2024-04-16 13:19:11,470] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-bh9l41ec/archive.pdf[0] /tmp/paperless/paperless-bh9l41ec/convert.webp
[2024-04-16 13:19:14,122] [INFO] [paperless.parsing] convert exited 0
[2024-04-16 13:19:15,643] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-04-16 13:19:15,652] [DEBUG] [paperless.consumer] Saving record to database
[2024-04-16 13:19:15,653] [DEBUG] [paperless.consumer] Creation date from parse_date: 2024-04-02 00:00:00+02:00
[2024-04-16 13:19:15,917] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngxrsaj4vvt/TESTDATEI2_PER_BROWSER.pdf
[2024-04-16 13:19:15,956] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-bh9l41ec
[2024-04-16 13:19:15,958] [INFO] [paperless.consumer] Document 2024-04-02 TESTDATEI2_PER_BROWSER consumption finished
[2024-04-16 14:05:00,237] [INFO] [paperless.tasks] No automatic matching items, not training
[2024-04-16 15:01:07,970] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-04-16 15:01:07,970] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-04-16 15:05:00,233] [INFO] [paperless.tasks] No automatic matching items, not training
:
:
[2024-04-19 12:05:00,171] [INFO] [paperless.tasks] No automatic matching items, not training
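
To narrow a long log like this one down to the lines that matter for the consume folder, the logger names can be filtered. This is a minimal sketch; `consumer_lines` is a hypothetical helper name, and the log file path passed to it depends on the individual setup:

```shell
# consumer_lines is a hypothetical helper: prints only the log lines written by
# the folder watcher ([paperless.management.consumer]) and by the document
# consumer ([paperless.consumer]).
consumer_lines() {
    grep -E '\[paperless\.(management\.)?consumer\]' "$1"
}
```

Called as `consumer_lines paperless.log`, this would keep the watcher's start-up line and the per-document consumption steps. In the excerpt above, the only "Consuming" entries are for the two files uploaded through the browser; none appear for the files placed in the consume directory.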

The user that paperless runs as is called "scan".

scan@paperless:~ $  ls -la /home/scan/paperless/consume/
total 832
drwxrwxrwx 2 scan scan        0 Apr 16 15:02 .
drwxr-xr-x 5 scan docker   4096 Apr 16 13:18 ..
-rwxrwxrwx 1 scan scan   272757 Apr 16 10:27 TESTDATEI_PER_CONSUME.pdf
-rwxrwxrwx 1 scan scan   279242 Apr 16 15:02 TESTDATEI2_PER_CONSUME.pdf

I don't understand the Webmin part :woozy_face: But I don't think that's the problem…

For a start, the path here is probably wrong.
Also, the files should reside in /paperless-ngx/config, and you can probably delete the .env.

Please post the following output, plus the Docker Compose files (.env and yml), in code boxes:
"sudo id scan": this UID and GID must be in your env file.

Then we'll have to look further.

What did you base your Paperless installation on in the first place?
Stefan's config from the web shop?

What do you mean by that? And what if I delete the .env and that turns out to be wrong? Could it be getting in the way?

scan@paperless:~ $  sudo id scan
uid=1000(scan) gid=1000(scan) groups=1000(scan),4(adm),20(dialout),24(cdrom),27(sudo),29(audio),44(video),46(plugdev),60(games),100(users),102(input),105(render),106(netdev),995(spi),994(i2c),993(gpio),115(lpadmin),991(docker)

.env:

COMPOSE_PROJECT_NAME=paperless

docker-compose.env:

USERMAP_GID=991
PAPERLESS_TIME_ZONE=Europe/Berlin
PAPERLESS_OCR_LANGUAGE=deu
PAPERLESS_SECRET_KEY='GANZ GEHEIM'

docker-compose.yml:

# Docker Compose file for running paperless from the docker container registry.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
#   as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# In addition to that, this Docker Compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), PostgreSQL is used as the database server.
# - Apache Tika and Gotenberg servers are started with paperless and paperless
#   is configured to use these services. These provide support for consuming
#   Office documents (Word, Excel, Power Point and their LibreOffice
#   counterparts).
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
#   and '.env' into a folder.
# - Run 'docker compose pull'.
# - Run 'docker compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.

version: "3.4"
services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redisdata:/data

  db:
    image: docker.io/library/postgres:15
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - "8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - /home/scan/paperless:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - /home/scan/paperless/consume:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998

  gotenberg:
    image: docker.io/gotenberg/gotenberg:7.10
    restart: unless-stopped

    # The gotenberg chromium route is used to convert .eml files. We do not
    # want to allow external content like tracking pixels or even javascript.
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"

  tika:
    image: ghcr.io/paperless-ngx/tika:latest
    restart: unless-stopped

volumes:
  data:
  pgdata:
  redisdata:

Well, my two files look different, even on my two Pis.
A lot of things are missing here, so I'm not surprised that it doesn't fully work.

This almost empty env serves no real purpose and can go… unless it's art :wink:

What did you base the installation on the Pi on?

Your UID and GID must be in the env file; yours has only the GID, and a different group at that.
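
The two env lines can be derived from `id` rather than typed by hand. This is a minimal sketch, assuming the `USERMAP_UID` and `USERMAP_GID` variables that the paperless-ngx Docker image reads from docker-compose.env; `usermap_lines` is a hypothetical helper name:

```shell
# usermap_lines is a hypothetical helper: prints the two env lines using a
# user's numeric UID and primary GID. With no argument, id reports on the
# current user; on the Pi from this thread one would pass "scan".
usermap_lines() {
    printf 'USERMAP_UID=%s\nUSERMAP_GID=%s\n' "$(id -u "$@")" "$(id -g "$@")"
}

# Print the lines for the current user.
usermap_lines
```

For the `id scan` output posted above (uid=1000, gid=1000), `usermap_lines scan` would print `USERMAP_UID=1000` and `USERMAP_GID=1000`, whereas the posted docker-compose.env contains only `USERMAP_GID=991`, which is the docker group.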

I can only recommend that you at least get Stefan's Raspberry config for a small fee, or the Paperless Masterclass.

That config works without problems, and you really don't have to get annoyed about anything anymore.

Thanks for the tip, I'll probably give that a try. I thought that anything arriving in the consume directory would be processed straight away. But I'm probably still missing some knowledge about Docker containers.