Here is my paperless.log:
[2024-04-12 16:37:57,416] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2024-04-12 17:05:00,220] [INFO] [paperless.tasks] No automatic matching items, not training
:
:
[2024-04-14 00:05:00,250] [INFO] [paperless.tasks] No automatic matching items, not training
[2024-04-14 00:30:00,179] [WARNING] [paperless.sanity_checker] Orphaned file in media dir: /usr/src/paperless/media/docker-compose.env
[2024-04-14 00:30:00,180] [WARNING] [paperless.sanity_checker] Orphaned file in media dir: /usr/src/paperless/media/.env
[2024-04-14 00:30:00,181] [WARNING] [paperless.sanity_checker] Orphaned file in media dir: /usr/src/paperless/media/docker-compose.yml
[2024-04-14 01:05:00,234] [INFO] [paperless.tasks] No automatic matching items, not training
:
:
[2024-04-14 10:05:00,234] [INFO] [paperless.tasks] No automatic matching items, not training
[2024-04-14 10:07:45,467] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-04-14 10:07:45,468] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-04-14 11:05:00,232] [INFO] [paperless.tasks] No automatic matching items, not training
:
:
[2024-04-16 10:05:00,222] [INFO] [paperless.tasks] No automatic matching items, not training
[2024-04-16 10:10:22,020] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
:
:
[2024-04-16 10:35:07,480] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-04-16 11:05:00,229] [INFO] [paperless.tasks] No automatic matching items, not training
:
[2024-04-16 13:05:00,232] [INFO] [paperless.tasks] No automatic matching items, not training
[2024-04-16 13:18:00,749] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-04-16 13:18:00,750] [DEBUG] [paperless.tasks] Skipping plugin BarcodePlugin
[2024-04-16 13:18:00,751] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-04-16 13:18:00,759] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with:
[2024-04-16 13:18:00,784] [INFO] [paperless.consumer] Consuming TESTDATEI_PER_BROWSER.pdf
[2024-04-16 13:18:00,789] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2024-04-16 13:18:00,811] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2024-04-16 13:18:00,820] [DEBUG] [paperless.consumer] Parsing TESTDATEI_PER_BROWSER.pdf...
[2024-04-16 13:18:01,584] [INFO] [paperless.parsing.tesseract] pdftotext exited 0
[2024-04-16 13:18:14,214] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngx0qt6kuum/TESTDATEI_PER_BROWSER.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-1wk85bov/archive.pdf'), 'use_threads': True, 'jobs': 4, 'language': 'deu', 'output_type': 'pdfa', 'progress_bar': False, 'color_conversion_strategy': 'RGB', 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-1wk85bov/sidecar.txt')}
[2024-04-16 13:18:16,287] [INFO] [ocrmypdf._pipelines.ocr] Start processing 4 pages concurrently
[2024-04-16 13:18:16,291] [INFO] [ocrmypdf._pipeline] skipping all processing on this page
:
:
[2024-04-16 13:18:16,294] [INFO] [ocrmypdf._pipeline] skipping all processing on this page
[2024-04-16 13:18:16,360] [INFO] [ocrmypdf._pipelines.ocr] Postprocessing...
[2024-04-16 13:18:17,159] [ERROR] [ocrmypdf.optimize] xref 60: While extracting this image, an error occurred
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 323, in extract_images
    result = extract_fn(
             ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 162, in extract_image_jbig2
    ext = pim.extract_to(stream=f)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 709, in extract_to
    return self._extract_to_stream(stream=stream)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 649, in _extract_to_stream
    direct_extraction = self._extract_direct(stream=stream)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 541, in _extract_direct
    stream.write(self._generate_ccitt_header(data, icc=icc))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 776, in _generate_ccitt_header
    decode = self._decode_array
             ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 181, in _decode_array
    raise NotImplementedError(
NotImplementedError: Don't how to retrieve default /Decode array for image<pikepdf.PdfImage image mode=? size=496x154 at 0x7f95a376d0>
[2024-04-16 13:18:17,180] [ERROR] [ocrmypdf.optimize] xref 61: While extracting this image, an error occurred
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 323, in extract_images
    result = extract_fn(
             ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 162, in extract_image_jbig2
    ext = pim.extract_to(stream=f)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 709, in extract_to
    return self._extract_to_stream(stream=stream)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 649, in _extract_to_stream
    direct_extraction = self._extract_direct(stream=stream)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 541, in _extract_direct
    stream.write(self._generate_ccitt_header(data, icc=icc))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 776, in _generate_ccitt_header
    decode = self._decode_array
             ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pikepdf/models/image.py", line 181, in _decode_array
    raise NotImplementedError(
NotImplementedError: Don't how to retrieve default /Decode array for image<pikepdf.PdfImage image mode=? size=496x154 at 0x7f95a36c50>
[2024-04-16 13:18:17,250] [INFO] [ocrmypdf._pipeline] Image optimization ratio: 1.00 savings: 0.2%
[2024-04-16 13:18:17,251] [INFO] [ocrmypdf._pipeline] Total file size ratio: 5.80 savings: 82.7%
[2024-04-16 13:18:17,263] [INFO] [ocrmypdf._pipelines._common] Output file is a PDF/A-2B (as expected)
[2024-04-16 13:18:17,359] [DEBUG] [paperless.parsing.tesseract] Incomplete sidecar file: discarding.
[2024-04-16 13:18:17,646] [INFO] [paperless.parsing.tesseract] pdftotext exited 0
[2024-04-16 13:18:17,690] [DEBUG] [paperless.consumer] Generating thumbnail for TESTDATEI_PER_BROWSER.pdf...
[2024-04-16 13:18:17,699] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-1wk85bov/archive.pdf[0] /tmp/paperless/paperless-1wk85bov/convert.webp
[2024-04-16 13:18:22,577] [INFO] [paperless.parsing] convert exited 0
[2024-04-16 13:18:24,269] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-04-16 13:18:24,277] [DEBUG] [paperless.consumer] Saving record to database
[2024-04-16 13:18:24,278] [DEBUG] [paperless.consumer] Creation date from parse_date: 2023-12-29 00:00:00+01:00
[2024-04-16 13:18:24,880] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngx0qt6kuum/TESTDATEI_PER_BROWSER.pdf
[2024-04-16 13:18:24,922] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-1wk85bov
[2024-04-16 13:18:24,924] [INFO] [paperless.consumer] Document 2023-12-29 TESTDATEI_PER_BROWSER consumption finished
[2024-04-16 13:19:09,389] [DEBUG] [paperless.tasks] Skipping plugin CollatePlugin
[2024-04-16 13:19:09,390] [DEBUG] [paperless.tasks] Skipping plugin BarcodePlugin
[2024-04-16 13:19:09,391] [DEBUG] [paperless.tasks] Executing plugin WorkflowTriggerPlugin
[2024-04-16 13:19:09,396] [INFO] [paperless.tasks] WorkflowTriggerPlugin completed with:
[2024-04-16 13:19:09,419] [INFO] [paperless.consumer] Consuming TESTDATEI2_PER_BROWSER.pdf
[2024-04-16 13:19:09,424] [DEBUG] [paperless.consumer] Detected mime type: application/pdf
[2024-04-16 13:19:09,444] [DEBUG] [paperless.consumer] Parser: RasterisedDocumentParser
[2024-04-16 13:19:09,453] [DEBUG] [paperless.consumer] Parsing TESTDATEI2_PER_BROWSER.pdf...
[2024-04-16 13:19:09,484] [INFO] [paperless.parsing.tesseract] pdftotext exited 0
[2024-04-16 13:19:10,346] [DEBUG] [paperless.parsing.tesseract] Calling OCRmyPDF with args: {'input_file': PosixPath('/tmp/paperless/paperless-ngxrsaj4vvt/TESTDATEI2_PER_BROWSER.pdf'), 'output_file': PosixPath('/tmp/paperless/paperless-bh9l41ec/archive.pdf'), 'use_threads': True, 'jobs': 4, 'language': 'deu', 'output_type': 'pdfa', 'progress_bar': False, 'color_conversion_strategy': 'RGB', 'skip_text': True, 'clean': True, 'deskew': True, 'rotate_pages': True, 'rotate_pages_threshold': 12.0, 'sidecar': PosixPath('/tmp/paperless/paperless-bh9l41ec/sidecar.txt')}
[2024-04-16 13:19:10,711] [INFO] [ocrmypdf._pipelines.ocr] Start processing 2 pages concurrently
[2024-04-16 13:19:10,715] [INFO] [ocrmypdf._pipeline] skipping all processing on this page
[2024-04-16 13:19:10,716] [INFO] [ocrmypdf._pipeline] skipping all processing on this page
[2024-04-16 13:19:10,735] [INFO] [ocrmypdf._pipelines.ocr] Postprocessing...
[2024-04-16 13:19:11,132] [WARNING] [ocrmypdf._metadata] Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
[2024-04-16 13:19:11,261] [INFO] [ocrmypdf._pipeline] Image optimization ratio: 1.02 savings: 2.3%
[2024-04-16 13:19:11,262] [INFO] [ocrmypdf._pipeline] Total file size ratio: 1.29 savings: 22.5%
[2024-04-16 13:19:11,271] [INFO] [ocrmypdf._pipelines._common] Output file is a PDF/A-2B (as expected)
[2024-04-16 13:19:11,328] [DEBUG] [paperless.parsing.tesseract] Incomplete sidecar file: discarding.
[2024-04-16 13:19:11,456] [INFO] [paperless.parsing.tesseract] pdftotext exited 0
[2024-04-16 13:19:11,461] [DEBUG] [paperless.consumer] Generating thumbnail for TESTDATEI2_PER_BROWSER.pdf...
[2024-04-16 13:19:11,470] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-bh9l41ec/archive.pdf[0] /tmp/paperless/paperless-bh9l41ec/convert.webp
[2024-04-16 13:19:14,122] [INFO] [paperless.parsing] convert exited 0
[2024-04-16 13:19:15,643] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-04-16 13:19:15,652] [DEBUG] [paperless.consumer] Saving record to database
[2024-04-16 13:19:15,653] [DEBUG] [paperless.consumer] Creation date from parse_date: 2024-04-02 00:00:00+02:00
[2024-04-16 13:19:15,917] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngxrsaj4vvt/TESTDATEI2_PER_BROWSER.pdf
[2024-04-16 13:19:15,956] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-bh9l41ec
[2024-04-16 13:19:15,958] [INFO] [paperless.consumer] Document 2024-04-02 TESTDATEI2_PER_BROWSER consumption finished
[2024-04-16 14:05:00,237] [INFO] [paperless.tasks] No automatic matching items, not training
[2024-04-16 15:01:07,970] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-04-16 15:01:07,970] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-04-16 15:05:00,233] [INFO] [paperless.tasks] No automatic matching items, not training
:
:
[2024-04-19 12:05:00,171] [INFO] [paperless.tasks] No automatic matching items, not training
The user that Paperless runs as is called "scan":
scan@paperless:~ $ ls -la /home/scan/paperless/consume/
total 832
drwxrwxrwx 2 scan scan 0 Apr 16 15:02 .
drwxr-xr-x 5 scan docker 4096 Apr 16 13:18 ..
-rwxrwxrwx 1 scan scan 272757 Apr 16 10:27 TESTDATEI_PER_CONSUME.pdf
-rwxrwxrwx 1 scan scan 279242 Apr 16 15:02 TESTDATEI2_PER_CONSUME.pdf
I don't understand the part about Webmin. But I don't think that is the problem…