Introduction: Paperless-AI | AI for Paperless-NGX

Jockel.dfg2 · July 21, 2025 at 08:32

Hello,

I installed Paperless AI, but the connection to Paperless NGX doesn't work. Paperless AI doesn't pull in any documents: nothing that has been processed in Paperless NGX shows up in Paperless AI.

The API token and server address are entered correctly.
What could be causing this?
Are there logs or known issues I should watch out for?
I'd appreciate a pragmatic suggestion for a fix.

Thanks.
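A pragmatic first check, independent of Paperless AI itself, is to call the Paperless-ngx REST API directly with the same server address and token. The minimal sketch below uses only Python's standard library. It assumes the address is the Paperless-ngx root URL (e.g. `http://paperless:8000`, a hypothetical host) and that the token is sent as `Authorization: Token <token>`. The helper names are made up for illustration, and this does not reproduce what Paperless AI does internally.

```python
"""Check that a Paperless-ngx server accepts a given API token (minimal sketch)."""
import json
import urllib.error
import urllib.request


def build_documents_request(base_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for the first page of /api/documents/.

    Assumes base_url is the Paperless-ngx root, e.g. "http://paperless:8000"
    (hypothetical host); a trailing slash or an "/api" suffix is tolerated.
    """
    root = base_url.strip().rstrip("/")
    if root.endswith("/api"):
        root = root[: -len("/api")]
    return urllib.request.Request(
        f"{root}/api/documents/?page_size=1",
        headers={"Authorization": f"Token {token.strip()}", "Accept": "application/json"},
    )


def document_count(payload: bytes) -> int:
    """Read the total number of documents from a paginated list response."""
    return int(json.loads(payload)["count"])


def check_connection(base_url: str, token: str, timeout: float = 10.0) -> str:
    """Return a one-line diagnosis instead of raising."""
    req = build_documents_request(base_url, token)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return f"OK: {document_count(resp.read())} documents visible to this token"
    except urllib.error.HTTPError as err:  # must be caught before its base class URLError
        if err.code in (401, 403):
            return f"HTTP {err.code}: token rejected or not permitted"
        return f"HTTP {err.code} from {req.full_url}"
    except urllib.error.URLError as err:
        return f"Cannot reach {req.full_url}: {err.reason}"
    except (ValueError, KeyError):
        # e.g. an HTML login page instead of JSON: the URL probably isn't the API root
        return f"Unexpected response from {req.full_url}; is this the Paperless-ngx root URL?"
```

Usage: `print(check_connection("http://paperless:8000", "<your token>"))`. If Paperless AI runs in Docker, run the check from inside that container, because an address like `localhost` refers to the container itself there. A result of `OK: 0 documents` may mean the token's user simply cannot see the documents.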

Jake · July 21, 2025 at 08:47

Hi,

The Paperless developers also have AI on their radar:

github.com/paperless-ngx/paperless-ngx

Feature: Paperless AI (dev ← feature-ai)

Open · 06:02AM - 03 Jul 25 UTC · shamoon · +5184 -101

_I hope it does not lead to controversy or something, but I think I'm ready to discuss this one (hence draft, but it's ready-to-go I believe)._

LLMs are here to stay and, frankly, they are _very good_ at some things. Paperless is also very good at some things, so my intent here is not to replace that functionality, and as far as I'm concerned these kinds of features should forever and always remain **opt-in**.

This PR implements AI-enabled (LLM) suggestions in a way that can also easily support more backends, as well as 'document chat'. For now it supports the OpenAI API and Ollama locally. In my experience OpenAI is very impressive, llama3 less so, other models mixed. Adding more backends should be trivial. There is of course a lot that can be iterated on, so this is a start. I also do not claim to be even remotely an expert in this stuff, so input is welcome.

Here's a video of suggestions:
https://github.com/user-attachments/assets/8e692999-1ea3-4c22-a101-a42806e6f1e9

Chat:
https://github.com/user-attachments/assets/bad313fc-c752-4901-858c-6e18c711830b

Screenshot (2025-04-24, 1:49:00 PM):
https://github.com/user-attachments/assets/7a98b571-8d9a-4dd2-b086-7281de356fd8

More details:

- Adds `PAPERLESS_AI_ENABLED`, `PAPERLESS_LLM_BACKEND`, `PAPERLESS_LLM_MODEL`, `PAPERLESS_LLM_API_KEY`, `PAPERLESS_LLM_URL`, all of which can also be configured via the frontend.
  - Again, all default to disabled.
- Adds a 'Suggest' button:
  - When AI is enabled, this will offer novel suggestions (this is where LLMs shine, of course). Tags, doc types etc. can be created directly from this little menu.
  - AI-suggested objects are also fuzzy-matched to existing ones.
  - Adds title suggestions (!)
  - When AI is disabled, we still use the classifier suggestions. A small change here is that we now only auto-fetch suggestions when an inbox tag exists.
- The custom fields button also moves to above the doc editing area, and the area is a little wider.
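The five settings above are opt-in and "all default to disabled". As an illustration only, this Python sketch shows one way such settings could be read from the environment so that AI stays off unless explicitly switched on. It is not the PR's code: the `AISettings` dataclass, `load_ai_settings`, and the set of accepted "true" spellings are assumptions, and configuration via the frontend is not covered.

```python
"""Load opt-in AI settings from environment variables (hypothetical sketch)."""
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: the PR does not say which spellings count as "true".
_TRUTHY = {"1", "true", "yes", "y", "t", "on"}


@dataclass(frozen=True)
class AISettings:
    enabled: bool
    backend: Optional[str]
    model: Optional[str]
    api_key: Optional[str]
    url: Optional[str]


def _opt(env: Mapping[str, str], key: str) -> Optional[str]:
    """Return a trimmed value, treating unset or blank variables as None."""
    value = env.get(key, "").strip()
    return value or None


def load_ai_settings(env: Mapping[str, str] = os.environ) -> AISettings:
    """AI is disabled unless PAPERLESS_AI_ENABLED is explicitly set to a truthy value."""
    enabled = env.get("PAPERLESS_AI_ENABLED", "").strip().lower() in _TRUTHY
    return AISettings(
        enabled=enabled,
        backend=_opt(env, "PAPERLESS_LLM_BACKEND"),
        model=_opt(env, "PAPERLESS_LLM_MODEL"),
        api_key=_opt(env, "PAPERLESS_LLM_API_KEY"),
        url=_opt(env, "PAPERLESS_LLM_URL"),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test; an empty environment yields `enabled=False` with every other field `None`.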
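The PR does not say how AI-suggested objects are fuzzy-matched to existing ones. Purely as an illustration, this sketch uses `difflib.get_close_matches` from Python's standard library to map suggested names onto existing tags (or correspondents, document types) case-insensitively and to separate out genuinely new names. The function names and the 0.8 cutoff are hypothetical choices, not taken from the PR.

```python
"""Fuzzy-match LLM-suggested names against existing objects (illustrative sketch)."""
from difflib import get_close_matches
from typing import Dict, Iterable, List, Optional, Tuple


def match_suggestion(suggestion: str, existing: Iterable[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest existing name (original spelling), or None if none is close enough."""
    # Compare case-insensitively, but remember how each name is actually spelled.
    by_lower: Dict[str, str] = {name.lower(): name for name in existing}
    hits = get_close_matches(suggestion.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


def split_suggestions(
    suggestions: Iterable[str], existing: Iterable[str], cutoff: float = 0.8
) -> Tuple[List[str], List[str]]:
    """Partition suggestions into (existing names to reuse, new names to create)."""
    pool = list(existing)
    matched: List[str] = []
    new: List[str] = []
    for suggestion in suggestions:
        hit = match_suggestion(suggestion, pool, cutoff)
        if hit is None:
            new.append(suggestion)
        elif hit not in matched:  # several suggestions may collapse onto one object
            matched.append(hit)
    return matched, new
```

With existing tags `Education`, `Tax`, `Insurance`, the suggestions `education`, `test prep`, `Insurances` split into matched `['Education', 'Insurance']` and new `['test prep']`. A higher cutoff creates more new objects; a lower one risks merging unrelated names.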
  This has always conceptually made more sense, but it is a little tricky layout-wise.
- Adds 'document chat' with real-time streaming.

Explanation of approach:

1. Local Vector Indexing with LlamaIndex
   - Built a new document vector store using LlamaIndex and FaissVectorStore.
   - Added a management command and Celery task for automatic reindexing.
   - Embedded documents using a flexible embedding backend (local HuggingFace by default; OpenAI optional).
   - Storage of the index persisted to disk (settings.LLM_INDEX_DIR).
2. Document similarity retrieval (Retrieval-Augmented Generation aka RAG)
   - Added a backend system (query_similar_documents) that retrieves top-k similar documents based on vector search.
   - Supports feeding retrieved context into LLM prompts.

Limitations / future ideas:

- Add more control over the prompt in general, as well as the ability to include existing tags/doc types etc.
- Support some kind of 'accept all' button for suggestions.
- At the moment I've essentially left off storage paths. I think it doesn't make sense to suggest them unless we pass existing paths.
- Obviously lots more outside of just this stuff is possible.

Here's the example API response from OpenAI from the video:

```
{
  "title": "SAT Practice Test 1 - Reading Section (Excerpt)",
  "tags": ["education", "test prep", "SAT", "practice test"],
  "correspondents": ["The College Board", "Lydia Minatoya"],
  "document_types": ["practice test", "standardized test material"],
  "storage_paths": ["Education/Test Prep/SAT"],
  "dates": ["2016-01-01"]
}
```

Closes #(issue or discussion)

## Type of change

- [ ] Bug fix: non-breaking change which fixes an issue.
- [x] New feature / Enhancement: non-breaking change which adds functionality. _Please read the important note above._
- [ ] Breaking change: fix or feature that would cause existing functionality to not work as expected.
- [ ] Documentation only.
- [ ] Other.
Please explain:

## Checklist:

- [ ] I have read & agree with the [contributing guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md).
- [ ] If applicable, I have included testing coverage for new code in this PR, for [backend](https://docs.paperless-ngx.com/development/#testing) and / or [front-end](https://docs.paperless-ngx.com/development/#testing-and-code-style) changes.
- [ ] If applicable, I have tested my code for new features & regressions on both mobile & desktop devices, using the latest version of major browsers.
- [ ] If applicable, I have checked that all tests pass, see [documentation](https://docs.paperless-ngx.com/development/#back-end-development).
- [ ] I have run all `pre-commit` hooks, see [documentation](https://docs.paperless-ngx.com/development/#code-formatting-with-pre-commit-hooks).
- [ ] I have made corresponding changes to the documentation as needed.
- [ ] I have checked my modifications for any breaking changes.
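The PR's "Explanation of approach" describes a classic RAG flow: embed documents, retrieve the top-k most similar ones for a query, and feed their text into the LLM prompt. The PR does this with LlamaIndex, a FAISS vector store, and HuggingFace or OpenAI embeddings. The Python sketch below deliberately swaps all of that for brute-force cosine similarity over precomputed toy vectors, only to make the retrieval and prompt-assembly steps visible. `top_k_similar` and `build_prompt` are hypothetical names and are not the PR's `query_similar_documents`; the prompt wording is an assumption too.

```python
"""Toy retrieval-augmented prompt assembly (brute-force stand-in for a vector store)."""
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_similar(query: Vector, index: Dict[int, Vector], k: int = 3) -> List[Tuple[int, float]]:
    """Score every stored vector against the query and return the k best (doc_id, score)."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, hits: List[Tuple[int, float]], texts: Dict[int, str]) -> str:
    """Put the retrieved documents' text in front of the user's question as context."""
    context = "\n\n".join(f"[Document {doc_id}]\n{texts[doc_id]}" for doc_id, _ in hits)
    return f"Answer using only the documents below.\n\n{context}\n\nQuestion: {question}"
```

A linear scan over every vector is fine for a handful of documents; a dedicated vector index such as FAISS, as used in the PR, is the usual choice once an archive grows.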
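The example API response in the PR is plain JSON: a `title` string plus lists of tags, correspondents, document types, storage paths, and dates. LLM output is not guaranteed to be well-formed, so parsing it defensively seems sensible. This Python sketch relies only on the key names shown in that example; the `Suggestions` dataclass and `parse_suggestions` are hypothetical, not the PR's code.

```python
"""Defensively parse an LLM suggestion payload shaped like the PR's example (sketch)."""
import json
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

STRING_LIST_KEYS = ("tags", "correspondents", "document_types", "storage_paths")


@dataclass
class Suggestions:
    title: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    correspondents: List[str] = field(default_factory=list)
    document_types: List[str] = field(default_factory=list)
    storage_paths: List[str] = field(default_factory=list)
    dates: List[date] = field(default_factory=list)


def _clean_strings(values: object) -> List[str]:
    """Keep non-empty strings, trimmed and de-duplicated, in their original order."""
    out: List[str] = []
    if not isinstance(values, list):
        return out
    for value in values:
        if isinstance(value, str) and value.strip() and value.strip() not in out:
            out.append(value.strip())
    return out


def parse_suggestions(raw: str) -> Suggestions:
    """Missing or malformed fields become empty instead of raising."""
    data = json.loads(raw)
    title = data.get("title")
    result = Suggestions(title=title.strip() if isinstance(title, str) and title.strip() else None)
    for key in STRING_LIST_KEYS:
        setattr(result, key, _clean_strings(data.get(key)))
    for value in data.get("dates") or []:
        try:
            result.dates.append(date.fromisoformat(value))
        except (TypeError, ValueError):
            continue  # free-form dates such as "January 2016" are skipped, not guessed
    return result
```

Fed the PR's example response, this keeps the title and the four tags unchanged and turns `"2016-01-01"` into `date(2016, 1, 1)`. The PR notes that storage paths are essentially left off for now, so a consumer might simply ignore that field.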