docs: Add documentation for scan_documents_ai command

Added comprehensive documentation in administration.md, including:

- Command syntax and all available options
- Usage examples for different scenarios
- Explanation of confidence levels
- Performance considerations for large document sets

Co-authored-by: dawnsystem <42047891+dawnsystem@users.noreply.github.com>
2026-01-05 21:15:59 +01:00 · 2025-11-14 15:53:06 +00:00 · 2025-11-14 15:53:06 +00:00 · 879a65ed9f
commit 879a65ed9f
parent 920998f6b5
1 changed files with 74 additions and 0 deletions
--- a/docs/administration.md
+++ b/docs/administration.md
@@ -416,6 +416,80 @@ assigned. `-f` works differently for tags: By default, only additional
 tags get added to documents, no tags will be removed. With `-f`, tags
 that don't match a document anymore get removed as well.

+### AI Document Scanner {#ai-scanner}
+
+The AI Document Scanner uses machine learning and natural language processing to automatically
+analyze documents and suggest metadata (tags, correspondents, document types, storage paths,
+custom fields, and workflows). This is useful for applying AI analysis to existing documents
+that were imported before the AI scanner was enabled, or for re-scanning documents with updated
+AI models.
+
+```
+scan_documents_ai [-h] [--all] [--filter-by-type TYPE_ID [TYPE_ID ...]]
+                  [--date-range START_DATE END_DATE] [--id-range START_ID END_ID]
+                  [--dry-run] [--auto-apply-high-confidence]
+                  [--confidence-threshold THRESHOLD] [--no-progress-bar]
+                  [--batch-size SIZE]
+
+optional arguments:
+--all                           Scan all documents in the system
+--filter-by-type TYPE_ID        Filter by document type ID(s)
+--date-range START_DATE END_DATE
+                                Filter by creation date range (YYYY-MM-DD format)
+--id-range START_ID END_ID      Filter by document ID range
+--dry-run                       Preview suggestions without applying changes
+--auto-apply-high-confidence    Automatically apply high confidence suggestions (≥80%)
+--confidence-threshold THRESHOLD
+                                Minimum confidence threshold (0.0-1.0, default: 0.60)
+--no-progress-bar               Disable progress bar display
+--batch-size SIZE               Number of documents to process at once (default: 100)
+```
+
+The command runs each selected document through the AI scanner and generates
+metadata suggestions. You must specify at least one filter option (`--all`,
+`--filter-by-type`, `--date-range`, or `--id-range`).
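
The "at least one filter" rule can be illustrated with a small stand-in parser. This is a minimal sketch, not the command's actual implementation: only the flag names are taken from the synopsis above, and `build_parser` and `has_filter` are hypothetical helper names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in parser; only the flag names come from the synopsis."""
    parser = argparse.ArgumentParser(prog="scan_documents_ai")
    parser.add_argument("--all", action="store_true")
    parser.add_argument("--filter-by-type", type=int, nargs="+", metavar="TYPE_ID")
    parser.add_argument("--date-range", nargs=2, metavar=("START_DATE", "END_DATE"))
    parser.add_argument("--id-range", type=int, nargs=2, metavar=("START_ID", "END_ID"))
    parser.add_argument("--dry-run", action="store_true")
    return parser


def has_filter(options: argparse.Namespace) -> bool:
    """Return True if at least one of the four filter options was given."""
    return bool(
        options.all
        or options.filter_by_type
        or options.date_range
        or options.id_range
    )


if __name__ == "__main__":
    options = build_parser().parse_args(["--dry-run"])
    # --dry-run alone selects no documents, so a filter is still required.
    print(has_filter(options))
```

Under this sketch, `--dry-run` on its own is not enough, while `--all --dry-run` or `--id-range 100 200` satisfies the requirement.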
+
+**Examples:**
+
+Scan all documents in dry-run mode (preview only):
+```bash
+python manage.py scan_documents_ai --all --dry-run
+```
+
+Scan documents of a specific type and auto-apply high confidence suggestions:
+```bash
+python manage.py scan_documents_ai --filter-by-type 1 3 --auto-apply-high-confidence
+```
+
+Scan documents from a date range:
+```bash
+python manage.py scan_documents_ai --date-range 2024-01-01 2024-12-31 --dry-run
+```
+
+Scan a specific range of document IDs:
+```bash
+python manage.py scan_documents_ai --id-range 100 200 --auto-apply-high-confidence
+```
+
+**Understanding Confidence Levels:**
+
+The AI scanner assigns a confidence score to each suggestion:
+- **High confidence (≥80%)**: Very reliable suggestions that can be auto-applied with `--auto-apply-high-confidence`
+- **Medium confidence (60-79%)**: Suggestions that should be reviewed before applying
+- **Low confidence (<60%)**: Not shown by default; lower `--confidence-threshold` below 0.60 to include them
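
The bands above can be expressed as a small classification function. This is a minimal sketch under stated assumptions: the 0.80 and 0.60 cut-offs come from this section, scores are assumed to be floats between 0.0 and 1.0, and `confidence_band` is a hypothetical name rather than part of the scanner.

```python
from typing import Optional

HIGH_CONFIDENCE = 0.80    # "≥80%" band from the list above
MEDIUM_CONFIDENCE = 0.60  # lower edge of the medium band; also the default threshold


def confidence_band(score: float, threshold: float = 0.60) -> Optional[str]:
    """Return the band for a score, or None if it falls below the display threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score < threshold:
        return None  # suggestion is not shown at this threshold
    if score >= HIGH_CONFIDENCE:
        return "high"
    if score >= MEDIUM_CONFIDENCE:
        return "medium"
    return "low"  # only reachable when the threshold is set below 0.60


if __name__ == "__main__":
    for score in (0.85, 0.65, 0.50):
        print(score, confidence_band(score))
```

With the default threshold, a score of 0.50 is filtered out; passing a lower threshold such as 0.40 makes the same score appear in the low band.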
+
+The command displays a detailed summary at the end, including:
+- Number of documents processed
+- Total suggestions generated
+- Sample suggestions for the first 5 documents with suggestions
+- Any errors encountered during processing
+
+**Performance Considerations:**
+
+For large document sets, the scanner processes documents in batches (default: 100 documents).
+You can adjust this with `--batch-size` to balance memory usage against processing speed.
+The scanner is designed to handle thousands of documents without affecting system performance.
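
Batch processing of this kind can be sketched as a generator that yields fixed-size chunks of document IDs, so only one batch is held in memory at a time. This illustrates the general technique only; `batched` is a hypothetical helper, and the scanner's own batching logic may differ.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(ids: Iterable[int], batch_size: int = 100) -> Iterator[List[int]]:
    """Yield consecutive lists of at most batch_size IDs from ids."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


if __name__ == "__main__":
    # 250 hypothetical document IDs split with the documented default of 100.
    for batch in batched(range(1, 251)):
        print(len(batch), batch[0], batch[-1])
```

A smaller batch size keeps fewer documents in memory at once, at the cost of more iterations.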
+
 ### Managing the Automatic matching algorithm

 The _Auto_ matching algorithm requires a trained neural network to work.