2025-11-09 00:58:28 +00:00
|
|
|
# IntelliDocs-ngx Technical Functions Guide
|
|
|
|
|
|
|
|
|
|
## Complete Function Reference
|
|
|
|
|
|
|
|
|
|
This document provides detailed documentation for all major functions in IntelliDocs-ngx.
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
## Table of Contents
|
|
|
|
|
|
|
|
|
|
1. [Documents Module Functions](#1-documents-module-functions)
|
|
|
|
|
2. [Paperless Core Functions](#2-paperless-core-functions)
|
|
|
|
|
3. [Mail Integration Functions](#3-mail-integration-functions)
|
|
|
|
|
4. [OCR & Parsing Functions](#4-ocr--parsing-functions)
|
|
|
|
|
5. [API & Serialization Functions](#5-api--serialization-functions)
|
|
|
|
|
6. [Frontend Services & Components](#6-frontend-services--components)
|
|
|
|
|
7. [Utility Functions](#7-utility-functions)
|
|
|
|
|
8. [Database Models & Methods](#8-database-models--methods)
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
## 1. Documents Module Functions
|
|
|
|
|
|
|
|
|
|
### 1.1 Consumer Module (`documents/consumer.py`)
|
|
|
|
|
|
|
|
|
|
#### Class: `Consumer`
|
|
|
|
|
Main class responsible for consuming and processing documents.
|
|
|
|
|
|
|
|
|
|
##### `__init__(self)`
|
|
|
|
|
```python
|
|
|
|
|
def __init__(self)
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Initialize the consumer with logging and configuration.
|
|
|
|
|
|
|
|
|
|
**Parameters**: None
|
|
|
|
|
|
|
|
|
|
**Returns**: Consumer instance
|
|
|
|
|
|
|
|
|
|
**Usage**:
|
|
|
|
|
```python
|
|
|
|
|
consumer = Consumer()
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `try_consume_file(self, path, override_filename=None, override_title=None, ...)`
|
|
|
|
|
```python
|
|
|
|
|
def try_consume_file(
|
|
|
|
|
self,
|
|
|
|
|
path,
|
|
|
|
|
override_filename=None,
|
|
|
|
|
override_title=None,
|
|
|
|
|
override_correspondent_id=None,
|
|
|
|
|
override_document_type_id=None,
|
|
|
|
|
override_tag_ids=None,
|
|
|
|
|
override_created=None,
|
|
|
|
|
override_asn=None,
|
|
|
|
|
task_id=None,
|
|
|
|
|
...
|
|
|
|
|
)
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Entry point for consuming a document file.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `path` (str): Full path to the document file
|
|
|
|
|
- `override_filename` (str, optional): Custom filename to use
|
|
|
|
|
- `override_title` (str, optional): Custom document title
|
|
|
|
|
- `override_correspondent_id` (int, optional): Force specific correspondent
|
|
|
|
|
- `override_document_type_id` (int, optional): Force specific document type
|
|
|
|
|
- `override_tag_ids` (list, optional): Force specific tags
|
|
|
|
|
- `override_created` (datetime, optional): Override creation date
|
|
|
|
|
- `override_asn` (int, optional): Archive serial number
|
|
|
|
|
- `task_id` (str, optional): Celery task ID for progress tracking
|
|
|
|
|
|
|
|
|
|
**Returns**: Document ID (int) or raises exception
|
|
|
|
|
|
|
|
|
|
**Raises**:
|
|
|
|
|
- `ConsumerError`: If document consumption fails
|
|
|
|
|
- `FileNotFoundError`: If file doesn't exist
|
|
|
|
|
|
|
|
|
|
**Process Flow**:
|
|
|
|
|
1. Validate file exists and is readable
|
|
|
|
|
2. Determine file type
|
|
|
|
|
3. Select appropriate parser
|
|
|
|
|
4. Extract text via OCR/parsing
|
|
|
|
|
5. Apply classification rules
|
|
|
|
|
6. Extract metadata
|
|
|
|
|
7. Create thumbnails
|
|
|
|
|
8. Save to database
|
|
|
|
|
9. Trigger post-consumption workflows
|
|
|
|
|
10. Cleanup temporary files
|
|
|
|
|
|
|
|
|
|
**Example**:
|
|
|
|
|
```python
|
|
|
|
|
doc_id = consumer.try_consume_file(
|
|
|
|
|
path="/tmp/invoice.pdf",
|
|
|
|
|
override_correspondent_id=5,
|
|
|
|
|
override_tag_ids=[1, 3, 7]
|
|
|
|
|
)
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `_consume(self, path, document, ...)`
|
|
|
|
|
```python
|
|
|
|
|
def _consume(self, path, document, metadata_from_path)
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Internal method that performs the actual document consumption.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `path` (str): Path to document
|
|
|
|
|
- `document` (Document): Document model instance
|
|
|
|
|
- `metadata_from_path` (dict): Extracted metadata from filename
|
|
|
|
|
|
|
|
|
|
**Returns**: None (modifies document in place)
|
|
|
|
|
|
|
|
|
|
**Process**:
|
|
|
|
|
1. Parse document with selected parser
|
|
|
|
|
2. Extract text content
|
|
|
|
|
3. Store original file
|
|
|
|
|
4. Generate archive version
|
|
|
|
|
5. Create thumbnails
|
|
|
|
|
6. Index for search
|
|
|
|
|
7. Run classifier if enabled
|
|
|
|
|
8. Apply matching rules
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `_write(self, document, path, original_filename, ...)`
|
|
|
|
|
```python
|
|
|
|
|
def _write(self, document, path, original_filename, original_checksum, ...):
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Save document to database and filesystem.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `document` (Document): Document instance to save
|
|
|
|
|
- `path` (str): Source file path
|
|
|
|
|
- `original_filename` (str): Original filename
|
|
|
|
|
- `original_checksum` (str): MD5/SHA256 checksum
|
|
|
|
|
|
|
|
|
|
**Returns**: None
|
|
|
|
|
|
|
|
|
|
**Side Effects**:
|
|
|
|
|
- Saves document to database
|
|
|
|
|
- Moves files to final locations
|
|
|
|
|
- Creates backup entries
|
|
|
|
|
- Triggers post-save signals
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
### 1.2 Classifier Module (`documents/classifier.py`)
|
|
|
|
|
|
|
|
|
|
#### Class: `DocumentClassifier`
|
|
|
|
|
Implements machine learning classification for automatic document categorization.
|
|
|
|
|
|
|
|
|
|
##### `__init__(self)`
|
|
|
|
|
```python
|
|
|
|
|
def __init__(self)
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Initialize classifier with sklearn models.
|
|
|
|
|
|
|
|
|
|
**Components**:
|
|
|
|
|
- `vectorizer`: TfidfVectorizer for text feature extraction
|
|
|
|
|
- `correspondent_classifier`: LinearSVC for correspondent prediction
|
|
|
|
|
- `document_type_classifier`: LinearSVC for document type prediction
|
|
|
|
|
- `tag_classifier`: OneVsRestClassifier for multi-label tag prediction
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `train(self)`
|
|
|
|
|
```python
|
|
|
|
|
def train(self) -> bool
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Train classification models on existing documents.
|
|
|
|
|
|
|
|
|
|
**Parameters**: None
|
|
|
|
|
|
2025-11-17 19:17:49 +00:00
|
|
|
**Returns**:
|
2025-11-09 00:58:28 +00:00
|
|
|
- `True` if training successful
|
|
|
|
|
- `False` if insufficient data
|
|
|
|
|
|
|
|
|
|
**Requirements**:
|
|
|
|
|
- Minimum 50 documents with correspondents for correspondent training
|
|
|
|
|
- Minimum 50 documents with document types for type training
|
|
|
|
|
- Minimum 50 documents with tags for tag training
|
|
|
|
|
|
|
|
|
|
**Process**:
|
|
|
|
|
1. Load all documents from database
|
|
|
|
|
2. Extract text features using TF-IDF
|
|
|
|
|
3. Train correspondent classifier
|
|
|
|
|
4. Train document type classifier
|
|
|
|
|
5. Train tag classifier (multi-label)
|
|
|
|
|
6. Save models to disk
|
|
|
|
|
7. Log accuracy metrics
|
|
|
|
|
|
|
|
|
|
**Example**:
|
|
|
|
|
```python
|
|
|
|
|
classifier = DocumentClassifier()
|
|
|
|
|
success = classifier.train()
|
|
|
|
|
if success:
|
|
|
|
|
print("Classifier trained successfully")
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `classify_document(self, document)`
|
|
|
|
|
```python
|
|
|
|
|
def classify_document(self, document) -> dict
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Predict classifications for a document.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `document` (Document): Document to classify
|
|
|
|
|
|
|
|
|
|
**Returns**: Dictionary with predictions:
|
|
|
|
|
```python
|
|
|
|
|
{
|
|
|
|
|
'correspondent': int or None,
|
|
|
|
|
'document_type': int or None,
|
|
|
|
|
'tags': list of int,
|
|
|
|
|
'correspondent_confidence': float,
|
|
|
|
|
'document_type_confidence': float,
|
|
|
|
|
'tags_confidence': list of float
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
**Example**:
|
|
|
|
|
```python
|
|
|
|
|
predictions = classifier.classify_document(my_document)
|
|
|
|
|
print(f"Suggested correspondent: {predictions['correspondent']}")
|
|
|
|
|
print(f"Confidence: {predictions['correspondent_confidence']}")
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `calculate_best_correspondent(self, document)`
|
|
|
|
|
```python
|
|
|
|
|
def calculate_best_correspondent(self, document) -> tuple
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Find the most likely correspondent for a document.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `document` (Document): Document to analyze
|
|
|
|
|
|
|
|
|
|
**Returns**: `(correspondent_id, confidence_score)`
|
|
|
|
|
|
|
|
|
|
**Algorithm**:
|
|
|
|
|
1. Check for matching rules (highest priority)
|
|
|
|
|
2. If no match, use ML classifier
|
|
|
|
|
3. Calculate confidence based on decision function
|
|
|
|
|
4. Return correspondent if confidence > threshold
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `calculate_best_document_type(self, document)`
|
|
|
|
|
```python
|
|
|
|
|
def calculate_best_document_type(self, document) -> tuple
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Determine the best document type classification.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `document` (Document): Document to classify
|
|
|
|
|
|
|
|
|
|
**Returns**: `(document_type_id, confidence_score)`
|
|
|
|
|
|
|
|
|
|
**Similar to correspondent classification but for document types.**
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `calculate_best_tags(self, document)`
|
|
|
|
|
```python
|
|
|
|
|
def calculate_best_tags(self, document) -> list
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Suggest relevant tags for a document.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `document` (Document): Document to tag
|
|
|
|
|
|
|
|
|
|
**Returns**: List of `(tag_id, confidence_score)` tuples
|
|
|
|
|
|
|
|
|
|
**Multi-label Classification**:
|
|
|
|
|
- Can return multiple tags
|
|
|
|
|
- Each tag has independent confidence score
|
|
|
|
|
- Returns tags above confidence threshold
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
### 1.3 Index Module (`documents/index.py`)
|
|
|
|
|
|
|
|
|
|
#### Class: `DocumentIndex`
|
|
|
|
|
Manages full-text search indexing for documents.
|
|
|
|
|
|
|
|
|
|
##### `__init__(self, index_dir=None)`
|
|
|
|
|
```python
|
|
|
|
|
def __init__(self, index_dir=None)
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Initialize search index.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `index_dir` (str, optional): Path to index directory
|
|
|
|
|
|
|
|
|
|
**Components**:
|
|
|
|
|
- Uses Whoosh library for indexing
|
|
|
|
|
- Creates schema with fields: id, title, content, correspondent, tags
|
|
|
|
|
- Supports stemming and stop words
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `add_or_update_document(self, document)`
|
|
|
|
|
```python
|
|
|
|
|
def add_or_update_document(self, document) -> None
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Add or update a document in the search index.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `document` (Document): Document to index
|
|
|
|
|
|
|
|
|
|
**Process**:
|
|
|
|
|
1. Extract searchable text
|
|
|
|
|
2. Tokenize and stem words
|
|
|
|
|
3. Build search index entry
|
|
|
|
|
4. Update or insert into index
|
|
|
|
|
5. Commit changes
|
|
|
|
|
|
|
|
|
|
**Example**:
|
|
|
|
|
```python
|
|
|
|
|
index = DocumentIndex()
|
|
|
|
|
index.add_or_update_document(my_document)
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `remove_document(self, document_id)`
|
|
|
|
|
```python
|
|
|
|
|
def remove_document(self, document_id) -> None
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Remove a document from search index.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `document_id` (int): ID of document to remove
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `search(self, query_string, limit=50)`
|
|
|
|
|
```python
|
|
|
|
|
def search(self, query_string, limit=50) -> list
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Perform full-text search.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `query_string` (str): Search query
|
|
|
|
|
- `limit` (int): Maximum results to return
|
|
|
|
|
|
|
|
|
|
**Returns**: List of document IDs, ranked by relevance
|
|
|
|
|
|
|
|
|
|
**Features**:
|
|
|
|
|
- Boolean operators (AND, OR, NOT)
|
|
|
|
|
- Phrase search ("exact phrase")
|
|
|
|
|
- Wildcard search (docu*)
|
|
|
|
|
- Field-specific search (title:invoice)
|
|
|
|
|
- Ranking by TF-IDF and BM25
|
|
|
|
|
|
|
|
|
|
**Example**:
|
|
|
|
|
```python
|
|
|
|
|
results = index.search("invoice AND 2023")
|
|
|
|
|
documents = Document.objects.filter(id__in=results)
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
### 1.4 Matching Module (`documents/matching.py`)
|
|
|
|
|
|
|
|
|
|
#### Class: `Match`
|
|
|
|
|
Represents a matching rule for automatic classification.
|
|
|
|
|
|
|
|
|
|
##### Properties:
|
|
|
|
|
- `matching_algorithm`: "any", "all", "literal", "regex", "fuzzy"
|
|
|
|
|
- `match`: Pattern to match
|
|
|
|
|
- `is_insensitive`: Case-insensitive matching
|
|
|
|
|
|
|
|
|
|
##### `matches(self, text)`
|
|
|
|
|
```python
|
|
|
|
|
def matches(self, text) -> bool
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Check if text matches this rule.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `text` (str): Text to check
|
|
|
|
|
|
|
|
|
|
**Returns**: True if matches, False otherwise
|
|
|
|
|
|
|
|
|
|
**Algorithms**:
|
|
|
|
|
- **any**: Match if any word in pattern is in text
|
|
|
|
|
- **all**: Match if all words in pattern are in text
|
|
|
|
|
- **literal**: Exact substring match
|
|
|
|
|
- **regex**: Regular expression match
|
|
|
|
|
- **fuzzy**: Fuzzy string matching (Levenshtein distance)
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
#### Function: `match_correspondents(document, classifier=None)`
|
|
|
|
|
```python
|
|
|
|
|
def match_correspondents(document, classifier=None) -> int or None
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Find correspondent for document using rules and classifier.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `document` (Document): Document to match
|
|
|
|
|
- `classifier` (DocumentClassifier, optional): ML classifier
|
|
|
|
|
|
|
|
|
|
**Returns**: Correspondent ID or None
|
|
|
|
|
|
|
|
|
|
**Process**:
|
|
|
|
|
1. Check manual assignment
|
|
|
|
|
2. Apply matching rules (in order of priority)
|
|
|
|
|
3. If no match, use ML classifier
|
|
|
|
|
4. Return correspondent if confidence sufficient
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
#### Function: `match_document_type(document, classifier=None)`
|
|
|
|
|
```python
|
|
|
|
|
def match_document_type(document, classifier=None) -> int or None
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Find document type using rules and classifier.
|
|
|
|
|
|
|
|
|
|
**Similar to correspondent matching.**
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
#### Function: `match_tags(document, classifier=None)`
|
|
|
|
|
```python
|
|
|
|
|
def match_tags(document, classifier=None) -> list
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Find matching tags using rules and classifier.
|
|
|
|
|
|
|
|
|
|
**Returns**: List of tag IDs
|
|
|
|
|
|
|
|
|
|
**Multi-label**: Can return multiple tags.
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
### 1.5 Barcode Module (`documents/barcodes.py`)
|
|
|
|
|
|
|
|
|
|
#### Function: `get_barcodes(path, pages=None)`
|
|
|
|
|
```python
|
|
|
|
|
def get_barcodes(path, pages=None) -> list
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Extract barcodes from document.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `path` (str): Path to document
|
|
|
|
|
- `pages` (list, optional): Specific pages to scan
|
|
|
|
|
|
|
|
|
|
**Returns**: List of barcode dictionaries:
|
|
|
|
|
```python
|
|
|
|
|
[
|
|
|
|
|
{
|
|
|
|
|
'type': 'CODE128',
|
|
|
|
|
'data': 'ABC123',
|
|
|
|
|
'page': 1,
|
|
|
|
|
'bbox': [x, y, w, h]
|
|
|
|
|
},
|
|
|
|
|
...
|
|
|
|
|
]
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
**Supported Formats**:
|
|
|
|
|
- CODE128, CODE39, QR Code, Data Matrix, EAN, UPC
|
|
|
|
|
|
2025-11-17 19:17:49 +00:00
|
|
|
**Uses**:
|
2025-11-09 00:58:28 +00:00
|
|
|
- pyzbar library for barcode detection
|
|
|
|
|
- OpenCV for image processing
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
#### Function: `barcode_reader(path)`
|
|
|
|
|
```python
|
|
|
|
|
def barcode_reader(path) -> dict
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Read and interpret barcode data.
|
|
|
|
|
|
|
|
|
|
**Returns**: Parsed barcode information with metadata.
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
#### Function: `separate_pages(path, barcodes)`
|
|
|
|
|
```python
|
|
|
|
|
def separate_pages(path, barcodes) -> list
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Split document based on separator barcodes.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `path` (str): Path to multi-page document
|
|
|
|
|
- `barcodes` (list): Detected barcodes with page numbers
|
|
|
|
|
|
|
|
|
|
**Returns**: List of paths to separated documents
|
|
|
|
|
|
2025-11-17 19:17:49 +00:00
|
|
|
**Use Case**:
|
2025-11-09 00:58:28 +00:00
|
|
|
- Batch scanning with separator sheets
|
|
|
|
|
- Automatic document splitting
|
|
|
|
|
|
|
|
|
|
**Example**:
|
|
|
|
|
```python
|
|
|
|
|
# Scan stack of documents with barcode separators
|
|
|
|
|
barcodes = get_barcodes("/tmp/batch.pdf")
|
|
|
|
|
documents = separate_pages("/tmp/batch.pdf", barcodes)
|
|
|
|
|
for doc_path in documents:
|
|
|
|
|
consumer.try_consume_file(doc_path)
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
### 1.6 Bulk Edit Module (`documents/bulk_edit.py`)
|
|
|
|
|
|
|
|
|
|
#### Class: `BulkEditService`
|
|
|
|
|
Handles mass document operations efficiently.
|
|
|
|
|
|
|
|
|
|
##### `update_documents(self, document_ids, updates)`
|
|
|
|
|
```python
|
|
|
|
|
def update_documents(self, document_ids, updates) -> dict
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Update multiple documents at once.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `document_ids` (list): List of document IDs
|
|
|
|
|
- `updates` (dict): Fields to update
|
|
|
|
|
|
|
|
|
|
**Returns**: Result summary:
|
|
|
|
|
```python
|
|
|
|
|
{
|
|
|
|
|
'updated': 42,
|
|
|
|
|
'failed': 0,
|
|
|
|
|
'errors': []
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
**Supported Updates**:
|
|
|
|
|
- correspondent
|
|
|
|
|
- document_type
|
|
|
|
|
- tags (add, remove, replace)
|
|
|
|
|
- storage_path
|
|
|
|
|
- custom fields
|
|
|
|
|
- permissions
|
|
|
|
|
|
|
|
|
|
**Optimizations**:
|
|
|
|
|
- Batched database operations
|
|
|
|
|
- Minimal signal triggering
|
|
|
|
|
- Deferred index updates
|
|
|
|
|
|
|
|
|
|
**Example**:
|
|
|
|
|
```python
|
|
|
|
|
service = BulkEditService()
|
|
|
|
|
result = service.update_documents(
|
|
|
|
|
document_ids=[1, 2, 3, 4, 5],
|
|
|
|
|
updates={
|
|
|
|
|
'document_type': 3,
|
|
|
|
|
'tags_add': [7, 8],
|
|
|
|
|
'tags_remove': [2]
|
|
|
|
|
}
|
|
|
|
|
)
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `merge_documents(self, document_ids, target_id=None)`
|
|
|
|
|
```python
|
|
|
|
|
def merge_documents(self, document_ids, target_id=None) -> int
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Combine multiple documents into one.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `document_ids` (list): Documents to merge
|
|
|
|
|
- `target_id` (int, optional): ID of target document
|
|
|
|
|
|
|
|
|
|
**Returns**: ID of merged document
|
|
|
|
|
|
|
|
|
|
**Process**:
|
|
|
|
|
1. Combine PDFs
|
|
|
|
|
2. Merge metadata (tags, etc.)
|
|
|
|
|
3. Preserve all original files
|
|
|
|
|
4. Update search index
|
|
|
|
|
5. Delete source documents (soft delete)
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `split_document(self, document_id, split_pages)`
|
|
|
|
|
```python
|
|
|
|
|
def split_document(self, document_id, split_pages) -> list
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Split a document into multiple documents.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `document_id` (int): Document to split
|
|
|
|
|
- `split_pages` (list): Page ranges for each new document
|
|
|
|
|
|
|
|
|
|
**Returns**: List of new document IDs
|
|
|
|
|
|
|
|
|
|
**Example**:
|
|
|
|
|
```python
|
|
|
|
|
# Split 10-page document into 3 documents
|
|
|
|
|
new_docs = service.split_document(
|
|
|
|
|
document_id=42,
|
|
|
|
|
split_pages=[
|
|
|
|
|
[1, 2, 3], # First 3 pages
|
|
|
|
|
[4, 5, 6, 7], # Middle 4 pages
|
|
|
|
|
[8, 9, 10] # Last 3 pages
|
|
|
|
|
]
|
|
|
|
|
)
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
### 1.7 Workflow Module (`documents/workflows/`)
|
|
|
|
|
|
|
|
|
|
#### Class: `WorkflowEngine`
|
|
|
|
|
Executes automated document workflows.
|
|
|
|
|
|
|
|
|
|
##### `execute_workflow(self, workflow, document, trigger_type)`
|
|
|
|
|
```python
|
|
|
|
|
def execute_workflow(self, workflow, document, trigger_type) -> dict
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Run a workflow on a document.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `workflow` (Workflow): Workflow definition
|
|
|
|
|
- `document` (Document): Target document
|
|
|
|
|
- `trigger_type` (str): What triggered this workflow
|
|
|
|
|
|
|
|
|
|
**Returns**: Execution result:
|
|
|
|
|
```python
|
|
|
|
|
{
|
|
|
|
|
'success': True,
|
|
|
|
|
'actions_executed': 5,
|
|
|
|
|
'actions_failed': 0,
|
|
|
|
|
'errors': []
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
**Workflow Components**:
|
2025-11-17 19:17:49 +00:00
|
|
|
1. **Triggers**:
|
2025-11-09 00:58:28 +00:00
|
|
|
- consumption
|
|
|
|
|
- manual
|
|
|
|
|
- scheduled
|
|
|
|
|
- webhook
|
2025-11-17 19:17:49 +00:00
|
|
|
|
2025-11-09 00:58:28 +00:00
|
|
|
2. **Conditions**:
|
|
|
|
|
- Document properties
|
|
|
|
|
- Content matching
|
|
|
|
|
- Date ranges
|
|
|
|
|
- Custom field values
|
2025-11-17 19:17:49 +00:00
|
|
|
|
2025-11-09 00:58:28 +00:00
|
|
|
3. **Actions**:
|
|
|
|
|
- Set correspondent
|
|
|
|
|
- Set document type
|
|
|
|
|
- Add/remove tags
|
|
|
|
|
- Set custom fields
|
|
|
|
|
- Execute webhook
|
|
|
|
|
- Send email
|
|
|
|
|
- Run script
|
|
|
|
|
|
|
|
|
|
**Example Workflow**:
|
|
|
|
|
```python
|
|
|
|
|
workflow = {
|
|
|
|
|
'name': 'Invoice Processing',
|
|
|
|
|
'trigger': 'consumption',
|
|
|
|
|
'conditions': [
|
|
|
|
|
{'field': 'content', 'operator': 'contains', 'value': 'INVOICE'}
|
|
|
|
|
],
|
|
|
|
|
'actions': [
|
|
|
|
|
{'type': 'set_document_type', 'value': 2},
|
|
|
|
|
{'type': 'add_tags', 'value': [5, 6]},
|
|
|
|
|
{'type': 'webhook', 'url': 'https://api.example.com/invoice'}
|
|
|
|
|
]
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
## 2. Paperless Core Functions
|
|
|
|
|
|
|
|
|
|
### 2.1 Settings Module (`paperless/settings.py`)
|
|
|
|
|
|
|
|
|
|
#### Configuration Functions
|
|
|
|
|
|
|
|
|
|
##### `load_config_from_env()`
|
|
|
|
|
```python
|
|
|
|
|
def load_config_from_env() -> dict
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Load configuration from environment variables.
|
|
|
|
|
|
|
|
|
|
**Returns**: Configuration dictionary
|
|
|
|
|
|
|
|
|
|
**Environment Variables**:
|
|
|
|
|
- `PAPERLESS_DBHOST`: Database host
|
|
|
|
|
- `PAPERLESS_DBPORT`: Database port
|
|
|
|
|
- `PAPERLESS_OCR_LANGUAGE`: OCR languages
|
|
|
|
|
- `PAPERLESS_CONSUMER_POLLING`: Polling interval
|
|
|
|
|
- `PAPERLESS_TASK_WORKERS`: Number of workers
|
|
|
|
|
- `PAPERLESS_SECRET_KEY`: Django secret key
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `validate_settings(settings)`
|
|
|
|
|
```python
|
|
|
|
|
def validate_settings(settings) -> list
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Validate configuration for errors.
|
|
|
|
|
|
|
|
|
|
**Returns**: List of validation errors
|
|
|
|
|
|
|
|
|
|
**Checks**:
|
|
|
|
|
- Required settings present
|
|
|
|
|
- Valid database configuration
|
|
|
|
|
- OCR languages available
|
|
|
|
|
- Storage paths exist
|
|
|
|
|
- Secret key security
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
### 2.2 Celery Module (`paperless/celery.py`)
|
|
|
|
|
|
|
|
|
|
#### Task Configuration
|
|
|
|
|
|
|
|
|
|
##### `@app.task`
|
|
|
|
|
Decorator for creating Celery tasks.
|
|
|
|
|
|
|
|
|
|
**Example**:
|
|
|
|
|
```python
|
|
|
|
|
@app.task(bind=True, max_retries=3)
|
|
|
|
|
def process_document(self, doc_id):
|
|
|
|
|
try:
|
|
|
|
|
document = Document.objects.get(id=doc_id)
|
|
|
|
|
# Process document
|
|
|
|
|
except Exception as exc:
|
|
|
|
|
raise self.retry(exc=exc, countdown=60)
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### Periodic Tasks
|
|
|
|
|
|
|
|
|
|
```python
|
|
|
|
|
@app.on_after_finalize.connect
|
|
|
|
|
def setup_periodic_tasks(sender, **kwargs):
|
|
|
|
|
# Run sanity check daily at 3:30 AM
|
|
|
|
|
sender.add_periodic_task(
|
|
|
|
|
crontab(hour=3, minute=30),
|
|
|
|
|
sanity_check.s(),
|
|
|
|
|
name='daily-sanity-check'
|
|
|
|
|
)
|
2025-11-17 19:17:49 +00:00
|
|
|
|
2025-11-09 00:58:28 +00:00
|
|
|
# Train classifier weekly
|
|
|
|
|
sender.add_periodic_task(
|
|
|
|
|
crontab(day_of_week=0, hour=2, minute=0),
|
|
|
|
|
train_classifier.s(),
|
|
|
|
|
name='weekly-classifier-training'
|
|
|
|
|
)
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
### 2.3 Authentication Module (`paperless/auth.py`)
|
|
|
|
|
|
|
|
|
|
#### Class: `PaperlessRemoteUserBackend`
|
|
|
|
|
Custom authentication backend.
|
|
|
|
|
|
|
|
|
|
##### `authenticate(self, request, remote_user=None)`
|
|
|
|
|
```python
|
|
|
|
|
def authenticate(self, request, remote_user=None) -> User or None
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Authenticate user via HTTP header (SSO).
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `request`: HTTP request
|
|
|
|
|
- `remote_user`: Username from header
|
|
|
|
|
|
|
|
|
|
**Returns**: User instance or None
|
|
|
|
|
|
|
|
|
|
**Supports**:
|
|
|
|
|
- HTTP_REMOTE_USER header
|
|
|
|
|
- LDAP integration
|
|
|
|
|
- OAuth2 providers
|
|
|
|
|
- SAML
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
## 3. Mail Integration Functions
|
|
|
|
|
|
|
|
|
|
### 3.1 Mail Processing (`paperless_mail/mail.py`)
|
|
|
|
|
|
|
|
|
|
#### Class: `MailAccountHandler`
|
|
|
|
|
|
|
|
|
|
##### `get_messages(self, max_messages=100)`
|
|
|
|
|
```python
|
|
|
|
|
def get_messages(self, max_messages=100) -> list
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Fetch emails from mail account.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `max_messages` (int): Maximum emails to fetch
|
|
|
|
|
|
|
|
|
|
**Returns**: List of email message objects
|
|
|
|
|
|
|
|
|
|
**Protocols**:
|
|
|
|
|
- IMAP
|
|
|
|
|
- IMAP with OAuth2 (Gmail, Outlook)
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `process_message(self, message)`
|
|
|
|
|
```python
|
|
|
|
|
def process_message(self, message) -> Document or None
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Convert email to document.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `message`: Email message object
|
|
|
|
|
|
|
|
|
|
**Returns**: Created document or None
|
|
|
|
|
|
|
|
|
|
**Process**:
|
|
|
|
|
1. Extract email metadata (from, to, subject, date)
|
|
|
|
|
2. Extract body text
|
|
|
|
|
3. Download attachments
|
|
|
|
|
4. Create document for email body
|
|
|
|
|
5. Create documents for attachments
|
|
|
|
|
6. Link documents together
|
|
|
|
|
7. Apply mail rules
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `handle_attachments(self, message)`
|
|
|
|
|
```python
|
|
|
|
|
def handle_attachments(self, message) -> list
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Extract and process email attachments.
|
|
|
|
|
|
|
|
|
|
**Returns**: List of attachment file paths
|
|
|
|
|
|
|
|
|
|
**Supported**:
|
|
|
|
|
- PDF attachments
|
|
|
|
|
- Image attachments
|
|
|
|
|
- Office documents
|
|
|
|
|
- Archives (extracts)
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
## 4. OCR & Parsing Functions
|
|
|
|
|
|
|
|
|
|
### 4.1 Tesseract Parser (`paperless_tesseract/parsers.py`)
|
|
|
|
|
|
|
|
|
|
#### Class: `RasterisedDocumentParser`
|
|
|
|
|
|
|
|
|
|
##### `parse(self, document_path, mime_type)`
|
|
|
|
|
```python
|
|
|
|
|
def parse(self, document_path, mime_type) -> dict
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: OCR document using Tesseract.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `document_path` (str): Path to document
|
|
|
|
|
- `mime_type` (str): MIME type
|
|
|
|
|
|
|
|
|
|
**Returns**: Parsed document data:
|
|
|
|
|
```python
|
|
|
|
|
{
|
|
|
|
|
'text': 'Extracted text content',
|
|
|
|
|
'metadata': {...},
|
|
|
|
|
'pages': 10,
|
|
|
|
|
'language': 'eng'
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
**Process**:
|
|
|
|
|
1. Convert to images (if PDF)
|
|
|
|
|
2. Preprocess images (deskew, denoise)
|
|
|
|
|
3. Detect language
|
|
|
|
|
4. Run Tesseract OCR
|
|
|
|
|
5. Post-process text (fix common errors)
|
|
|
|
|
6. Create searchable PDF
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `construct_ocrmypdf_parameters(self)`
|
|
|
|
|
```python
|
|
|
|
|
def construct_ocrmypdf_parameters(self) -> list
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Build command-line arguments for OCRmyPDF.
|
|
|
|
|
|
|
|
|
|
**Returns**: List of arguments
|
|
|
|
|
|
|
|
|
|
**Configuration**:
|
|
|
|
|
- Language selection
|
|
|
|
|
- OCR mode (redo, skip, force)
|
|
|
|
|
- Image preprocessing
|
|
|
|
|
- PDF/A creation
|
|
|
|
|
- Optimization level
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
### 4.2 Tika Parser (`paperless_tika/parsers.py`)
|
|
|
|
|
|
|
|
|
|
#### Class: `TikaDocumentParser`
|
|
|
|
|
|
|
|
|
|
##### `parse(self, document_path, mime_type)`
|
|
|
|
|
```python
|
|
|
|
|
def parse(self, document_path, mime_type) -> dict
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Parse document using Apache Tika.
|
|
|
|
|
|
|
|
|
|
**Supported Formats**:
|
|
|
|
|
- Microsoft Office (doc, docx, xls, xlsx, ppt, pptx)
|
|
|
|
|
- LibreOffice (odt, ods, odp)
|
|
|
|
|
- Rich Text Format (rtf)
|
|
|
|
|
- Archives (zip, tar, rar)
|
|
|
|
|
- Images with metadata
|
|
|
|
|
|
|
|
|
|
**Returns**: Parsed content and metadata
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
## 5. API & Serialization Functions
|
|
|
|
|
|
|
|
|
|
### 5.1 Document ViewSet (`documents/views.py`)
|
|
|
|
|
|
|
|
|
|
#### Class: `DocumentViewSet`
|
|
|
|
|
|
|
|
|
|
##### `list(self, request)`
|
|
|
|
|
```python
|
|
|
|
|
def list(self, request) -> Response
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: List documents with filtering and pagination.
|
|
|
|
|
|
|
|
|
|
**Query Parameters**:
|
|
|
|
|
- `page`: Page number
|
|
|
|
|
- `page_size`: Results per page
|
|
|
|
|
- `ordering`: Sort field
|
|
|
|
|
- `correspondent__id`: Filter by correspondent
|
|
|
|
|
- `document_type__id`: Filter by type
|
|
|
|
|
- `tags__id__in`: Filter by tags
|
|
|
|
|
- `created__date__gt`: Filter by date
|
|
|
|
|
- `query`: Full-text search
|
|
|
|
|
|
|
|
|
|
**Response**:
|
|
|
|
|
```python
|
|
|
|
|
{
|
|
|
|
|
'count': 100,
|
|
|
|
|
'next': 'http://api/documents/?page=2',
|
|
|
|
|
'previous': null,
|
|
|
|
|
'results': [...]
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `retrieve(self, request, pk=None)`
|
|
|
|
|
```python
|
|
|
|
|
def retrieve(self, request, pk=None) -> Response
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Get single document details.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `pk`: Document ID
|
|
|
|
|
|
|
|
|
|
**Response**: Full document JSON with metadata
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `download(self, request, pk=None)`
|
|
|
|
|
```python
|
|
|
|
|
@action(detail=True, methods=['get'])
|
|
|
|
|
def download(self, request, pk=None) -> FileResponse
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Download document file.
|
|
|
|
|
|
|
|
|
|
**Query Parameters**:
|
|
|
|
|
- `original`: Download original vs archive version
|
|
|
|
|
|
|
|
|
|
**Returns**: File download response
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `preview(self, request, pk=None)`
|
|
|
|
|
```python
|
|
|
|
|
@action(detail=True, methods=['get'])
|
|
|
|
|
def preview(self, request, pk=None) -> FileResponse
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Generate document preview image.
|
|
|
|
|
|
|
|
|
|
**Returns**: PNG/JPEG image
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `metadata(self, request, pk=None)`
|
|
|
|
|
```python
|
|
|
|
|
@action(detail=True, methods=['get'])
|
|
|
|
|
def metadata(self, request, pk=None) -> Response
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Get/update document metadata.
|
|
|
|
|
|
|
|
|
|
**GET Response**:
|
|
|
|
|
```python
|
|
|
|
|
{
|
|
|
|
|
'original_filename': 'invoice.pdf',
|
|
|
|
|
'media_filename': '0000042.pdf',
|
|
|
|
|
'created': '2023-01-15T10:30:00Z',
|
|
|
|
|
'modified': '2023-01-15T10:30:00Z',
|
|
|
|
|
'added': '2023-01-15T10:30:00Z',
|
|
|
|
|
'archive_checksum': 'sha256:abc123...',
|
|
|
|
|
'original_checksum': 'sha256:def456...',
|
|
|
|
|
'original_size': 245760,
|
|
|
|
|
'archive_size': 180000,
|
|
|
|
|
'original_mime_type': 'application/pdf'
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `suggestions(self, request, pk=None)`
|
|
|
|
|
```python
|
|
|
|
|
@action(detail=True, methods=['get'])
|
|
|
|
|
def suggestions(self, request, pk=None) -> Response
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Get ML classification suggestions.
|
|
|
|
|
|
|
|
|
|
**Response**:
|
|
|
|
|
```python
|
|
|
|
|
{
|
|
|
|
|
'correspondents': [
|
|
|
|
|
{'id': 5, 'name': 'Acme Corp', 'confidence': 0.87},
|
|
|
|
|
{'id': 2, 'name': 'Beta Inc', 'confidence': 0.12}
|
|
|
|
|
],
|
|
|
|
|
'document_types': [...],
|
|
|
|
|
'tags': [...]
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `bulk_edit(self, request)`
|
|
|
|
|
```python
|
|
|
|
|
@action(detail=False, methods=['post'])
|
|
|
|
|
def bulk_edit(self, request) -> Response
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Bulk update multiple documents.
|
|
|
|
|
|
|
|
|
|
**Request Body**:
|
|
|
|
|
```python
|
|
|
|
|
{
|
|
|
|
|
'documents': [1, 2, 3, 4, 5],
|
|
|
|
|
'method': 'set_correspondent',
|
|
|
|
|
'parameters': {'correspondent': 7}
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
**Methods**:
|
|
|
|
|
- `set_correspondent`
|
|
|
|
|
- `set_document_type`
|
|
|
|
|
- `set_storage_path`
|
|
|
|
|
- `add_tag` / `remove_tag`
|
|
|
|
|
- `modify_tags`
|
|
|
|
|
- `delete`
|
|
|
|
|
- `merge`
|
|
|
|
|
- `split`
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
## 6. Frontend Services & Components
|
|
|
|
|
|
|
|
|
|
### 6.1 Document Service (`src-ui/src/app/services/rest/document.service.ts`)
|
|
|
|
|
|
|
|
|
|
#### Class: `DocumentService`
|
|
|
|
|
|
|
|
|
|
##### `listFiltered(page, pageSize, sortField, sortReverse, filterRules, extraParams?)`
|
|
|
|
|
```typescript
|
|
|
|
|
listFiltered(
|
|
|
|
|
page?: number,
|
|
|
|
|
pageSize?: number,
|
|
|
|
|
sortField?: string,
|
|
|
|
|
sortReverse?: boolean,
|
|
|
|
|
filterRules?: FilterRule[],
|
|
|
|
|
extraParams?: any
|
|
|
|
|
): Observable<PaginatedResults<Document>>
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Get filtered list of documents.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `page`: Page number (1-indexed)
|
|
|
|
|
- `pageSize`: Results per page
|
|
|
|
|
- `sortField`: Field to sort by
|
|
|
|
|
- `sortReverse`: Reverse sort order
|
|
|
|
|
- `filterRules`: Array of filter rules
|
|
|
|
|
- `extraParams`: Additional query parameters
|
|
|
|
|
|
|
|
|
|
**Returns**: Observable of paginated results
|
|
|
|
|
|
|
|
|
|
**Example**:
|
|
|
|
|
```typescript
|
|
|
|
|
this.documentService.listFiltered(
|
|
|
|
|
1,
|
|
|
|
|
50,
|
|
|
|
|
'created',
|
|
|
|
|
true,
|
|
|
|
|
[
|
|
|
|
|
{rule_type: FILTER_CORRESPONDENT, value: '5'},
|
|
|
|
|
{rule_type: FILTER_HAS_TAGS_ALL, value: '1,3,7'}
|
|
|
|
|
]
|
|
|
|
|
).subscribe(results => {
|
|
|
|
|
this.documents = results.results;
|
|
|
|
|
});
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `get(id: number)`
|
|
|
|
|
```typescript
|
|
|
|
|
get(id: number): Observable<Document>
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Get single document by ID.
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `update(document: Document)`
|
|
|
|
|
```typescript
|
|
|
|
|
update(document: Document): Observable<Document>
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Update document metadata.
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `upload(formData: FormData)`
|
|
|
|
|
```typescript
|
|
|
|
|
upload(formData: FormData): Observable<any>
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Upload new document.
|
|
|
|
|
|
|
|
|
|
**FormData fields**:
|
|
|
|
|
- `document`: File
|
|
|
|
|
- `title`: Optional title
|
|
|
|
|
- `correspondent`: Optional correspondent ID
|
|
|
|
|
- `document_type`: Optional type ID
|
|
|
|
|
- `tags`: Optional tag IDs
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `download(id: number, original: boolean)`
|
|
|
|
|
```typescript
|
|
|
|
|
download(id: number, original: boolean = false): Observable<Blob>
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Download document file.
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `getPreviewUrl(id: number)`
|
|
|
|
|
```typescript
|
|
|
|
|
getPreviewUrl(id: number): string
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Get URL for document preview.
|
|
|
|
|
|
|
|
|
|
**Returns**: URL string
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `getThumbUrl(id: number)`
|
|
|
|
|
```typescript
|
|
|
|
|
getThumbUrl(id: number): string
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Get URL for document thumbnail.
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `bulkEdit(documentIds: number[], method: string, parameters: any)`
|
|
|
|
|
```typescript
|
|
|
|
|
bulkEdit(
|
|
|
|
|
documentIds: number[],
|
|
|
|
|
method: string,
|
|
|
|
|
parameters: any
|
|
|
|
|
): Observable<any>
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Perform bulk operation on documents.
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
### 6.2 Search Service (`src-ui/src/app/services/search.service.ts`)
|
|
|
|
|
|
|
|
|
|
#### Class: `SearchService`
|
|
|
|
|
|
|
|
|
|
##### `search(query: string)`
|
|
|
|
|
```typescript
|
|
|
|
|
search(query: string): Observable<SearchResult[]>
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Perform full-text search.
|
|
|
|
|
|
|
|
|
|
**Query Syntax**:
|
|
|
|
|
- Simple: `invoice 2023`
|
|
|
|
|
- Phrase: `"exact phrase"`
|
|
|
|
|
- Boolean: `invoice AND 2023`
|
|
|
|
|
- Field: `title:invoice`
|
|
|
|
|
- Wildcard: `doc*`
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `advancedSearch(query: SearchQuery)`
|
|
|
|
|
```typescript
|
|
|
|
|
advancedSearch(query: SearchQuery): Observable<SearchResult[]>
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Advanced search with multiple criteria.
|
|
|
|
|
|
|
|
|
|
**SearchQuery**:
|
|
|
|
|
```typescript
|
|
|
|
|
interface SearchQuery {
|
|
|
|
|
text?: string;
|
|
|
|
|
correspondent?: number;
|
|
|
|
|
documentType?: number;
|
|
|
|
|
tags?: number[];
|
|
|
|
|
dateFrom?: Date;
|
|
|
|
|
dateTo?: Date;
|
|
|
|
|
customFields?: {[key: string]: any};
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
### 6.3 Settings Service (`src-ui/src/app/services/settings.service.ts`)
|
|
|
|
|
|
|
|
|
|
#### Class: `SettingsService`
|
|
|
|
|
|
|
|
|
|
##### `getSettings()`
|
|
|
|
|
```typescript
|
|
|
|
|
getSettings(): Observable<PaperlessSettings>
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Get user/system settings.
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `updateSettings(settings: PaperlessSettings)`
|
|
|
|
|
```typescript
|
|
|
|
|
updateSettings(settings: PaperlessSettings): Observable<PaperlessSettings>
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Update settings.
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
## 7. Utility Functions
|
|
|
|
|
|
|
|
|
|
### 7.1 File Handling Utilities (`documents/file_handling.py`)
|
|
|
|
|
|
|
|
|
|
#### `generate_unique_filename(filename, suffix="")`
|
|
|
|
|
```python
|
|
|
|
|
def generate_unique_filename(filename, suffix="") -> str
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Generate unique filename to avoid collisions.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `filename` (str): Base filename
|
|
|
|
|
- `suffix` (str): Optional suffix
|
|
|
|
|
|
|
|
|
|
**Returns**: Unique filename with timestamp
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
#### `create_source_path_directory(source_path)`
|
|
|
|
|
```python
|
|
|
|
|
def create_source_path_directory(source_path) -> None
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Create directory structure for document storage.
|
|
|
|
|
|
|
|
|
|
**Parameters**:
|
|
|
|
|
- `source_path` (str): Path template with variables
|
|
|
|
|
|
|
|
|
|
**Variables**:
|
|
|
|
|
- `{correspondent}`: Correspondent name
|
|
|
|
|
- `{document_type}`: Document type
|
|
|
|
|
- `{created}`: Creation date
|
|
|
|
|
- `{created_year}`: Year
|
|
|
|
|
- `{created_month}`: Month
|
|
|
|
|
- `{title}`: Document title
|
|
|
|
|
- `{asn}`: Archive serial number
|
|
|
|
|
|
|
|
|
|
**Example**:
|
|
|
|
|
```python
|
|
|
|
|
# Template: {correspondent}/{created_year}/{document_type}
|
|
|
|
|
# Result: Acme Corp/2023/Invoices/
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
#### `safe_rename(old_path, new_path)`
|
|
|
|
|
```python
|
|
|
|
|
def safe_rename(old_path, new_path) -> None
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Safely rename file with atomic operation.
|
|
|
|
|
|
|
|
|
|
**Ensures**: No data loss if operation fails
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
### 7.2 Data Utilities (`paperless/utils.py`)
|
|
|
|
|
|
|
|
|
|
#### `copy_basic_file_stats(src, dst)`
|
|
|
|
|
```python
|
|
|
|
|
def copy_basic_file_stats(src, dst) -> None
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Copy file metadata (timestamps, permissions).
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
#### `maybe_override_pixel_limit()`
|
|
|
|
|
```python
|
|
|
|
|
def maybe_override_pixel_limit() -> None
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Increase PIL image size limit for large documents.
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
## 8. Database Models & Methods
|
|
|
|
|
|
|
|
|
|
### 8.1 Document Model (`documents/models.py`)
|
|
|
|
|
|
|
|
|
|
#### Class: `Document`
|
|
|
|
|
|
|
|
|
|
##### Model Fields:
|
|
|
|
|
```python
|
|
|
|
|
class Document(models.Model):
|
|
|
|
|
title = models.CharField(max_length=255)
|
|
|
|
|
content = models.TextField()
|
|
|
|
|
correspondent = models.ForeignKey(Correspondent, ...)
|
|
|
|
|
document_type = models.ForeignKey(DocumentType, ...)
|
|
|
|
|
tags = models.ManyToManyField(Tag, ...)
|
|
|
|
|
created = models.DateTimeField(...)
|
|
|
|
|
modified = models.DateTimeField(auto_now=True)
|
|
|
|
|
added = models.DateTimeField(auto_now_add=True)
|
|
|
|
|
storage_path = models.ForeignKey(StoragePath, ...)
|
|
|
|
|
archive_serial_number = models.IntegerField(...)
|
|
|
|
|
original_filename = models.CharField(max_length=1024)
|
|
|
|
|
checksum = models.CharField(max_length=64)
|
|
|
|
|
archive_checksum = models.CharField(max_length=64)
|
|
|
|
|
owner = models.ForeignKey(User, ...)
|
|
|
|
|
custom_fields = models.ManyToManyField(CustomField, ...)
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `save(self, *args, **kwargs)`
|
|
|
|
|
```python
|
|
|
|
|
def save(self, *args, **kwargs) -> None
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Override save to add custom logic.
|
|
|
|
|
|
|
|
|
|
**Custom Logic**:
|
|
|
|
|
1. Generate archive serial number if not set
|
|
|
|
|
2. Update modification timestamp
|
|
|
|
|
3. Trigger signals
|
|
|
|
|
4. Update search index
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `filename(self)`
|
|
|
|
|
```python
|
|
|
|
|
@property
|
|
|
|
|
def filename(self) -> str
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Get the document filename.
|
|
|
|
|
|
|
|
|
|
**Returns**: Formatted filename based on template
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `source_path(self)`
|
|
|
|
|
```python
|
|
|
|
|
@property
|
|
|
|
|
def source_path(self) -> str
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Get full path to source file.
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `archive_path(self)`
|
|
|
|
|
```python
|
|
|
|
|
@property
|
|
|
|
|
def archive_path(self) -> str
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Get full path to archive file.
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
##### `get_public_filename(self)`
|
|
|
|
|
```python
|
|
|
|
|
def get_public_filename(self) -> str
|
|
|
|
|
```
|
|
|
|
|
**Purpose**: Get sanitized filename for downloads.
|
|
|
|
|
|
|
|
|
|
**Returns**: Safe filename without path traversal characters
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
### 8.2 Correspondent Model
|
|
|
|
|
|
|
|
|
|
#### Class: `Correspondent`
|
|
|
|
|
|
|
|
|
|
```python
|
|
|
|
|
class Correspondent(models.Model):
|
|
|
|
|
name = models.CharField(max_length=255, unique=True)
|
|
|
|
|
match = models.CharField(max_length=255, blank=True)
|
|
|
|
|
matching_algorithm = models.IntegerField(choices=MATCH_CHOICES)
|
|
|
|
|
is_insensitive = models.BooleanField(default=True)
|
|
|
|
|
document_count = models.IntegerField(default=0)
|
|
|
|
|
last_correspondence = models.DateTimeField(null=True)
|
|
|
|
|
owner = models.ForeignKey(User, ...)
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
### 8.3 Workflow Model
|
|
|
|
|
|
|
|
|
|
#### Class: `Workflow`
|
|
|
|
|
|
|
|
|
|
```python
|
|
|
|
|
class Workflow(models.Model):
|
|
|
|
|
name = models.CharField(max_length=255)
|
|
|
|
|
enabled = models.BooleanField(default=True)
|
|
|
|
|
order = models.IntegerField(default=0)
|
|
|
|
|
triggers = models.ManyToManyField(WorkflowTrigger)
|
|
|
|
|
conditions = models.ManyToManyField(WorkflowCondition)
|
|
|
|
|
actions = models.ManyToManyField(WorkflowAction)
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
## Summary
|
|
|
|
|
|
|
|
|
|
This guide provides comprehensive documentation for the major functions in IntelliDocs-ngx. For detailed API documentation, refer to:
|
|
|
|
|
|
|
|
|
|
- **Backend API**: `/api/schema/` (OpenAPI/Swagger)
|
|
|
|
|
- **Frontend Docs**: Generated via Compodoc
|
|
|
|
|
- **Database Schema**: Django migrations in `migrations/` directories
|
|
|
|
|
|
|
|
|
|
For implementation examples and testing, see the test files in each module's `tests/` directory.
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
*Last Updated: 2025-11-09*
|
|
|
|
|
*Version: 2.19.5*
|