paperless-ngx/TECHNICAL_FUNCTIONS_GUIDE.md
Claude e7b426caf1
fix(linting): corrige errores de formato y sintaxis detectados por pre-commit
- Elimina import duplicado de DeletionRequestViewSet en urls.py (F811)
- Aplica formato automático con ruff format a 12 archivos Python
- Agrega comas finales faltantes (COM812) en 74 ubicaciones
- Normaliza formato de dependencias en pyproject.toml
- Corrige ortografía en archivos de documentación (codespell)

Errores corregidos:
- src/paperless/urls.py: Import duplicado de DeletionRequestViewSet
- 74 violaciones de COM812 (comas finales faltantes)
- Formato inconsistente en múltiples archivos Python

Este commit asegura que el código pase el linting check de pre-commit
y resuelve los problemas de formato introducidos en el commit anterior.

Archivos Python reformateados: 12
Archivos de documentación corregidos: 35
Comas finales agregadas: 74
2025-11-17 19:17:49 +00:00

32 KiB

IntelliDocs-ngx Technical Functions Guide

Complete Function Reference

This document provides detailed documentation for all major functions in IntelliDocs-ngx.


Table of Contents

  1. Documents Module Functions
  2. Paperless Core Functions
  3. Mail Integration Functions
  4. OCR & Parsing Functions
  5. API & Serialization Functions
  6. Frontend Services & Components
  7. Utility Functions
  8. Database Models & Methods

1. Documents Module Functions

1.1 Consumer Module (documents/consumer.py)

Class: Consumer

Main class responsible for consuming and processing documents.

__init__(self)
def __init__(self)

Purpose: Initialize the consumer with logging and configuration.

Parameters: None

Returns: Consumer instance

Usage:

consumer = Consumer()

try_consume_file(self, path, override_filename=None, override_title=None, ...)
def try_consume_file(
    self,
    path,
    override_filename=None,
    override_title=None,
    override_correspondent_id=None,
    override_document_type_id=None,
    override_tag_ids=None,
    override_created=None,
    override_asn=None,
    task_id=None,
    ...
)

Purpose: Entry point for consuming a document file.

Parameters:

  • path (str): Full path to the document file
  • override_filename (str, optional): Custom filename to use
  • override_title (str, optional): Custom document title
  • override_correspondent_id (int, optional): Force specific correspondent
  • override_document_type_id (int, optional): Force specific document type
  • override_tag_ids (list, optional): Force specific tags
  • override_created (datetime, optional): Override creation date
  • override_asn (int, optional): Archive serial number
  • task_id (str, optional): Celery task ID for progress tracking

Returns: Document ID (int) or raises exception

Raises:

  • ConsumerError: If document consumption fails
  • FileNotFoundError: If file doesn't exist

Process Flow:

  1. Validate file exists and is readable
  2. Determine file type
  3. Select appropriate parser
  4. Extract text via OCR/parsing
  5. Apply classification rules
  6. Extract metadata
  7. Create thumbnails
  8. Save to database
  9. Trigger post-consumption workflows
  10. Cleanup temporary files

Example:

doc_id = consumer.try_consume_file(
    path="/tmp/invoice.pdf",
    override_correspondent_id=5,
    override_tag_ids=[1, 3, 7]
)

_consume(self, path, document, ...)
def _consume(self, path, document, metadata_from_path)

Purpose: Internal method that performs the actual document consumption.

Parameters:

  • path (str): Path to document
  • document (Document): Document model instance
  • metadata_from_path (dict): Extracted metadata from filename

Returns: None (modifies document in place)

Process:

  1. Parse document with selected parser
  2. Extract text content
  3. Store original file
  4. Generate archive version
  5. Create thumbnails
  6. Index for search
  7. Run classifier if enabled
  8. Apply matching rules

_write(self, document, path, original_filename, ...)
def _write(self, document, path, original_filename, original_checksum, ...):

Purpose: Save document to database and filesystem.

Parameters:

  • document (Document): Document instance to save
  • path (str): Source file path
  • original_filename (str): Original filename
  • original_checksum (str): MD5/SHA256 checksum

Returns: None

Side Effects:

  • Saves document to database
  • Moves files to final locations
  • Creates backup entries
  • Triggers post-save signals

1.2 Classifier Module (documents/classifier.py)

Class: DocumentClassifier

Implements machine learning classification for automatic document categorization.

__init__(self)
def __init__(self)

Purpose: Initialize classifier with sklearn models.

Components:

  • vectorizer: TfidfVectorizer for text feature extraction
  • correspondent_classifier: LinearSVC for correspondent prediction
  • document_type_classifier: LinearSVC for document type prediction
  • tag_classifier: OneVsRestClassifier for multi-label tag prediction

train(self)
def train(self) -> bool

Purpose: Train classification models on existing documents.

Parameters: None

Returns:

  • True if training successful
  • False if insufficient data

Requirements:

  • Minimum 50 documents with correspondents for correspondent training
  • Minimum 50 documents with document types for type training
  • Minimum 50 documents with tags for tag training

Process:

  1. Load all documents from database
  2. Extract text features using TF-IDF
  3. Train correspondent classifier
  4. Train document type classifier
  5. Train tag classifier (multi-label)
  6. Save models to disk
  7. Log accuracy metrics

Example:

classifier = DocumentClassifier()
success = classifier.train()
if success:
    print("Classifier trained successfully")

classify_document(self, document)
def classify_document(self, document) -> dict

Purpose: Predict classifications for a document.

Parameters:

  • document (Document): Document to classify

Returns: Dictionary with predictions:

{
    'correspondent': int or None,
    'document_type': int or None,
    'tags': list of int,
    'correspondent_confidence': float,
    'document_type_confidence': float,
    'tags_confidence': list of float
}

Example:

predictions = classifier.classify_document(my_document)
print(f"Suggested correspondent: {predictions['correspondent']}")
print(f"Confidence: {predictions['correspondent_confidence']}")

calculate_best_correspondent(self, document)
def calculate_best_correspondent(self, document) -> tuple

Purpose: Find the most likely correspondent for a document.

Parameters:

  • document (Document): Document to analyze

Returns: (correspondent_id, confidence_score)

Algorithm:

  1. Check for matching rules (highest priority)
  2. If no match, use ML classifier
  3. Calculate confidence based on decision function
  4. Return correspondent if confidence > threshold

calculate_best_document_type(self, document)
def calculate_best_document_type(self, document) -> tuple

Purpose: Determine the best document type classification.

Parameters:

  • document (Document): Document to classify

Returns: (document_type_id, confidence_score)

Similar to correspondent classification but for document types.


calculate_best_tags(self, document)
def calculate_best_tags(self, document) -> list

Purpose: Suggest relevant tags for a document.

Parameters:

  • document (Document): Document to tag

Returns: List of (tag_id, confidence_score) tuples

Multi-label Classification:

  • Can return multiple tags
  • Each tag has independent confidence score
  • Returns tags above confidence threshold

1.3 Index Module (documents/index.py)

Class: DocumentIndex

Manages full-text search indexing for documents.

__init__(self, index_dir=None)
def __init__(self, index_dir=None)

Purpose: Initialize search index.

Parameters:

  • index_dir (str, optional): Path to index directory

Components:

  • Uses Whoosh library for indexing
  • Creates schema with fields: id, title, content, correspondent, tags
  • Supports stemming and stop words

add_or_update_document(self, document)
def add_or_update_document(self, document) -> None

Purpose: Add or update a document in the search index.

Parameters:

  • document (Document): Document to index

Process:

  1. Extract searchable text
  2. Tokenize and stem words
  3. Build search index entry
  4. Update or insert into index
  5. Commit changes

Example:

index = DocumentIndex()
index.add_or_update_document(my_document)

remove_document(self, document_id)
def remove_document(self, document_id) -> None

Purpose: Remove a document from search index.

Parameters:

  • document_id (int): ID of document to remove

search(self, query_string, limit=50)
def search(self, query_string, limit=50) -> list

Purpose: Perform full-text search.

Parameters:

  • query_string (str): Search query
  • limit (int): Maximum results to return

Returns: List of document IDs, ranked by relevance

Features:

  • Boolean operators (AND, OR, NOT)
  • Phrase search ("exact phrase")
  • Wildcard search (docu*)
  • Field-specific search (title:invoice)
  • Ranking by TF-IDF and BM25

Example:

results = index.search("invoice AND 2023")
documents = Document.objects.filter(id__in=results)

1.4 Matching Module (documents/matching.py)

Class: Match

Represents a matching rule for automatic classification.

Properties:
  • matching_algorithm: "any", "all", "literal", "regex", "fuzzy"
  • match: Pattern to match
  • is_insensitive: Case-insensitive matching
matches(self, text)
def matches(self, text) -> bool

Purpose: Check if text matches this rule.

Parameters:

  • text (str): Text to check

Returns: True if matches, False otherwise

Algorithms:

  • any: Match if any word in pattern is in text
  • all: Match if all words in pattern are in text
  • literal: Exact substring match
  • regex: Regular expression match
  • fuzzy: Fuzzy string matching (Levenshtein distance)

Function: match_correspondents(document, classifier=None)

def match_correspondents(document, classifier=None) -> int or None

Purpose: Find correspondent for document using rules and classifier.

Parameters:

  • document (Document): Document to match
  • classifier (DocumentClassifier, optional): ML classifier

Returns: Correspondent ID or None

Process:

  1. Check manual assignment
  2. Apply matching rules (in order of priority)
  3. If no match, use ML classifier
  4. Return correspondent if confidence sufficient

Function: match_document_type(document, classifier=None)

def match_document_type(document, classifier=None) -> int or None

Purpose: Find document type using rules and classifier.

Similar to correspondent matching.


Function: match_tags(document, classifier=None)

def match_tags(document, classifier=None) -> list

Purpose: Find matching tags using rules and classifier.

Returns: List of tag IDs

Multi-label: Can return multiple tags.


1.5 Barcode Module (documents/barcodes.py)

Function: get_barcodes(path, pages=None)

def get_barcodes(path, pages=None) -> list

Purpose: Extract barcodes from document.

Parameters:

  • path (str): Path to document
  • pages (list, optional): Specific pages to scan

Returns: List of barcode dictionaries:

[
    {
        'type': 'CODE128',
        'data': 'ABC123',
        'page': 1,
        'bbox': [x, y, w, h]
    },
    ...
]

Supported Formats:

  • CODE128, CODE39, QR Code, Data Matrix, EAN, UPC

Uses:

  • pyzbar library for barcode detection
  • OpenCV for image processing

Function: barcode_reader(path)

def barcode_reader(path) -> dict

Purpose: Read and interpret barcode data.

Returns: Parsed barcode information with metadata.


Function: separate_pages(path, barcodes)

def separate_pages(path, barcodes) -> list

Purpose: Split document based on separator barcodes.

Parameters:

  • path (str): Path to multi-page document
  • barcodes (list): Detected barcodes with page numbers

Returns: List of paths to separated documents

Use Case:

  • Batch scanning with separator sheets
  • Automatic document splitting

Example:

# Scan stack of documents with barcode separators
barcodes = get_barcodes("/tmp/batch.pdf")
documents = separate_pages("/tmp/batch.pdf", barcodes)
for doc_path in documents:
    consumer.try_consume_file(doc_path)

1.6 Bulk Edit Module (documents/bulk_edit.py)

Class: BulkEditService

Handles mass document operations efficiently.

update_documents(self, document_ids, updates)
def update_documents(self, document_ids, updates) -> dict

Purpose: Update multiple documents at once.

Parameters:

  • document_ids (list): List of document IDs
  • updates (dict): Fields to update

Returns: Result summary:

{
    'updated': 42,
    'failed': 0,
    'errors': []
}

Supported Updates:

  • correspondent
  • document_type
  • tags (add, remove, replace)
  • storage_path
  • custom fields
  • permissions

Optimizations:

  • Batched database operations
  • Minimal signal triggering
  • Deferred index updates

Example:

service = BulkEditService()
result = service.update_documents(
    document_ids=[1, 2, 3, 4, 5],
    updates={
        'document_type': 3,
        'tags_add': [7, 8],
        'tags_remove': [2]
    }
)

merge_documents(self, document_ids, target_id=None)
def merge_documents(self, document_ids, target_id=None) -> int

Purpose: Combine multiple documents into one.

Parameters:

  • document_ids (list): Documents to merge
  • target_id (int, optional): ID of target document

Returns: ID of merged document

Process:

  1. Combine PDFs
  2. Merge metadata (tags, etc.)
  3. Preserve all original files
  4. Update search index
  5. Delete source documents (soft delete)

split_document(self, document_id, split_pages)
def split_document(self, document_id, split_pages) -> list

Purpose: Split a document into multiple documents.

Parameters:

  • document_id (int): Document to split
  • split_pages (list): Page ranges for each new document

Returns: List of new document IDs

Example:

# Split 10-page document into 3 documents
new_docs = service.split_document(
    document_id=42,
    split_pages=[
        [1, 2, 3],      # First 3 pages
        [4, 5, 6, 7],   # Middle 4 pages
        [8, 9, 10]      # Last 3 pages
    ]
)

1.7 Workflow Module (documents/workflows/)

Class: WorkflowEngine

Executes automated document workflows.

execute_workflow(self, workflow, document, trigger_type)
def execute_workflow(self, workflow, document, trigger_type) -> dict

Purpose: Run a workflow on a document.

Parameters:

  • workflow (Workflow): Workflow definition
  • document (Document): Target document
  • trigger_type (str): What triggered this workflow

Returns: Execution result:

{
    'success': True,
    'actions_executed': 5,
    'actions_failed': 0,
    'errors': []
}

Workflow Components:

  1. Triggers:

    • consumption
    • manual
    • scheduled
    • webhook
  2. Conditions:

    • Document properties
    • Content matching
    • Date ranges
    • Custom field values
  3. Actions:

    • Set correspondent
    • Set document type
    • Add/remove tags
    • Set custom fields
    • Execute webhook
    • Send email
    • Run script

Example Workflow:

workflow = {
    'name': 'Invoice Processing',
    'trigger': 'consumption',
    'conditions': [
        {'field': 'content', 'operator': 'contains', 'value': 'INVOICE'}
    ],
    'actions': [
        {'type': 'set_document_type', 'value': 2},
        {'type': 'add_tags', 'value': [5, 6]},
        {'type': 'webhook', 'url': 'https://api.example.com/invoice'}
    ]
}

2. Paperless Core Functions

2.1 Settings Module (paperless/settings.py)

Configuration Functions

load_config_from_env()
def load_config_from_env() -> dict

Purpose: Load configuration from environment variables.

Returns: Configuration dictionary

Environment Variables:

  • PAPERLESS_DBHOST: Database host
  • PAPERLESS_DBPORT: Database port
  • PAPERLESS_OCR_LANGUAGE: OCR languages
  • PAPERLESS_CONSUMER_POLLING: Polling interval
  • PAPERLESS_TASK_WORKERS: Number of workers
  • PAPERLESS_SECRET_KEY: Django secret key

validate_settings(settings)
def validate_settings(settings) -> list

Purpose: Validate configuration for errors.

Returns: List of validation errors

Checks:

  • Required settings present
  • Valid database configuration
  • OCR languages available
  • Storage paths exist
  • Secret key security

2.2 Celery Module (paperless/celery.py)

Task Configuration

@app.task

Decorator for creating Celery tasks.

Example:

@app.task(bind=True, max_retries=3)
def process_document(self, doc_id):
    try:
        document = Document.objects.get(id=doc_id)
        # Process document
    except Exception as exc:
        raise self.retry(exc=exc, countdown=60)

Periodic Tasks
@app.on_after_finalize.connect
def setup_periodic_tasks(sender, **kwargs):
    # Run sanity check daily at 3:30 AM
    sender.add_periodic_task(
        crontab(hour=3, minute=30),
        sanity_check.s(),
        name='daily-sanity-check'
    )

    # Train classifier weekly
    sender.add_periodic_task(
        crontab(day_of_week=0, hour=2, minute=0),
        train_classifier.s(),
        name='weekly-classifier-training'
    )

2.3 Authentication Module (paperless/auth.py)

Class: PaperlessRemoteUserBackend

Custom authentication backend.

authenticate(self, request, remote_user=None)
def authenticate(self, request, remote_user=None) -> User or None

Purpose: Authenticate user via HTTP header (SSO).

Parameters:

  • request: HTTP request
  • remote_user: Username from header

Returns: User instance or None

Supports:

  • HTTP_REMOTE_USER header
  • LDAP integration
  • OAuth2 providers
  • SAML

3. Mail Integration Functions

3.1 Mail Processing (paperless_mail/mail.py)

Class: MailAccountHandler

get_messages(self, max_messages=100)
def get_messages(self, max_messages=100) -> list

Purpose: Fetch emails from mail account.

Parameters:

  • max_messages (int): Maximum emails to fetch

Returns: List of email message objects

Protocols:

  • IMAP
  • IMAP with OAuth2 (Gmail, Outlook)

process_message(self, message)
def process_message(self, message) -> Document or None

Purpose: Convert email to document.

Parameters:

  • message: Email message object

Returns: Created document or None

Process:

  1. Extract email metadata (from, to, subject, date)
  2. Extract body text
  3. Download attachments
  4. Create document for email body
  5. Create documents for attachments
  6. Link documents together
  7. Apply mail rules

handle_attachments(self, message)
def handle_attachments(self, message) -> list

Purpose: Extract and process email attachments.

Returns: List of attachment file paths

Supported:

  • PDF attachments
  • Image attachments
  • Office documents
  • Archives (extracts)

4. OCR & Parsing Functions

4.1 Tesseract Parser (paperless_tesseract/parsers.py)

Class: RasterisedDocumentParser

parse(self, document_path, mime_type)
def parse(self, document_path, mime_type) -> dict

Purpose: OCR document using Tesseract.

Parameters:

  • document_path (str): Path to document
  • mime_type (str): MIME type

Returns: Parsed document data:

{
    'text': 'Extracted text content',
    'metadata': {...},
    'pages': 10,
    'language': 'eng'
}

Process:

  1. Convert to images (if PDF)
  2. Preprocess images (deskew, denoise)
  3. Detect language
  4. Run Tesseract OCR
  5. Post-process text (fix common errors)
  6. Create searchable PDF

construct_ocrmypdf_parameters(self)
def construct_ocrmypdf_parameters(self) -> list

Purpose: Build command-line arguments for OCRmyPDF.

Returns: List of arguments

Configuration:

  • Language selection
  • OCR mode (redo, skip, force)
  • Image preprocessing
  • PDF/A creation
  • Optimization level

4.2 Tika Parser (paperless_tika/parsers.py)

Class: TikaDocumentParser

parse(self, document_path, mime_type)
def parse(self, document_path, mime_type) -> dict

Purpose: Parse document using Apache Tika.

Supported Formats:

  • Microsoft Office (doc, docx, xls, xlsx, ppt, pptx)
  • LibreOffice (odt, ods, odp)
  • Rich Text Format (rtf)
  • Archives (zip, tar, rar)
  • Images with metadata

Returns: Parsed content and metadata


5. API & Serialization Functions

5.1 Document ViewSet (documents/views.py)

Class: DocumentViewSet

list(self, request)
def list(self, request) -> Response

Purpose: List documents with filtering and pagination.

Query Parameters:

  • page: Page number
  • page_size: Results per page
  • ordering: Sort field
  • correspondent__id: Filter by correspondent
  • document_type__id: Filter by type
  • tags__id__in: Filter by tags
  • created__date__gt: Filter by date
  • query: Full-text search

Response:

{
    'count': 100,
    'next': 'http://api/documents/?page=2',
    'previous': null,
    'results': [...]
}

retrieve(self, request, pk=None)
def retrieve(self, request, pk=None) -> Response

Purpose: Get single document details.

Parameters:

  • pk: Document ID

Response: Full document JSON with metadata


download(self, request, pk=None)
@action(detail=True, methods=['get'])
def download(self, request, pk=None) -> FileResponse

Purpose: Download document file.

Query Parameters:

  • original: Download original vs archive version

Returns: File download response


preview(self, request, pk=None)
@action(detail=True, methods=['get'])
def preview(self, request, pk=None) -> FileResponse

Purpose: Generate document preview image.

Returns: PNG/JPEG image


metadata(self, request, pk=None)
@action(detail=True, methods=['get'])
def metadata(self, request, pk=None) -> Response

Purpose: Get/update document metadata.

GET Response:

{
    'original_filename': 'invoice.pdf',
    'media_filename': '0000042.pdf',
    'created': '2023-01-15T10:30:00Z',
    'modified': '2023-01-15T10:30:00Z',
    'added': '2023-01-15T10:30:00Z',
    'archive_checksum': 'sha256:abc123...',
    'original_checksum': 'sha256:def456...',
    'original_size': 245760,
    'archive_size': 180000,
    'original_mime_type': 'application/pdf'
}

suggestions(self, request, pk=None)
@action(detail=True, methods=['get'])
def suggestions(self, request, pk=None) -> Response

Purpose: Get ML classification suggestions.

Response:

{
    'correspondents': [
        {'id': 5, 'name': 'Acme Corp', 'confidence': 0.87},
        {'id': 2, 'name': 'Beta Inc', 'confidence': 0.12}
    ],
    'document_types': [...],
    'tags': [...]
}

bulk_edit(self, request)
@action(detail=False, methods=['post'])
def bulk_edit(self, request) -> Response

Purpose: Bulk update multiple documents.

Request Body:

{
    'documents': [1, 2, 3, 4, 5],
    'method': 'set_correspondent',
    'parameters': {'correspondent': 7}
}

Methods:

  • set_correspondent
  • set_document_type
  • set_storage_path
  • add_tag / remove_tag
  • modify_tags
  • delete
  • merge
  • split

6. Frontend Services & Components

6.1 Document Service (src-ui/src/app/services/rest/document.service.ts)

Class: DocumentService

listFiltered(page, pageSize, sortField, sortReverse, filterRules, extraParams?)
listFiltered(
  page?: number,
  pageSize?: number,
  sortField?: string,
  sortReverse?: boolean,
  filterRules?: FilterRule[],
  extraParams?: any
): Observable<PaginatedResults<Document>>

Purpose: Get filtered list of documents.

Parameters:

  • page: Page number (1-indexed)
  • pageSize: Results per page
  • sortField: Field to sort by
  • sortReverse: Reverse sort order
  • filterRules: Array of filter rules
  • extraParams: Additional query parameters

Returns: Observable of paginated results

Example:

this.documentService.listFiltered(
  1,
  50,
  'created',
  true,
  [
    {rule_type: FILTER_CORRESPONDENT, value: '5'},
    {rule_type: FILTER_HAS_TAGS_ALL, value: '1,3,7'}
  ]
).subscribe(results => {
  this.documents = results.results;
});

get(id: number)
get(id: number): Observable<Document>

Purpose: Get single document by ID.


update(document: Document)
update(document: Document): Observable<Document>

Purpose: Update document metadata.


upload(formData: FormData)
upload(formData: FormData): Observable<any>

Purpose: Upload new document.

FormData fields:

  • document: File
  • title: Optional title
  • correspondent: Optional correspondent ID
  • document_type: Optional type ID
  • tags: Optional tag IDs

download(id: number, original: boolean)
download(id: number, original: boolean = false): Observable<Blob>

Purpose: Download document file.


getPreviewUrl(id: number)
getPreviewUrl(id: number): string

Purpose: Get URL for document preview.

Returns: URL string


getThumbUrl(id: number)
getThumbUrl(id: number): string

Purpose: Get URL for document thumbnail.


bulkEdit(documentIds: number[], method: string, parameters: any)
bulkEdit(
  documentIds: number[],
  method: string,
  parameters: any
): Observable<any>

Purpose: Perform bulk operation on documents.


6.2 Search Service (src-ui/src/app/services/search.service.ts)

Class: SearchService

search(query: string)
search(query: string): Observable<SearchResult[]>

Purpose: Perform full-text search.

Query Syntax:

  • Simple: invoice 2023
  • Phrase: "exact phrase"
  • Boolean: invoice AND 2023
  • Field: title:invoice
  • Wildcard: doc*

advancedSearch(query: SearchQuery)
advancedSearch(query: SearchQuery): Observable<SearchResult[]>

Purpose: Advanced search with multiple criteria.

SearchQuery:

interface SearchQuery {
  text?: string;
  correspondent?: number;
  documentType?: number;
  tags?: number[];
  dateFrom?: Date;
  dateTo?: Date;
  customFields?: {[key: string]: any};
}

6.3 Settings Service (src-ui/src/app/services/settings.service.ts)

Class: SettingsService

getSettings()
getSettings(): Observable<PaperlessSettings>

Purpose: Get user/system settings.


updateSettings(settings: PaperlessSettings)
updateSettings(settings: PaperlessSettings): Observable<PaperlessSettings>

Purpose: Update settings.


7. Utility Functions

7.1 File Handling Utilities (documents/file_handling.py)

generate_unique_filename(filename, suffix="")

def generate_unique_filename(filename, suffix="") -> str

Purpose: Generate unique filename to avoid collisions.

Parameters:

  • filename (str): Base filename
  • suffix (str): Optional suffix

Returns: Unique filename with timestamp


create_source_path_directory(source_path)

def create_source_path_directory(source_path) -> None

Purpose: Create directory structure for document storage.

Parameters:

  • source_path (str): Path template with variables

Variables:

  • {correspondent}: Correspondent name
  • {document_type}: Document type
  • {created}: Creation date
  • {created_year}: Year
  • {created_month}: Month
  • {title}: Document title
  • {asn}: Archive serial number

Example:

# Template: {correspondent}/{created_year}/{document_type}
# Result: Acme Corp/2023/Invoices/

safe_rename(old_path, new_path)

def safe_rename(old_path, new_path) -> None

Purpose: Safely rename file with atomic operation.

Ensures: No data loss if operation fails


7.2 Data Utilities (paperless/utils.py)

copy_basic_file_stats(src, dst)

def copy_basic_file_stats(src, dst) -> None

Purpose: Copy file metadata (timestamps, permissions).


maybe_override_pixel_limit()

def maybe_override_pixel_limit() -> None

Purpose: Increase PIL image size limit for large documents.


8. Database Models & Methods

8.1 Document Model (documents/models.py)

Class: Document

Model Fields:
class Document(models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()
    correspondent = models.ForeignKey(Correspondent, ...)
    document_type = models.ForeignKey(DocumentType, ...)
    tags = models.ManyToManyField(Tag, ...)
    created = models.DateTimeField(...)
    modified = models.DateTimeField(auto_now=True)
    added = models.DateTimeField(auto_now_add=True)
    storage_path = models.ForeignKey(StoragePath, ...)
    archive_serial_number = models.IntegerField(...)
    original_filename = models.CharField(max_length=1024)
    checksum = models.CharField(max_length=64)
    archive_checksum = models.CharField(max_length=64)
    owner = models.ForeignKey(User, ...)
    custom_fields = models.ManyToManyField(CustomField, ...)

save(self, *args, **kwargs)
def save(self, *args, **kwargs) -> None

Purpose: Override save to add custom logic.

Custom Logic:

  1. Generate archive serial number if not set
  2. Update modification timestamp
  3. Trigger signals
  4. Update search index

filename(self)
@property
def filename(self) -> str

Purpose: Get the document filename.

Returns: Formatted filename based on template


source_path(self)
@property
def source_path(self) -> str

Purpose: Get full path to source file.


archive_path(self)
@property
def archive_path(self) -> str

Purpose: Get full path to archive file.


get_public_filename(self)
def get_public_filename(self) -> str

Purpose: Get sanitized filename for downloads.

Returns: Safe filename without path traversal characters


8.2 Correspondent Model

Class: Correspondent

class Correspondent(models.Model):
    name = models.CharField(max_length=255, unique=True)
    match = models.CharField(max_length=255, blank=True)
    matching_algorithm = models.IntegerField(choices=MATCH_CHOICES)
    is_insensitive = models.BooleanField(default=True)
    document_count = models.IntegerField(default=0)
    last_correspondence = models.DateTimeField(null=True)
    owner = models.ForeignKey(User, ...)

8.3 Workflow Model

Class: Workflow

class Workflow(models.Model):
    name = models.CharField(max_length=255)
    enabled = models.BooleanField(default=True)
    order = models.IntegerField(default=0)
    triggers = models.ManyToManyField(WorkflowTrigger)
    conditions = models.ManyToManyField(WorkflowCondition)
    actions = models.ManyToManyField(WorkflowAction)

Summary

This guide provides comprehensive documentation for the major functions in IntelliDocs-ngx. For detailed API documentation, refer to:

  • Backend API: /api/schema/ (OpenAPI/Swagger)
  • Frontend Docs: Generated via Compodoc
  • Database Schema: Django migrations in migrations/ directories

For implementation examples and testing, see the test files in each module's tests/ directory.


Last Updated: 2025-11-09 Version: 2.19.5