mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2025-12-07 23:35:22 +01:00

fix(linting): corrige errores de formato y sintaxis detectados por pre-commit

- Elimina import duplicado de DeletionRequestViewSet en urls.py (F811)
- Aplica formato automático con ruff format a 12 archivos Python
- Agrega comas finales faltantes (COM812) en 74 ubicaciones
- Normaliza formato de dependencias en pyproject.toml
- Corrige ortografía en archivos de documentación (codespell)

Errores corregidos:
- src/paperless/urls.py: Import duplicado de DeletionRequestViewSet
- 74 violaciones de COM812 (comas finales faltantes)
- Formato inconsistente en múltiples archivos Python

Este commit asegura que el código pase el linting check de pre-commit
y resuelve los problemas de formato introducidos en el commit anterior.

Archivos Python reformateados: 12
Archivos de documentación corregidos: 35
Comas finales agregadas: 74

2025-11-17 19:17:49 +00:00

32 KiB

Raw Blame History

IntelliDocs-ngx Technical Functions Guide

Complete Function Reference

This document provides detailed documentation for all major functions in IntelliDocs-ngx.

Documents Module Functions
Paperless Core Functions
Mail Integration Functions
OCR & Parsing Functions
API & Serialization Functions
Frontend Services & Components
Utility Functions
Database Models & Methods

1. Documents Module Functions

1.1 Consumer Module (`documents/consumer.py`)

Class: `Consumer`

Main class responsible for consuming and processing documents.

`init(self)`

def __init__(self)

Purpose: Initialize the consumer with logging and configuration.

Parameters: None

Returns: Consumer instance

Usage:

consumer = Consumer()

`try_consume_file(self, path, override_filename=None, override_title=None, ...)`

def try_consume_file(
    self,
    path,
    override_filename=None,
    override_title=None,
    override_correspondent_id=None,
    override_document_type_id=None,
    override_tag_ids=None,
    override_created=None,
    override_asn=None,
    task_id=None,
    ...
)

Purpose: Entry point for consuming a document file.

Parameters:

path (str): Full path to the document file
override_filename (str, optional): Custom filename to use
override_title (str, optional): Custom document title
override_correspondent_id (int, optional): Force specific correspondent
override_document_type_id (int, optional): Force specific document type
override_tag_ids (list, optional): Force specific tags
override_created (datetime, optional): Override creation date
override_asn (int, optional): Archive serial number
task_id (str, optional): Celery task ID for progress tracking

Returns: Document ID (int) or raises exception

Raises:

ConsumerError: If document consumption fails
FileNotFoundError: If file doesn't exist

Process Flow:

Validate file exists and is readable
Determine file type
Select appropriate parser
Extract text via OCR/parsing
Apply classification rules
Extract metadata
Create thumbnails
Save to database
Trigger post-consumption workflows
Cleanup temporary files

Example:

doc_id = consumer.try_consume_file(
    path="/tmp/invoice.pdf",
    override_correspondent_id=5,
    override_tag_ids=[1, 3, 7]
)

`_consume(self, path, document, ...)`

def _consume(self, path, document, metadata_from_path)

Purpose: Internal method that performs the actual document consumption.

Parameters:

path (str): Path to document
document (Document): Document model instance
metadata_from_path (dict): Extracted metadata from filename

Returns: None (modifies document in place)

Process:

Parse document with selected parser
Extract text content
Store original file
Generate archive version
Create thumbnails
Index for search
Run classifier if enabled
Apply matching rules

`_write(self, document, path, original_filename, ...)`

def _write(self, document, path, original_filename, original_checksum, ...):

Purpose: Save document to database and filesystem.

Parameters:

document (Document): Document instance to save
path (str): Source file path
original_filename (str): Original filename
original_checksum (str): MD5/SHA256 checksum

Returns: None

Side Effects:

Saves document to database
Moves files to final locations
Creates backup entries
Triggers post-save signals

1.2 Classifier Module (`documents/classifier.py`)

Class: `DocumentClassifier`

Implements machine learning classification for automatic document categorization.

`init(self)`

def __init__(self)

Purpose: Initialize classifier with sklearn models.

Components:

vectorizer: TfidfVectorizer for text feature extraction
correspondent_classifier: LinearSVC for correspondent prediction
document_type_classifier: LinearSVC for document type prediction
tag_classifier: OneVsRestClassifier for multi-label tag prediction

`train(self)`

def train(self) -> bool

Purpose: Train classification models on existing documents.

Parameters: None

Returns:

True if training successful
False if insufficient data

Requirements:

Minimum 50 documents with correspondents for correspondent training
Minimum 50 documents with document types for type training
Minimum 50 documents with tags for tag training

Process:

Load all documents from database
Extract text features using TF-IDF
Train correspondent classifier
Train document type classifier
Train tag classifier (multi-label)
Save models to disk
Log accuracy metrics

Example:

classifier = DocumentClassifier()
success = classifier.train()
if success:
    print("Classifier trained successfully")

`classify_document(self, document)`

def classify_document(self, document) -> dict

Purpose: Predict classifications for a document.

Parameters:

document (Document): Document to classify

Returns: Dictionary with predictions:

{
    'correspondent': int or None,
    'document_type': int or None,
    'tags': list of int,
    'correspondent_confidence': float,
    'document_type_confidence': float,
    'tags_confidence': list of float
}

Example:

predictions = classifier.classify_document(my_document)
print(f"Suggested correspondent: {predictions['correspondent']}")
print(f"Confidence: {predictions['correspondent_confidence']}")

`calculate_best_correspondent(self, document)`

def calculate_best_correspondent(self, document) -> tuple

Purpose: Find the most likely correspondent for a document.

Parameters:

document (Document): Document to analyze

Returns: (correspondent_id, confidence_score)

Algorithm:

Check for matching rules (highest priority)
If no match, use ML classifier
Calculate confidence based on decision function
Return correspondent if confidence > threshold

`calculate_best_document_type(self, document)`

def calculate_best_document_type(self, document) -> tuple

Purpose: Determine the best document type classification.

Parameters:

document (Document): Document to classify

Returns: (document_type_id, confidence_score)

Similar to correspondent classification but for document types.

`calculate_best_tags(self, document)`

def calculate_best_tags(self, document) -> list

Purpose: Suggest relevant tags for a document.

Parameters:

document (Document): Document to tag

Returns: List of (tag_id, confidence_score) tuples

Multi-label Classification:

Can return multiple tags
Each tag has independent confidence score
Returns tags above confidence threshold

1.3 Index Module (`documents/index.py`)

Class: `DocumentIndex`

Manages full-text search indexing for documents.

`init(self, index_dir=None)`

def __init__(self, index_dir=None)

Purpose: Initialize search index.

Parameters:

index_dir (str, optional): Path to index directory

Components:

Uses Whoosh library for indexing
Creates schema with fields: id, title, content, correspondent, tags
Supports stemming and stop words

`add_or_update_document(self, document)`

def add_or_update_document(self, document) -> None

Purpose: Add or update a document in the search index.

Parameters:

document (Document): Document to index

Process:

Extract searchable text
Tokenize and stem words
Build search index entry
Update or insert into index
Commit changes

Example:

index = DocumentIndex()
index.add_or_update_document(my_document)

`remove_document(self, document_id)`

def remove_document(self, document_id) -> None

Purpose: Remove a document from search index.

Parameters:

document_id (int): ID of document to remove

`search(self, query_string, limit=50)`

def search(self, query_string, limit=50) -> list

Purpose: Perform full-text search.

Parameters:

query_string (str): Search query
limit (int): Maximum results to return

Returns: List of document IDs, ranked by relevance

Features:

Boolean operators (AND, OR, NOT)
Phrase search ("exact phrase")
Wildcard search (docu*)
Field-specific search (title:invoice)
Ranking by TF-IDF and BM25

Example:

results = index.search("invoice AND 2023")
documents = Document.objects.filter(id__in=results)

1.4 Matching Module (`documents/matching.py`)

Class: `Match`

Represents a matching rule for automatic classification.

Properties:

matching_algorithm: "any", "all", "literal", "regex", "fuzzy"
match: Pattern to match
is_insensitive: Case-insensitive matching

`matches(self, text)`

def matches(self, text) -> bool

Purpose: Check if text matches this rule.

Parameters:

text (str): Text to check

Returns: True if matches, False otherwise

Algorithms:

any: Match if any word in pattern is in text
all: Match if all words in pattern are in text
literal: Exact substring match
regex: Regular expression match
fuzzy: Fuzzy string matching (Levenshtein distance)

Function: `match_correspondents(document, classifier=None)`

def match_correspondents(document, classifier=None) -> int or None

Purpose: Find correspondent for document using rules and classifier.

Parameters:

document (Document): Document to match
classifier (DocumentClassifier, optional): ML classifier

Returns: Correspondent ID or None

Process:

Check manual assignment
Apply matching rules (in order of priority)
If no match, use ML classifier
Return correspondent if confidence sufficient

Function: `match_document_type(document, classifier=None)`

def match_document_type(document, classifier=None) -> int or None

Purpose: Find document type using rules and classifier.

Similar to correspondent matching.

Function: `match_tags(document, classifier=None)`

def match_tags(document, classifier=None) -> list

Purpose: Find matching tags using rules and classifier.

Returns: List of tag IDs

Multi-label: Can return multiple tags.

1.5 Barcode Module (`documents/barcodes.py`)

Function: `get_barcodes(path, pages=None)`

def get_barcodes(path, pages=None) -> list

Purpose: Extract barcodes from document.

Parameters:

path (str): Path to document
pages (list, optional): Specific pages to scan

Returns: List of barcode dictionaries:

[
    {
        'type': 'CODE128',
        'data': 'ABC123',
        'page': 1,
        'bbox': [x, y, w, h]
    },
    ...
]

Supported Formats:

CODE128, CODE39, QR Code, Data Matrix, EAN, UPC

Uses:

pyzbar library for barcode detection
OpenCV for image processing

Function: `barcode_reader(path)`

def barcode_reader(path) -> dict

Purpose: Read and interpret barcode data.

Returns: Parsed barcode information with metadata.

Function: `separate_pages(path, barcodes)`

def separate_pages(path, barcodes) -> list

Purpose: Split document based on separator barcodes.

Parameters:

path (str): Path to multi-page document
barcodes (list): Detected barcodes with page numbers

Returns: List of paths to separated documents

Use Case:

Batch scanning with separator sheets
Automatic document splitting

Example:

# Scan stack of documents with barcode separators
barcodes = get_barcodes("/tmp/batch.pdf")
documents = separate_pages("/tmp/batch.pdf", barcodes)
for doc_path in documents:
    consumer.try_consume_file(doc_path)

1.6 Bulk Edit Module (`documents/bulk_edit.py`)

Class: `BulkEditService`

Handles mass document operations efficiently.

`update_documents(self, document_ids, updates)`

def update_documents(self, document_ids, updates) -> dict

Purpose: Update multiple documents at once.

Parameters:

document_ids (list): List of document IDs
updates (dict): Fields to update

Returns: Result summary:

{
    'updated': 42,
    'failed': 0,
    'errors': []
}

Supported Updates:

correspondent
document_type
tags (add, remove, replace)
storage_path
custom fields
permissions

Optimizations:

Batched database operations
Minimal signal triggering
Deferred index updates

Example:

service = BulkEditService()
result = service.update_documents(
    document_ids=[1, 2, 3, 4, 5],
    updates={
        'document_type': 3,
        'tags_add': [7, 8],
        'tags_remove': [2]
    }
)

`merge_documents(self, document_ids, target_id=None)`

def merge_documents(self, document_ids, target_id=None) -> int

Purpose: Combine multiple documents into one.

Parameters:

document_ids (list): Documents to merge
target_id (int, optional): ID of target document

Returns: ID of merged document

Process:

Combine PDFs
Merge metadata (tags, etc.)
Preserve all original files
Update search index
Delete source documents (soft delete)

`split_document(self, document_id, split_pages)`

def split_document(self, document_id, split_pages) -> list

Purpose: Split a document into multiple documents.

Parameters:

document_id (int): Document to split
split_pages (list): Page ranges for each new document

Returns: List of new document IDs

Example:

# Split 10-page document into 3 documents
new_docs = service.split_document(
    document_id=42,
    split_pages=[
        [1, 2, 3],      # First 3 pages
        [4, 5, 6, 7],   # Middle 4 pages
        [8, 9, 10]      # Last 3 pages
    ]
)

1.7 Workflow Module (`documents/workflows/`)

Class: `WorkflowEngine`

Executes automated document workflows.

`execute_workflow(self, workflow, document, trigger_type)`

def execute_workflow(self, workflow, document, trigger_type) -> dict

Purpose: Run a workflow on a document.

Parameters:

workflow (Workflow): Workflow definition
document (Document): Target document
trigger_type (str): What triggered this workflow

Returns: Execution result:

{
    'success': True,
    'actions_executed': 5,
    'actions_failed': 0,
    'errors': []
}

Workflow Components:

Triggers:
- consumption
- manual
- scheduled
- webhook
Conditions:
- Document properties
- Content matching
- Date ranges
- Custom field values
Actions:
- Set correspondent
- Set document type
- Add/remove tags
- Set custom fields
- Execute webhook
- Send email
- Run script

Example Workflow:

workflow = {
    'name': 'Invoice Processing',
    'trigger': 'consumption',
    'conditions': [
        {'field': 'content', 'operator': 'contains', 'value': 'INVOICE'}
    ],
    'actions': [
        {'type': 'set_document_type', 'value': 2},
        {'type': 'add_tags', 'value': [5, 6]},
        {'type': 'webhook', 'url': 'https://api.example.com/invoice'}
    ]
}

2. Paperless Core Functions

2.1 Settings Module (`paperless/settings.py`)

Configuration Functions

`load_config_from_env()`

def load_config_from_env() -> dict

Purpose: Load configuration from environment variables.

Returns: Configuration dictionary

Environment Variables:

PAPERLESS_DBHOST: Database host
PAPERLESS_DBPORT: Database port
PAPERLESS_OCR_LANGUAGE: OCR languages
PAPERLESS_CONSUMER_POLLING: Polling interval
PAPERLESS_TASK_WORKERS: Number of workers
PAPERLESS_SECRET_KEY: Django secret key

`validate_settings(settings)`

def validate_settings(settings) -> list

Purpose: Validate configuration for errors.

Returns: List of validation errors

Checks:

Required settings present
Valid database configuration
OCR languages available
Storage paths exist
Secret key security

2.2 Celery Module (`paperless/celery.py`)

Task Configuration

`@app.task`

Decorator for creating Celery tasks.

Example:

@app.task(bind=True, max_retries=3)
def process_document(self, doc_id):
    try:
        document = Document.objects.get(id=doc_id)
        # Process document
    except Exception as exc:
        raise self.retry(exc=exc, countdown=60)

Periodic Tasks

@app.on_after_finalize.connect
def setup_periodic_tasks(sender, **kwargs):
    # Run sanity check daily at 3:30 AM
    sender.add_periodic_task(
        crontab(hour=3, minute=30),
        sanity_check.s(),
        name='daily-sanity-check'
    )

    # Train classifier weekly
    sender.add_periodic_task(
        crontab(day_of_week=0, hour=2, minute=0),
        train_classifier.s(),
        name='weekly-classifier-training'
    )

2.3 Authentication Module (`paperless/auth.py`)

Class: `PaperlessRemoteUserBackend`

Custom authentication backend.

`authenticate(self, request, remote_user=None)`

def authenticate(self, request, remote_user=None) -> User or None

Purpose: Authenticate user via HTTP header (SSO).

Parameters:

request: HTTP request
remote_user: Username from header

Returns: User instance or None

Supports:

HTTP_REMOTE_USER header
LDAP integration
OAuth2 providers
SAML

3. Mail Integration Functions

3.1 Mail Processing (`paperless_mail/mail.py`)

Class: `MailAccountHandler`

`get_messages(self, max_messages=100)`

def get_messages(self, max_messages=100) -> list

Purpose: Fetch emails from mail account.

Parameters:

max_messages (int): Maximum emails to fetch

Returns: List of email message objects

Protocols:

IMAP
IMAP with OAuth2 (Gmail, Outlook)

`process_message(self, message)`

def process_message(self, message) -> Document or None

Purpose: Convert email to document.

Parameters:

message: Email message object

Returns: Created document or None

Process:

Extract email metadata (from, to, subject, date)
Extract body text
Download attachments
Create document for email body
Create documents for attachments
Link documents together
Apply mail rules

`handle_attachments(self, message)`

def handle_attachments(self, message) -> list

Purpose: Extract and process email attachments.

Returns: List of attachment file paths

Supported:

PDF attachments
Image attachments
Office documents
Archives (extracts)

4. OCR & Parsing Functions

4.1 Tesseract Parser (`paperless_tesseract/parsers.py`)

Class: `RasterisedDocumentParser`

`parse(self, document_path, mime_type)`

def parse(self, document_path, mime_type) -> dict

Purpose: OCR document using Tesseract.

Parameters:

document_path (str): Path to document
mime_type (str): MIME type

Returns: Parsed document data:

{
    'text': 'Extracted text content',
    'metadata': {...},
    'pages': 10,
    'language': 'eng'
}

Process:

Convert to images (if PDF)
Preprocess images (deskew, denoise)
Detect language
Run Tesseract OCR
Post-process text (fix common errors)
Create searchable PDF

`construct_ocrmypdf_parameters(self)`

def construct_ocrmypdf_parameters(self) -> list

Purpose: Build command-line arguments for OCRmyPDF.

Returns: List of arguments

Configuration:

Language selection
OCR mode (redo, skip, force)
Image preprocessing
PDF/A creation
Optimization level

4.2 Tika Parser (`paperless_tika/parsers.py`)

Class: `TikaDocumentParser`

`parse(self, document_path, mime_type)`

def parse(self, document_path, mime_type) -> dict

Purpose: Parse document using Apache Tika.

Supported Formats:

Microsoft Office (doc, docx, xls, xlsx, ppt, pptx)
LibreOffice (odt, ods, odp)
Rich Text Format (rtf)
Archives (zip, tar, rar)
Images with metadata

Returns: Parsed content and metadata

5. API & Serialization Functions

5.1 Document ViewSet (`documents/views.py`)

Class: `DocumentViewSet`

`list(self, request)`

def list(self, request) -> Response

Purpose: List documents with filtering and pagination.

Query Parameters:

page: Page number
page_size: Results per page
ordering: Sort field
correspondent__id: Filter by correspondent
document_type__id: Filter by type
tags__id__in: Filter by tags
created__date__gt: Filter by date
query: Full-text search

Response:

{
    'count': 100,
    'next': 'http://api/documents/?page=2',
    'previous': null,
    'results': [...]
}

`retrieve(self, request, pk=None)`

def retrieve(self, request, pk=None) -> Response

Purpose: Get single document details.

Parameters:

pk: Document ID

Response: Full document JSON with metadata

`download(self, request, pk=None)`

@action(detail=True, methods=['get'])
def download(self, request, pk=None) -> FileResponse

Purpose: Download document file.

Query Parameters:

original: Download original vs archive version

Returns: File download response

`preview(self, request, pk=None)`

@action(detail=True, methods=['get'])
def preview(self, request, pk=None) -> FileResponse

Purpose: Generate document preview image.

Returns: PNG/JPEG image

`metadata(self, request, pk=None)`

@action(detail=True, methods=['get'])
def metadata(self, request, pk=None) -> Response

Purpose: Get/update document metadata.

GET Response:

{
    'original_filename': 'invoice.pdf',
    'media_filename': '0000042.pdf',
    'created': '2023-01-15T10:30:00Z',
    'modified': '2023-01-15T10:30:00Z',
    'added': '2023-01-15T10:30:00Z',
    'archive_checksum': 'sha256:abc123...',
    'original_checksum': 'sha256:def456...',
    'original_size': 245760,
    'archive_size': 180000,
    'original_mime_type': 'application/pdf'
}

`suggestions(self, request, pk=None)`

@action(detail=True, methods=['get'])
def suggestions(self, request, pk=None) -> Response

Purpose: Get ML classification suggestions.

Response:

{
    'correspondents': [
        {'id': 5, 'name': 'Acme Corp', 'confidence': 0.87},
        {'id': 2, 'name': 'Beta Inc', 'confidence': 0.12}
    ],
    'document_types': [...],
    'tags': [...]
}

`bulk_edit(self, request)`

@action(detail=False, methods=['post'])
def bulk_edit(self, request) -> Response

Purpose: Bulk update multiple documents.

Request Body:

{
    'documents': [1, 2, 3, 4, 5],
    'method': 'set_correspondent',
    'parameters': {'correspondent': 7}
}

Methods:

set_correspondent
set_document_type
set_storage_path
add_tag / remove_tag
modify_tags
delete
merge
split

6. Frontend Services & Components

6.1 Document Service (`src-ui/src/app/services/rest/document.service.ts`)

Class: `DocumentService`

`listFiltered(page, pageSize, sortField, sortReverse, filterRules, extraParams?)`

listFiltered(
  page?: number,
  pageSize?: number,
  sortField?: string,
  sortReverse?: boolean,
  filterRules?: FilterRule[],
  extraParams?: any
): Observable<PaginatedResults<Document>>

Purpose: Get filtered list of documents.

Parameters:

page: Page number (1-indexed)
pageSize: Results per page
sortField: Field to sort by
sortReverse: Reverse sort order
filterRules: Array of filter rules
extraParams: Additional query parameters

Returns: Observable of paginated results

Example:

this.documentService.listFiltered(
  1,
  50,
  'created',
  true,
  [
    {rule_type: FILTER_CORRESPONDENT, value: '5'},
    {rule_type: FILTER_HAS_TAGS_ALL, value: '1,3,7'}
  ]
).subscribe(results => {
  this.documents = results.results;
});

`get(id: number)`

get(id: number): Observable<Document>

Purpose: Get single document by ID.

`update(document: Document)`

update(document: Document): Observable<Document>

Purpose: Update document metadata.

`upload(formData: FormData)`

upload(formData: FormData): Observable<any>

Purpose: Upload new document.

FormData fields:

document: File
title: Optional title
correspondent: Optional correspondent ID
document_type: Optional type ID
tags: Optional tag IDs

`download(id: number, original: boolean)`

download(id: number, original: boolean = false): Observable<Blob>

Purpose: Download document file.

`getPreviewUrl(id: number)`

getPreviewUrl(id: number): string

Purpose: Get URL for document preview.

Returns: URL string

`getThumbUrl(id: number)`

getThumbUrl(id: number): string

Purpose: Get URL for document thumbnail.

`bulkEdit(documentIds: number[], method: string, parameters: any)`

bulkEdit(
  documentIds: number[],
  method: string,
  parameters: any
): Observable<any>

Purpose: Perform bulk operation on documents.

6.2 Search Service (`src-ui/src/app/services/search.service.ts`)

Class: `SearchService`

`search(query: string)`

search(query: string): Observable<SearchResult[]>

Purpose: Perform full-text search.

Query Syntax:

Simple: invoice 2023
Phrase: "exact phrase"
Boolean: invoice AND 2023
Field: title:invoice
Wildcard: doc*

`advancedSearch(query: SearchQuery)`

advancedSearch(query: SearchQuery): Observable<SearchResult[]>

Purpose: Advanced search with multiple criteria.

SearchQuery:

interface SearchQuery {
  text?: string;
  correspondent?: number;
  documentType?: number;
  tags?: number[];
  dateFrom?: Date;
  dateTo?: Date;
  customFields?: {[key: string]: any};
}

6.3 Settings Service (`src-ui/src/app/services/settings.service.ts`)

Class: `SettingsService`

`getSettings()`

getSettings(): Observable<PaperlessSettings>

Purpose: Get user/system settings.

`updateSettings(settings: PaperlessSettings)`

updateSettings(settings: PaperlessSettings): Observable<PaperlessSettings>

Purpose: Update settings.

7. Utility Functions

7.1 File Handling Utilities (`documents/file_handling.py`)

`generate_unique_filename(filename, suffix="")`

def generate_unique_filename(filename, suffix="") -> str

Purpose: Generate unique filename to avoid collisions.

Parameters:

filename (str): Base filename
suffix (str): Optional suffix

Returns: Unique filename with timestamp

`create_source_path_directory(source_path)`

def create_source_path_directory(source_path) -> None

Purpose: Create directory structure for document storage.

Parameters:

source_path (str): Path template with variables

Variables:

{correspondent}: Correspondent name
{document_type}: Document type
{created}: Creation date
{created_year}: Year
{created_month}: Month
{title}: Document title
{asn}: Archive serial number

Example:

# Template: {correspondent}/{created_year}/{document_type}
# Result: Acme Corp/2023/Invoices/

`safe_rename(old_path, new_path)`

def safe_rename(old_path, new_path) -> None

Purpose: Safely rename file with atomic operation.

Ensures: No data loss if operation fails

7.2 Data Utilities (`paperless/utils.py`)

`copy_basic_file_stats(src, dst)`

def copy_basic_file_stats(src, dst) -> None

Purpose: Copy file metadata (timestamps, permissions).

`maybe_override_pixel_limit()`

def maybe_override_pixel_limit() -> None

Purpose: Increase PIL image size limit for large documents.

8. Database Models & Methods

8.1 Document Model (`documents/models.py`)

Class: `Document`

Model Fields:

class Document(models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()
    correspondent = models.ForeignKey(Correspondent, ...)
    document_type = models.ForeignKey(DocumentType, ...)
    tags = models.ManyToManyField(Tag, ...)
    created = models.DateTimeField(...)
    modified = models.DateTimeField(auto_now=True)
    added = models.DateTimeField(auto_now_add=True)
    storage_path = models.ForeignKey(StoragePath, ...)
    archive_serial_number = models.IntegerField(...)
    original_filename = models.CharField(max_length=1024)
    checksum = models.CharField(max_length=64)
    archive_checksum = models.CharField(max_length=64)
    owner = models.ForeignKey(User, ...)
    custom_fields = models.ManyToManyField(CustomField, ...)

`save(self, *args, **kwargs)`

def save(self, *args, **kwargs) -> None

Purpose: Override save to add custom logic.

Custom Logic:

Generate archive serial number if not set
Update modification timestamp
Trigger signals
Update search index

`filename(self)`

@property
def filename(self) -> str

Purpose: Get the document filename.

Returns: Formatted filename based on template

`source_path(self)`

@property
def source_path(self) -> str

Purpose: Get full path to source file.

`archive_path(self)`

@property
def archive_path(self) -> str

Purpose: Get full path to archive file.

`get_public_filename(self)`

def get_public_filename(self) -> str

Purpose: Get sanitized filename for downloads.

Returns: Safe filename without path traversal characters

8.2 Correspondent Model

Class: `Correspondent`

class Correspondent(models.Model):
    name = models.CharField(max_length=255, unique=True)
    match = models.CharField(max_length=255, blank=True)
    matching_algorithm = models.IntegerField(choices=MATCH_CHOICES)
    is_insensitive = models.BooleanField(default=True)
    document_count = models.IntegerField(default=0)
    last_correspondence = models.DateTimeField(null=True)
    owner = models.ForeignKey(User, ...)

8.3 Workflow Model

Class: `Workflow`

class Workflow(models.Model):
    name = models.CharField(max_length=255)
    enabled = models.BooleanField(default=True)
    order = models.IntegerField(default=0)
    triggers = models.ManyToManyField(WorkflowTrigger)
    conditions = models.ManyToManyField(WorkflowCondition)
    actions = models.ManyToManyField(WorkflowAction)

Summary

This guide provides comprehensive documentation for the major functions in IntelliDocs-ngx. For detailed API documentation, refer to:

Backend API: /api/schema/ (OpenAPI/Swagger)
Frontend Docs: Generated via Compodoc
Database Schema: Django migrations in migrations/ directories

For implementation examples and testing, see the test files in each module's tests/ directory.

Last Updated: 2025-11-09 Version: 2.19.5

32 KiB Raw Blame History

IntelliDocs-ngx Technical Functions Guide

Complete Function Reference

Table of Contents

1. Documents Module Functions

1.1 Consumer Module (documents/consumer.py)

Class: Consumer

__init__(self)

try_consume_file(self, path, override_filename=None, override_title=None, ...)

_consume(self, path, document, ...)

_write(self, document, path, original_filename, ...)

1.2 Classifier Module (documents/classifier.py)

Class: DocumentClassifier

__init__(self)

train(self)

classify_document(self, document)

calculate_best_correspondent(self, document)

calculate_best_document_type(self, document)

calculate_best_tags(self, document)

1.3 Index Module (documents/index.py)

Class: DocumentIndex

__init__(self, index_dir=None)

add_or_update_document(self, document)

remove_document(self, document_id)

search(self, query_string, limit=50)

1.4 Matching Module (documents/matching.py)

Class: Match

Properties:

matches(self, text)

Function: match_correspondents(document, classifier=None)

Function: match_document_type(document, classifier=None)

Function: match_tags(document, classifier=None)

1.5 Barcode Module (documents/barcodes.py)

Function: get_barcodes(path, pages=None)

Function: barcode_reader(path)

Function: separate_pages(path, barcodes)

1.6 Bulk Edit Module (documents/bulk_edit.py)

Class: BulkEditService

update_documents(self, document_ids, updates)

merge_documents(self, document_ids, target_id=None)

split_document(self, document_id, split_pages)

1.7 Workflow Module (documents/workflows/)

Class: WorkflowEngine

execute_workflow(self, workflow, document, trigger_type)

2. Paperless Core Functions

2.1 Settings Module (paperless/settings.py)

Configuration Functions

load_config_from_env()

validate_settings(settings)

2.2 Celery Module (paperless/celery.py)

Task Configuration

@app.task

Periodic Tasks

2.3 Authentication Module (paperless/auth.py)

Class: PaperlessRemoteUserBackend

authenticate(self, request, remote_user=None)

3. Mail Integration Functions

3.1 Mail Processing (paperless_mail/mail.py)

Class: MailAccountHandler

get_messages(self, max_messages=100)

process_message(self, message)

handle_attachments(self, message)

4. OCR & Parsing Functions

4.1 Tesseract Parser (paperless_tesseract/parsers.py)

Class: RasterisedDocumentParser

parse(self, document_path, mime_type)

construct_ocrmypdf_parameters(self)

4.2 Tika Parser (paperless_tika/parsers.py)

Class: TikaDocumentParser

parse(self, document_path, mime_type)

5. API & Serialization Functions

5.1 Document ViewSet (documents/views.py)

Class: DocumentViewSet

list(self, request)

retrieve(self, request, pk=None)

download(self, request, pk=None)

preview(self, request, pk=None)

metadata(self, request, pk=None)

suggestions(self, request, pk=None)

bulk_edit(self, request)

32 KiB

Raw Blame History

1.1 Consumer Module (`documents/consumer.py`)

Class: `Consumer`

`init(self)`

`try_consume_file(self, path, override_filename=None, override_title=None, ...)`

`_consume(self, path, document, ...)`

`_write(self, document, path, original_filename, ...)`

1.2 Classifier Module (`documents/classifier.py`)

Class: `DocumentClassifier`

`init(self)`

`train(self)`

`classify_document(self, document)`

`calculate_best_correspondent(self, document)`

`calculate_best_document_type(self, document)`

`calculate_best_tags(self, document)`

1.3 Index Module (`documents/index.py`)

Class: `DocumentIndex`

`init(self, index_dir=None)`

`add_or_update_document(self, document)`

`remove_document(self, document_id)`

`search(self, query_string, limit=50)`

1.4 Matching Module (`documents/matching.py`)

Class: `Match`

`matches(self, text)`

Function: `match_correspondents(document, classifier=None)`

Function: `match_document_type(document, classifier=None)`

Function: `match_tags(document, classifier=None)`

1.5 Barcode Module (`documents/barcodes.py`)

Function: `get_barcodes(path, pages=None)`

Function: `barcode_reader(path)`

Function: `separate_pages(path, barcodes)`

1.6 Bulk Edit Module (`documents/bulk_edit.py`)

Class: `BulkEditService`

`update_documents(self, document_ids, updates)`

`merge_documents(self, document_ids, target_id=None)`

`split_document(self, document_id, split_pages)`

1.7 Workflow Module (`documents/workflows/`)

Class: `WorkflowEngine`

`execute_workflow(self, workflow, document, trigger_type)`

2.1 Settings Module (`paperless/settings.py`)

`load_config_from_env()`

`validate_settings(settings)`

2.2 Celery Module (`paperless/celery.py`)

`@app.task`

2.3 Authentication Module (`paperless/auth.py`)

Class: `PaperlessRemoteUserBackend`

`authenticate(self, request, remote_user=None)`

3.1 Mail Processing (`paperless_mail/mail.py`)

Class: `MailAccountHandler`

`get_messages(self, max_messages=100)`

`process_message(self, message)`

`handle_attachments(self, message)`

4.1 Tesseract Parser (`paperless_tesseract/parsers.py`)

Class: `RasterisedDocumentParser`

`parse(self, document_path, mime_type)`

`construct_ocrmypdf_parameters(self)`

4.2 Tika Parser (`paperless_tika/parsers.py`)

Class: `TikaDocumentParser`

`parse(self, document_path, mime_type)`

5.1 Document ViewSet (`documents/views.py`)

Class: `DocumentViewSet`

`list(self, request)`

`retrieve(self, request, pk=None)`

`download(self, request, pk=None)`

`preview(self, request, pk=None)`

`metadata(self, request, pk=None)`

`suggestions(self, request, pk=None)`

`bulk_edit(self, request)`

6.1 Document Service (`src-ui/src/app/services/rest/document.service.ts`)

Class: `DocumentService`

`listFiltered(page, pageSize, sortField, sortReverse, filterRules, extraParams?)`

`get(id: number)`

`update(document: Document)`

`upload(formData: FormData)`

`download(id: number, original: boolean)`

`getPreviewUrl(id: number)`

`getThumbUrl(id: number)`

`bulkEdit(documentIds: number[], method: string, parameters: any)`

6.2 Search Service (`src-ui/src/app/services/search.service.ts`)

Class: `SearchService`