# IntelliDocs-ngx Technical Functions Guide ## Complete Function Reference This document provides detailed documentation for all major functions in IntelliDocs-ngx. --- ## Table of Contents 1. [Documents Module Functions](#1-documents-module-functions) 2. [Paperless Core Functions](#2-paperless-core-functions) 3. [Mail Integration Functions](#3-mail-integration-functions) 4. [OCR & Parsing Functions](#4-ocr--parsing-functions) 5. [API & Serialization Functions](#5-api--serialization-functions) 6. [Frontend Services & Components](#6-frontend-services--components) 7. [Utility Functions](#7-utility-functions) 8. [Database Models & Methods](#8-database-models--methods) --- ## 1. Documents Module Functions ### 1.1 Consumer Module (`documents/consumer.py`) #### Class: `Consumer` Main class responsible for consuming and processing documents. ##### `__init__(self)` ```python def __init__(self) ``` **Purpose**: Initialize the consumer with logging and configuration. **Parameters**: None **Returns**: Consumer instance **Usage**: ```python consumer = Consumer() ``` --- ##### `try_consume_file(self, path, override_filename=None, override_title=None, ...)` ```python def try_consume_file( self, path, override_filename=None, override_title=None, override_correspondent_id=None, override_document_type_id=None, override_tag_ids=None, override_created=None, override_asn=None, task_id=None, ... ) ``` **Purpose**: Entry point for consuming a document file. 
**Parameters**: - `path` (str): Full path to the document file - `override_filename` (str, optional): Custom filename to use - `override_title` (str, optional): Custom document title - `override_correspondent_id` (int, optional): Force specific correspondent - `override_document_type_id` (int, optional): Force specific document type - `override_tag_ids` (list, optional): Force specific tags - `override_created` (datetime, optional): Override creation date - `override_asn` (int, optional): Archive serial number - `task_id` (str, optional): Celery task ID for progress tracking **Returns**: Document ID (int) or raises exception **Raises**: - `ConsumerError`: If document consumption fails - `FileNotFoundError`: If file doesn't exist **Process Flow**: 1. Validate file exists and is readable 2. Determine file type 3. Select appropriate parser 4. Extract text via OCR/parsing 5. Apply classification rules 6. Extract metadata 7. Create thumbnails 8. Save to database 9. Trigger post-consumption workflows 10. Cleanup temporary files **Example**: ```python doc_id = consumer.try_consume_file( path="/tmp/invoice.pdf", override_correspondent_id=5, override_tag_ids=[1, 3, 7] ) ``` --- ##### `_consume(self, path, document, ...)` ```python def _consume(self, path, document, metadata_from_path) ``` **Purpose**: Internal method that performs the actual document consumption. **Parameters**: - `path` (str): Path to document - `document` (Document): Document model instance - `metadata_from_path` (dict): Extracted metadata from filename **Returns**: None (modifies document in place) **Process**: 1. Parse document with selected parser 2. Extract text content 3. Store original file 4. Generate archive version 5. Create thumbnails 6. Index for search 7. Run classifier if enabled 8. 
Apply matching rules --- ##### `_write(self, document, path, original_filename, ...)` ```python def _write(self, document, path, original_filename, original_checksum, ...): ``` **Purpose**: Save document to database and filesystem. **Parameters**: - `document` (Document): Document instance to save - `path` (str): Source file path - `original_filename` (str): Original filename - `original_checksum` (str): MD5/SHA256 checksum **Returns**: None **Side Effects**: - Saves document to database - Moves files to final locations - Creates backup entries - Triggers post-save signals --- ### 1.2 Classifier Module (`documents/classifier.py`) #### Class: `DocumentClassifier` Implements machine learning classification for automatic document categorization. ##### `__init__(self)` ```python def __init__(self) ``` **Purpose**: Initialize classifier with sklearn models. **Components**: - `vectorizer`: TfidfVectorizer for text feature extraction - `correspondent_classifier`: LinearSVC for correspondent prediction - `document_type_classifier`: LinearSVC for document type prediction - `tag_classifier`: OneVsRestClassifier for multi-label tag prediction --- ##### `train(self)` ```python def train(self) -> bool ``` **Purpose**: Train classification models on existing documents. **Parameters**: None **Returns**: - `True` if training successful - `False` if insufficient data **Requirements**: - Minimum 50 documents with correspondents for correspondent training - Minimum 50 documents with document types for type training - Minimum 50 documents with tags for tag training **Process**: 1. Load all documents from database 2. Extract text features using TF-IDF 3. Train correspondent classifier 4. Train document type classifier 5. Train tag classifier (multi-label) 6. Save models to disk 7. 
Log accuracy metrics **Example**: ```python classifier = DocumentClassifier() success = classifier.train() if success: print("Classifier trained successfully") ``` --- ##### `classify_document(self, document)` ```python def classify_document(self, document) -> dict ``` **Purpose**: Predict classifications for a document. **Parameters**: - `document` (Document): Document to classify **Returns**: Dictionary with predictions: ```python { 'correspondent': int or None, 'document_type': int or None, 'tags': list of int, 'correspondent_confidence': float, 'document_type_confidence': float, 'tags_confidence': list of float } ``` **Example**: ```python predictions = classifier.classify_document(my_document) print(f"Suggested correspondent: {predictions['correspondent']}") print(f"Confidence: {predictions['correspondent_confidence']}") ``` --- ##### `calculate_best_correspondent(self, document)` ```python def calculate_best_correspondent(self, document) -> tuple ``` **Purpose**: Find the most likely correspondent for a document. **Parameters**: - `document` (Document): Document to analyze **Returns**: `(correspondent_id, confidence_score)` **Algorithm**: 1. Check for matching rules (highest priority) 2. If no match, use ML classifier 3. Calculate confidence based on decision function 4. Return correspondent if confidence > threshold --- ##### `calculate_best_document_type(self, document)` ```python def calculate_best_document_type(self, document) -> tuple ``` **Purpose**: Determine the best document type classification. **Parameters**: - `document` (Document): Document to classify **Returns**: `(document_type_id, confidence_score)` **Similar to correspondent classification but for document types.** --- ##### `calculate_best_tags(self, document)` ```python def calculate_best_tags(self, document) -> list ``` **Purpose**: Suggest relevant tags for a document. 
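Conceptually, multi-label tag suggestion filters per-tag confidences against a threshold. A minimal sketch (function name and threshold value are illustrative, not the actual implementation):

```python
def suggest_tags(scores, threshold=0.5):
    """Keep (tag_id, confidence) pairs whose confidence clears the threshold.

    Hypothetical sketch of the multi-label filtering step, not the real code.
    """
    return [(tag_id, conf) for tag_id, conf in scores.items() if conf >= threshold]

# Per-tag confidences as produced by a OneVsRestClassifier-style model
print(suggest_tags({1: 0.92, 3: 0.41, 7: 0.77}))  # [(1, 0.92), (7, 0.77)]
```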
**Parameters**: - `document` (Document): Document to tag **Returns**: List of `(tag_id, confidence_score)` tuples **Multi-label Classification**: - Can return multiple tags - Each tag has independent confidence score - Returns tags above confidence threshold --- ### 1.3 Index Module (`documents/index.py`) #### Class: `DocumentIndex` Manages full-text search indexing for documents. ##### `__init__(self, index_dir=None)` ```python def __init__(self, index_dir=None) ``` **Purpose**: Initialize search index. **Parameters**: - `index_dir` (str, optional): Path to index directory **Components**: - Uses Whoosh library for indexing - Creates schema with fields: id, title, content, correspondent, tags - Supports stemming and stop words --- ##### `add_or_update_document(self, document)` ```python def add_or_update_document(self, document) -> None ``` **Purpose**: Add or update a document in the search index. **Parameters**: - `document` (Document): Document to index **Process**: 1. Extract searchable text 2. Tokenize and stem words 3. Build search index entry 4. Update or insert into index 5. Commit changes **Example**: ```python index = DocumentIndex() index.add_or_update_document(my_document) ``` --- ##### `remove_document(self, document_id)` ```python def remove_document(self, document_id) -> None ``` **Purpose**: Remove a document from search index. **Parameters**: - `document_id` (int): ID of document to remove --- ##### `search(self, query_string, limit=50)` ```python def search(self, query_string, limit=50) -> list ``` **Purpose**: Perform full-text search. 
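A toy inverted index illustrates the add/search cycle; the real implementation uses Whoosh with stemming, stop words, and relevance ranking, all of which this sketch omits:

```python
from collections import defaultdict

class ToyIndex:
    """Simplified stand-in for the Whoosh-backed DocumentIndex (illustration only)."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of document IDs

    def add_or_update_document(self, doc_id, text):
        for token in text.lower().split():
            self._postings[token].add(doc_id)

    def search(self, query):
        # Implicit AND across query terms, as in "invoice AND 2023"
        sets = [self._postings[t] for t in query.lower().split()]
        return sorted(set.intersection(*sets)) if sets else []

idx = ToyIndex()
idx.add_or_update_document(1, "Invoice 2023 Acme")
idx.add_or_update_document(2, "Receipt 2022")
print(idx.search("invoice 2023"))  # [1]
```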
**Parameters**: - `query_string` (str): Search query - `limit` (int): Maximum results to return **Returns**: List of document IDs, ranked by relevance **Features**: - Boolean operators (AND, OR, NOT) - Phrase search ("exact phrase") - Wildcard search (docu*) - Field-specific search (title:invoice) - Ranking by TF-IDF and BM25 **Example**: ```python results = index.search("invoice AND 2023") documents = Document.objects.filter(id__in=results) ``` --- ### 1.4 Matching Module (`documents/matching.py`) #### Class: `Match` Represents a matching rule for automatic classification. ##### Properties: - `matching_algorithm`: "any", "all", "literal", "regex", "fuzzy" - `match`: Pattern to match - `is_insensitive`: Case-insensitive matching ##### `matches(self, text)` ```python def matches(self, text) -> bool ``` **Purpose**: Check if text matches this rule. **Parameters**: - `text` (str): Text to check **Returns**: True if matches, False otherwise **Algorithms**: - **any**: Match if any word in pattern is in text - **all**: Match if all words in pattern are in text - **literal**: Exact substring match - **regex**: Regular expression match - **fuzzy**: Fuzzy string matching (Levenshtein distance) --- #### Function: `match_correspondents(document, classifier=None)` ```python def match_correspondents(document, classifier=None) -> int or None ``` **Purpose**: Find correspondent for document using rules and classifier. **Parameters**: - `document` (Document): Document to match - `classifier` (DocumentClassifier, optional): ML classifier **Returns**: Correspondent ID or None **Process**: 1. Check manual assignment 2. Apply matching rules (in order of priority) 3. If no match, use ML classifier 4. Return correspondent if confidence sufficient --- #### Function: `match_document_type(document, classifier=None)` ```python def match_document_type(document, classifier=None) -> int or None ``` **Purpose**: Find document type using rules and classifier. 
**Similar to correspondent matching.** --- #### Function: `match_tags(document, classifier=None)` ```python def match_tags(document, classifier=None) -> list ``` **Purpose**: Find matching tags using rules and classifier. **Returns**: List of tag IDs **Multi-label**: Can return multiple tags. --- ### 1.5 Barcode Module (`documents/barcodes.py`) #### Function: `get_barcodes(path, pages=None)` ```python def get_barcodes(path, pages=None) -> list ``` **Purpose**: Extract barcodes from document. **Parameters**: - `path` (str): Path to document - `pages` (list, optional): Specific pages to scan **Returns**: List of barcode dictionaries: ```python [ { 'type': 'CODE128', 'data': 'ABC123', 'page': 1, 'bbox': [x, y, w, h] }, ... ] ``` **Supported Formats**: - CODE128, CODE39, QR Code, Data Matrix, EAN, UPC **Uses**: - pyzbar library for barcode detection - OpenCV for image processing --- #### Function: `barcode_reader(path)` ```python def barcode_reader(path) -> dict ``` **Purpose**: Read and interpret barcode data. **Returns**: Parsed barcode information with metadata. --- #### Function: `separate_pages(path, barcodes)` ```python def separate_pages(path, barcodes) -> list ``` **Purpose**: Split document based on separator barcodes. **Parameters**: - `path` (str): Path to multi-page document - `barcodes` (list): Detected barcodes with page numbers **Returns**: List of paths to separated documents **Use Case**: - Batch scanning with separator sheets - Automatic document splitting **Example**: ```python # Scan stack of documents with barcode separators barcodes = get_barcodes("/tmp/batch.pdf") documents = separate_pages("/tmp/batch.pdf", barcodes) for doc_path in documents: consumer.try_consume_file(doc_path) ``` --- ### 1.6 Bulk Edit Module (`documents/bulk_edit.py`) #### Class: `BulkEditService` Handles mass document operations efficiently. 
##### `update_documents(self, document_ids, updates)` ```python def update_documents(self, document_ids, updates) -> dict ``` **Purpose**: Update multiple documents at once. **Parameters**: - `document_ids` (list): List of document IDs - `updates` (dict): Fields to update **Returns**: Result summary: ```python { 'updated': 42, 'failed': 0, 'errors': [] } ``` **Supported Updates**: - correspondent - document_type - tags (add, remove, replace) - storage_path - custom fields - permissions **Optimizations**: - Batched database operations - Minimal signal triggering - Deferred index updates **Example**: ```python service = BulkEditService() result = service.update_documents( document_ids=[1, 2, 3, 4, 5], updates={ 'document_type': 3, 'tags_add': [7, 8], 'tags_remove': [2] } ) ``` --- ##### `merge_documents(self, document_ids, target_id=None)` ```python def merge_documents(self, document_ids, target_id=None) -> int ``` **Purpose**: Combine multiple documents into one. **Parameters**: - `document_ids` (list): Documents to merge - `target_id` (int, optional): ID of target document **Returns**: ID of merged document **Process**: 1. Combine PDFs 2. Merge metadata (tags, etc.) 3. Preserve all original files 4. Update search index 5. Delete source documents (soft delete) --- ##### `split_document(self, document_id, split_pages)` ```python def split_document(self, document_id, split_pages) -> list ``` **Purpose**: Split a document into multiple documents. **Parameters**: - `document_id` (int): Document to split - `split_pages` (list): Page ranges for each new document **Returns**: List of new document IDs **Example**: ```python # Split 10-page document into 3 documents new_docs = service.split_document( document_id=42, split_pages=[ [1, 2, 3], # First 3 pages [4, 5, 6, 7], # Middle 4 pages [8, 9, 10] # Last 3 pages ] ) ``` --- ### 1.7 Workflow Module (`documents/workflows/`) #### Class: `WorkflowEngine` Executes automated document workflows. 
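Before the method reference, a plain-Python sketch of the evaluate-conditions-then-run-actions loop such an engine performs. The dict shapes mirror the workflow example shown later in this section; `apply_action` is a hypothetical dispatcher covering only two action types:

```python
def apply_action(document, action):
    """Hypothetical dispatcher for two of the workflow action types."""
    if action["type"] == "set_document_type":
        document["document_type"] = action["value"]
    elif action["type"] == "add_tags":
        document.setdefault("tags", []).extend(action["value"])

def run_workflow(workflow, document):
    """Sketch: check every condition, then execute actions, tallying results."""
    result = {"success": True, "actions_executed": 0, "actions_failed": 0, "errors": []}
    for cond in workflow.get("conditions", []):
        if cond["operator"] == "contains" and cond["value"] not in document.get(cond["field"], ""):
            return result  # a condition failed: run no actions
    for action in workflow.get("actions", []):
        try:
            apply_action(document, action)
            result["actions_executed"] += 1
        except Exception as exc:
            result["success"] = False
            result["actions_failed"] += 1
            result["errors"].append(str(exc))
    return result

doc = {"content": "INVOICE #2023-001"}
wf = {
    "conditions": [{"field": "content", "operator": "contains", "value": "INVOICE"}],
    "actions": [{"type": "set_document_type", "value": 2},
                {"type": "add_tags", "value": [5, 6]}],
}
print(run_workflow(wf, doc))  # actions_executed == 2, no errors
```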
##### `execute_workflow(self, workflow, document, trigger_type)` ```python def execute_workflow(self, workflow, document, trigger_type) -> dict ``` **Purpose**: Run a workflow on a document. **Parameters**: - `workflow` (Workflow): Workflow definition - `document` (Document): Target document - `trigger_type` (str): What triggered this workflow **Returns**: Execution result: ```python { 'success': True, 'actions_executed': 5, 'actions_failed': 0, 'errors': [] } ``` **Workflow Components**: 1. **Triggers**: - consumption - manual - scheduled - webhook 2. **Conditions**: - Document properties - Content matching - Date ranges - Custom field values 3. **Actions**: - Set correspondent - Set document type - Add/remove tags - Set custom fields - Execute webhook - Send email - Run script **Example Workflow**: ```python workflow = { 'name': 'Invoice Processing', 'trigger': 'consumption', 'conditions': [ {'field': 'content', 'operator': 'contains', 'value': 'INVOICE'} ], 'actions': [ {'type': 'set_document_type', 'value': 2}, {'type': 'add_tags', 'value': [5, 6]}, {'type': 'webhook', 'url': 'https://api.example.com/invoice'} ] } ``` --- ## 2. Paperless Core Functions ### 2.1 Settings Module (`paperless/settings.py`) #### Configuration Functions ##### `load_config_from_env()` ```python def load_config_from_env() -> dict ``` **Purpose**: Load configuration from environment variables. **Returns**: Configuration dictionary **Environment Variables**: - `PAPERLESS_DBHOST`: Database host - `PAPERLESS_DBPORT`: Database port - `PAPERLESS_OCR_LANGUAGE`: OCR languages - `PAPERLESS_CONSUMER_POLLING`: Polling interval - `PAPERLESS_TASK_WORKERS`: Number of workers - `PAPERLESS_SECRET_KEY`: Django secret key --- ##### `validate_settings(settings)` ```python def validate_settings(settings) -> list ``` **Purpose**: Validate configuration for errors. 
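A sketch of what such a validator might look like; the specific checks and messages here are illustrative, not the real rule set:

```python
def validate_settings(settings):
    """Collect human-readable configuration errors; an empty list means valid."""
    errors = []
    if not settings.get("PAPERLESS_SECRET_KEY"):
        errors.append("PAPERLESS_SECRET_KEY must be set")
    if settings.get("PAPERLESS_SECRET_KEY") == "change-me":
        errors.append("PAPERLESS_SECRET_KEY still has its default value")
    if not str(settings.get("PAPERLESS_DBPORT", "5432")).isdigit():
        errors.append("PAPERLESS_DBPORT must be numeric")
    return errors

print(validate_settings({"PAPERLESS_SECRET_KEY": "change-me"}))
# ['PAPERLESS_SECRET_KEY still has its default value']
```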
**Returns**: List of validation errors **Checks**: - Required settings present - Valid database configuration - OCR languages available - Storage paths exist - Secret key security --- ### 2.2 Celery Module (`paperless/celery.py`) #### Task Configuration ##### `@app.task` Decorator for creating Celery tasks. **Example**: ```python @app.task(bind=True, max_retries=3) def process_document(self, doc_id): try: document = Document.objects.get(id=doc_id) # Process document except Exception as exc: raise self.retry(exc=exc, countdown=60) ``` --- ##### Periodic Tasks ```python @app.on_after_finalize.connect def setup_periodic_tasks(sender, **kwargs): # Run sanity check daily at 3:30 AM sender.add_periodic_task( crontab(hour=3, minute=30), sanity_check.s(), name='daily-sanity-check' ) # Train classifier weekly sender.add_periodic_task( crontab(day_of_week=0, hour=2, minute=0), train_classifier.s(), name='weekly-classifier-training' ) ``` --- ### 2.3 Authentication Module (`paperless/auth.py`) #### Class: `PaperlessRemoteUserBackend` Custom authentication backend. ##### `authenticate(self, request, remote_user=None)` ```python def authenticate(self, request, remote_user=None) -> User or None ``` **Purpose**: Authenticate user via HTTP header (SSO). **Parameters**: - `request`: HTTP request - `remote_user`: Username from header **Returns**: User instance or None **Supports**: - HTTP_REMOTE_USER header - LDAP integration - OAuth2 providers - SAML --- ## 3. Mail Integration Functions ### 3.1 Mail Processing (`paperless_mail/mail.py`) #### Class: `MailAccountHandler` ##### `get_messages(self, max_messages=100)` ```python def get_messages(self, max_messages=100) -> list ``` **Purpose**: Fetch emails from mail account. 
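Once fetched, each message's headers and body are extracted (the first steps of `process_message`, documented next). A standard-library sketch of that step, using an invented sample message:

```python
from email import message_from_string
from email.utils import parseaddr

# Invented sample message for illustration
RAW = (
    "From: Acme Corp <billing@acme.example>\n"
    "To: inbox@example.com\n"
    "Subject: Invoice 2023-001\n"
    "\n"
    "Please find the invoice attached.\n"
)

def extract_metadata(raw):
    """Sketch of the header/body extraction step, not the real implementation."""
    msg = message_from_string(raw)
    return {
        "from": parseaddr(msg["From"])[1],
        "subject": msg["Subject"],
        "body": msg.get_payload().strip(),
    }

print(extract_metadata(RAW))
```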
**Parameters**: - `max_messages` (int): Maximum emails to fetch **Returns**: List of email message objects **Protocols**: - IMAP - IMAP with OAuth2 (Gmail, Outlook) --- ##### `process_message(self, message)` ```python def process_message(self, message) -> Document or None ``` **Purpose**: Convert email to document. **Parameters**: - `message`: Email message object **Returns**: Created document or None **Process**: 1. Extract email metadata (from, to, subject, date) 2. Extract body text 3. Download attachments 4. Create document for email body 5. Create documents for attachments 6. Link documents together 7. Apply mail rules --- ##### `handle_attachments(self, message)` ```python def handle_attachments(self, message) -> list ``` **Purpose**: Extract and process email attachments. **Returns**: List of attachment file paths **Supported**: - PDF attachments - Image attachments - Office documents - Archives (extracts) --- ## 4. OCR & Parsing Functions ### 4.1 Tesseract Parser (`paperless_tesseract/parsers.py`) #### Class: `RasterisedDocumentParser` ##### `parse(self, document_path, mime_type)` ```python def parse(self, document_path, mime_type) -> dict ``` **Purpose**: OCR document using Tesseract. **Parameters**: - `document_path` (str): Path to document - `mime_type` (str): MIME type **Returns**: Parsed document data: ```python { 'text': 'Extracted text content', 'metadata': {...}, 'pages': 10, 'language': 'eng' } ``` **Process**: 1. Convert to images (if PDF) 2. Preprocess images (deskew, denoise) 3. Detect language 4. Run Tesseract OCR 5. Post-process text (fix common errors) 6. Create searchable PDF --- ##### `construct_ocrmypdf_parameters(self)` ```python def construct_ocrmypdf_parameters(self) -> list ``` **Purpose**: Build command-line arguments for OCRmyPDF. 
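Assembling such an argument list mostly maps configuration onto OCRmyPDF flags. A minimal sketch using a subset of real OCRmyPDF options (the mode-to-flag mapping and defaults here are illustrative):

```python
def build_ocrmypdf_args(language="eng", mode="skip", optimize=1):
    """Sketch: map OCR configuration onto OCRmyPDF command-line flags."""
    args = ["--language", language, "--optimize", str(optimize)]
    if mode == "redo":
        args.append("--redo-ocr")    # re-OCR pages that already have text
    elif mode == "skip":
        args.append("--skip-text")   # leave pages with existing text alone
    elif mode == "force":
        args.append("--force-ocr")   # rasterise and OCR everything
    return args

print(build_ocrmypdf_args(language="deu+eng", mode="force"))
# ['--language', 'deu+eng', '--optimize', '1', '--force-ocr']
```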
**Returns**: List of arguments **Configuration**: - Language selection - OCR mode (redo, skip, force) - Image preprocessing - PDF/A creation - Optimization level --- ### 4.2 Tika Parser (`paperless_tika/parsers.py`) #### Class: `TikaDocumentParser` ##### `parse(self, document_path, mime_type)` ```python def parse(self, document_path, mime_type) -> dict ``` **Purpose**: Parse document using Apache Tika. **Supported Formats**: - Microsoft Office (doc, docx, xls, xlsx, ppt, pptx) - LibreOffice (odt, ods, odp) - Rich Text Format (rtf) - Archives (zip, tar, rar) - Images with metadata **Returns**: Parsed content and metadata --- ## 5. API & Serialization Functions ### 5.1 Document ViewSet (`documents/views.py`) #### Class: `DocumentViewSet` ##### `list(self, request)` ```python def list(self, request) -> Response ``` **Purpose**: List documents with filtering and pagination. **Query Parameters**: - `page`: Page number - `page_size`: Results per page - `ordering`: Sort field - `correspondent__id`: Filter by correspondent - `document_type__id`: Filter by type - `tags__id__in`: Filter by tags - `created__date__gt`: Filter by date - `query`: Full-text search **Response**: ```python { 'count': 100, 'next': 'http://api/documents/?page=2', 'previous': null, 'results': [...] } ``` --- ##### `retrieve(self, request, pk=None)` ```python def retrieve(self, request, pk=None) -> Response ``` **Purpose**: Get single document details. **Parameters**: - `pk`: Document ID **Response**: Full document JSON with metadata --- ##### `download(self, request, pk=None)` ```python @action(detail=True, methods=['get']) def download(self, request, pk=None) -> FileResponse ``` **Purpose**: Download document file. 
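As a usage aside, a client can assemble list-endpoint URLs from the query parameters documented above (the base URL is hypothetical):

```python
from urllib.parse import urlencode

def build_list_url(base, **filters):
    """Sketch: build a documents list URL from query parameters."""
    return f"{base}/api/documents/?{urlencode(filters)}"

print(build_list_url("http://localhost:8000", page=2, tags__id__in="1,3", query="invoice"))
# http://localhost:8000/api/documents/?page=2&tags__id__in=1%2C3&query=invoice
```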
**Query Parameters**: - `original`: Download original vs archive version **Returns**: File download response --- ##### `preview(self, request, pk=None)` ```python @action(detail=True, methods=['get']) def preview(self, request, pk=None) -> FileResponse ``` **Purpose**: Generate document preview image. **Returns**: PNG/JPEG image --- ##### `metadata(self, request, pk=None)` ```python @action(detail=True, methods=['get']) def metadata(self, request, pk=None) -> Response ``` **Purpose**: Get/update document metadata. **GET Response**: ```python { 'original_filename': 'invoice.pdf', 'media_filename': '0000042.pdf', 'created': '2023-01-15T10:30:00Z', 'modified': '2023-01-15T10:30:00Z', 'added': '2023-01-15T10:30:00Z', 'archive_checksum': 'sha256:abc123...', 'original_checksum': 'sha256:def456...', 'original_size': 245760, 'archive_size': 180000, 'original_mime_type': 'application/pdf' } ``` --- ##### `suggestions(self, request, pk=None)` ```python @action(detail=True, methods=['get']) def suggestions(self, request, pk=None) -> Response ``` **Purpose**: Get ML classification suggestions. **Response**: ```python { 'correspondents': [ {'id': 5, 'name': 'Acme Corp', 'confidence': 0.87}, {'id': 2, 'name': 'Beta Inc', 'confidence': 0.12} ], 'document_types': [...], 'tags': [...] } ``` --- ##### `bulk_edit(self, request)` ```python @action(detail=False, methods=['post']) def bulk_edit(self, request) -> Response ``` **Purpose**: Bulk update multiple documents. **Request Body**: ```python { 'documents': [1, 2, 3, 4, 5], 'method': 'set_correspondent', 'parameters': {'correspondent': 7} } ``` **Methods**: - `set_correspondent` - `set_document_type` - `set_storage_path` - `add_tag` / `remove_tag` - `modify_tags` - `delete` - `merge` - `split` --- ## 6. 
Frontend Services & Components

### 6.1 Document Service (`src-ui/src/app/services/rest/document.service.ts`)

#### Class: `DocumentService`

##### `listFiltered(page, pageSize, sortField, sortReverse, filterRules, extraParams?)`

```typescript
listFiltered(
  page?: number,
  pageSize?: number,
  sortField?: string,
  sortReverse?: boolean,
  filterRules?: FilterRule[],
  extraParams?: any
): Observable<Results<Document>>
```

**Purpose**: Get filtered list of documents.

**Parameters**:
- `page`: Page number (1-indexed)
- `pageSize`: Results per page
- `sortField`: Field to sort by
- `sortReverse`: Reverse sort order
- `filterRules`: Array of filter rules
- `extraParams`: Additional query parameters

**Returns**: Observable of paginated results

**Example**:
```typescript
this.documentService.listFiltered(
  1, 50, 'created', true,
  [
    {rule_type: FILTER_CORRESPONDENT, value: '5'},
    {rule_type: FILTER_HAS_TAGS_ALL, value: '1,3,7'}
  ]
).subscribe(results => {
  this.documents = results.results;
});
```

---

##### `get(id: number)`

```typescript
get(id: number): Observable<Document>
```

**Purpose**: Get single document by ID.

---

##### `update(document: Document)`

```typescript
update(document: Document): Observable<Document>
```

**Purpose**: Update document metadata.

---

##### `upload(formData: FormData)`

```typescript
upload(formData: FormData): Observable<any>
```

**Purpose**: Upload new document.

**FormData fields**:
- `document`: File
- `title`: Optional title
- `correspondent`: Optional correspondent ID
- `document_type`: Optional type ID
- `tags`: Optional tag IDs

---

##### `download(id: number, original: boolean)`

```typescript
download(id: number, original: boolean = false): Observable<Blob>
```

**Purpose**: Download document file.

---

##### `getPreviewUrl(id: number)`

```typescript
getPreviewUrl(id: number): string
```

**Purpose**: Get URL for document preview.

**Returns**: URL string

---

##### `getThumbUrl(id: number)`

```typescript
getThumbUrl(id: number): string
```

**Purpose**: Get URL for document thumbnail.
---

##### `bulkEdit(documentIds: number[], method: string, parameters: any)`

```typescript
bulkEdit(
  documentIds: number[],
  method: string,
  parameters: any
): Observable<any>
```

**Purpose**: Perform bulk operation on documents.

---

### 6.2 Search Service (`src-ui/src/app/services/search.service.ts`)

#### Class: `SearchService`

##### `search(query: string)`

```typescript
search(query: string): Observable<Document[]>
```

**Purpose**: Perform full-text search.

**Query Syntax**:
- Simple: `invoice 2023`
- Phrase: `"exact phrase"`
- Boolean: `invoice AND 2023`
- Field: `title:invoice`
- Wildcard: `doc*`

---

##### `advancedSearch(query: SearchQuery)`

```typescript
advancedSearch(query: SearchQuery): Observable<Document[]>
```

**Purpose**: Advanced search with multiple criteria.

**SearchQuery**:
```typescript
interface SearchQuery {
  text?: string;
  correspondent?: number;
  documentType?: number;
  tags?: number[];
  dateFrom?: Date;
  dateTo?: Date;
  customFields?: {[key: string]: any};
}
```

---

### 6.3 Settings Service (`src-ui/src/app/services/settings.service.ts`)

#### Class: `SettingsService`

##### `getSettings()`

```typescript
getSettings(): Observable<PaperlessSettings>
```

**Purpose**: Get user/system settings.

---

##### `updateSettings(settings: PaperlessSettings)`

```typescript
updateSettings(settings: PaperlessSettings): Observable<PaperlessSettings>
```

**Purpose**: Update settings.

---

## 7. Utility Functions

### 7.1 File Handling Utilities (`documents/file_handling.py`)

#### `generate_unique_filename(filename, suffix="")`

```python
def generate_unique_filename(filename, suffix="") -> str
```

**Purpose**: Generate unique filename to avoid collisions.

**Parameters**:
- `filename` (str): Base filename
- `suffix` (str): Optional suffix

**Returns**: Unique filename with timestamp

---

#### `create_source_path_directory(source_path)`

```python
def create_source_path_directory(source_path) -> None
```

**Purpose**: Create directory structure for document storage.
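The collision-avoidance idea behind `generate_unique_filename` above can be sketched as follows; the function name and timestamp scheme are illustrative, the real implementation may differ:

```python
import os
from datetime import datetime

def unique_name(filename, suffix=""):
    """Append a timestamp before the extension to avoid collisions (sketch)."""
    stem, ext = os.path.splitext(filename)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{stem}{suffix}_{stamp}{ext}"

print(unique_name("invoice.pdf", suffix="_copy"))  # e.g. invoice_copy_20250101_120000.pdf
```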
**Parameters**: - `source_path` (str): Path template with variables **Variables**: - `{correspondent}`: Correspondent name - `{document_type}`: Document type - `{created}`: Creation date - `{created_year}`: Year - `{created_month}`: Month - `{title}`: Document title - `{asn}`: Archive serial number **Example**: ```python # Template: {correspondent}/{created_year}/{document_type} # Result: Acme Corp/2023/Invoices/ ``` --- #### `safe_rename(old_path, new_path)` ```python def safe_rename(old_path, new_path) -> None ``` **Purpose**: Safely rename file with atomic operation. **Ensures**: No data loss if operation fails --- ### 7.2 Data Utilities (`paperless/utils.py`) #### `copy_basic_file_stats(src, dst)` ```python def copy_basic_file_stats(src, dst) -> None ``` **Purpose**: Copy file metadata (timestamps, permissions). --- #### `maybe_override_pixel_limit()` ```python def maybe_override_pixel_limit() -> None ``` **Purpose**: Increase PIL image size limit for large documents. --- ## 8. Database Models & Methods ### 8.1 Document Model (`documents/models.py`) #### Class: `Document` ##### Model Fields: ```python class Document(models.Model): title = models.CharField(max_length=255) content = models.TextField() correspondent = models.ForeignKey(Correspondent, ...) document_type = models.ForeignKey(DocumentType, ...) tags = models.ManyToManyField(Tag, ...) created = models.DateTimeField(...) modified = models.DateTimeField(auto_now=True) added = models.DateTimeField(auto_now_add=True) storage_path = models.ForeignKey(StoragePath, ...) archive_serial_number = models.IntegerField(...) original_filename = models.CharField(max_length=1024) checksum = models.CharField(max_length=64) archive_checksum = models.CharField(max_length=64) owner = models.ForeignKey(User, ...) custom_fields = models.ManyToManyField(CustomField, ...) ``` --- ##### `save(self, *args, **kwargs)` ```python def save(self, *args, **kwargs) -> None ``` **Purpose**: Override save to add custom logic. 
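A plain-Python stand-in illustrates the shape of this override (the real method is Django's `Model.save` and does considerably more; the ASN counter here is hypothetical):

```python
class DocumentStub:
    """Illustrates the save() override logic without Django (hypothetical)."""

    _next_asn = 1

    def __init__(self):
        self.archive_serial_number = None
        self.indexed = False

    def save(self):
        if self.archive_serial_number is None:  # assign an ASN if not set
            self.archive_serial_number = DocumentStub._next_asn
            DocumentStub._next_asn += 1
        self.indexed = True  # stands in for the search-index update

doc = DocumentStub()
doc.save()
print(doc.archive_serial_number, doc.indexed)  # 1 True
```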
**Custom Logic**: 1. Generate archive serial number if not set 2. Update modification timestamp 3. Trigger signals 4. Update search index --- ##### `filename(self)` ```python @property def filename(self) -> str ``` **Purpose**: Get the document filename. **Returns**: Formatted filename based on template --- ##### `source_path(self)` ```python @property def source_path(self) -> str ``` **Purpose**: Get full path to source file. --- ##### `archive_path(self)` ```python @property def archive_path(self) -> str ``` **Purpose**: Get full path to archive file. --- ##### `get_public_filename(self)` ```python def get_public_filename(self) -> str ``` **Purpose**: Get sanitized filename for downloads. **Returns**: Safe filename without path traversal characters --- ### 8.2 Correspondent Model #### Class: `Correspondent` ```python class Correspondent(models.Model): name = models.CharField(max_length=255, unique=True) match = models.CharField(max_length=255, blank=True) matching_algorithm = models.IntegerField(choices=MATCH_CHOICES) is_insensitive = models.BooleanField(default=True) document_count = models.IntegerField(default=0) last_correspondence = models.DateTimeField(null=True) owner = models.ForeignKey(User, ...) ``` --- ### 8.3 Workflow Model #### Class: `Workflow` ```python class Workflow(models.Model): name = models.CharField(max_length=255) enabled = models.BooleanField(default=True) order = models.IntegerField(default=0) triggers = models.ManyToManyField(WorkflowTrigger) conditions = models.ManyToManyField(WorkflowCondition) actions = models.ManyToManyField(WorkflowAction) ``` --- ## Summary This guide provides comprehensive documentation for the major functions in IntelliDocs-ngx. 
For detailed API documentation, refer to: - **Backend API**: `/api/schema/` (OpenAPI/Swagger) - **Frontend Docs**: Generated via Compodoc - **Database Schema**: Django migrations in `migrations/` directories For implementation examples and testing, see the test files in each module's `tests/` directory. --- *Last Updated: 2025-11-09* *Version: 2.19.5*