zugferd-service/.sisyphus/notepads/zugferd-service/learnings.md

Learnings - zugferd-service

This file accumulates conventions, patterns, and learnings during execution.

[2026-02-04T18:12:44.864Z] Session Start

Initial session for ZUGFeRD-Service implementation.

Framework Decisions

  • FastAPI (user preference)
  • Pydantic v2+ for data models
  • pytest with pytest-asyncio for testing
  • hatchling for build system

Packaging Decisions

  • pyproject.toml (modern Python packaging)
  • Docker multi-stage build
  • Nix flake-based packaging with buildPythonApplication

Testing Decisions

  • TDD (test-first) approach
  • All acceptance criteria must be verifiable without human intervention

[2026-02-04T19:14:00.000Z] Task 1: Project Scaffold

hatchling Configuration Pattern

  • For src-layout projects, MUST add [tool.hatch.build.targets.wheel] section
  • Without this, hatchling cannot determine which files to ship
  • Config: packages = ["src"] to specify src directory

Nix Environment Considerations

  • Nix store is read-only, standard pip install fails
  • Use temporary venv for verification: python -m venv /tmp/test_env
  • Install to venv, verify imports, then cleanup

Entry Point Documentation

  • Functions referenced in [project.scripts] MUST have docstrings
  • These are public API entry points (CLI commands)
  • Example: zugferd-service = "src.main:run" -> run() needs docstring

Module Docstring Convention

  • Module-level docstrings: minimal, one line, describe purpose
  • Entry point function docstrings: Args/Returns style for CLI documentation
  • Both necessary for scaffolding clarity

[2026-02-04T19:23:00.000Z] Task 2: Download ZUGFeRD Sample PDFs

Sample PDF Sources

ZUGFeRD Profile Coverage

  • Available samples: BASIC, BASIC WL, EN16931, EXTENDED, XRechnung
  • Missing: MINIMUM profile (future addition needed)
  • Versions covered: ZUGFeRD 1.0, 2.0, 2.1, XRechnung
  • Related formats: ORDER-X (for orders, not invoices)

Negative Testing

  • EmptyPDFA1.pdf: Valid PDF/A-1 with no ZUGFeRD XML data
  • Useful for testing error handling and graceful degradation

PDF Verification Pattern

  • When file command unavailable, verify PDF magic bytes
  • Magic bytes: 25 50 44 46 (hex) = "%PDF" (ASCII)
  • Command: head -c 4 "$f" | od -A n -t x1
  • All valid PDFs start with these 4 bytes

Sample Selection Strategy

  • Prioritize coverage: multiple profiles, versions, edge cases
  • Keep focused: aim for 8-10 samples (11 were selected in the end to cover enough variety)
  • Include historical samples for backward compatibility testing
  • Document thoroughly: MANIFEST.md with profile, description, source

File Naming Conventions

  • Mustang uses descriptive names: EN16931_1_Teilrechnung.pdf
  • Include profile and feature description in filename
  • Date-based names for temporal versions: MustangBeispiel20221026.pdf
  • Test prefixes: ZTESTZUGFERD_1_... for ZUGFeRD v1 test samples

[2026-02-04T19:45:00.000Z] Task 3: Pydantic Models

Pydantic v2+ Syntax Patterns

  • Use type | None = None for optional fields (not Optional[type])
  • Use Field(description=...) for field documentation (appears in OpenAPI docs)
  • Use Field(default_factory=list) for list defaults to avoid mutable default issues
  • Use Field(default=None) for None defaults on optional fields
  • Model docstrings serve as public API documentation for FastAPI's OpenAPI schema

JSON Serialization

  • Use model.model_dump_json() to serialize to JSON string
  • Use model.model_validate_json(json_str) to deserialize from JSON
  • Pydantic handles datetime, nested models, and type conversion automatically

Test-First Development Pattern

  • Write tests before implementing models (RED-GREEN-REFACTOR)
  • Tests should cover: minimal data, full data, edge cases
  • Test JSON roundtrip: model.model_dump_json() → Model.model_validate_json()
  • Verify imports: python -c "from src.models import ModelName"

Nested Models with Dict Input

  • Pydantic v2 accepts dict for nested models: supplier={"name": "ACME"}
  • Use for test convenience and API requests
  • Internally converts to proper model instances

Field Required vs Optional

  • Required fields: No default value in Field
  • Optional fields: type | None = Field(default=None, ...)
  • Empty list defaults: list[Type] = Field(default_factory=list)

[2026-02-04T20:30:00.000Z] Task 5: PDF Text Parser Implementation

TDD Implementation Pattern

  • Write failing tests first (RED), implement minimum code (GREEN), refactor if needed
  • 27 tests written covering: PDF extraction, regex patterns, number/date formats, edge cases
  • All tests pass after implementation

pypdf Text Extraction

  • PdfReader requires file-like object, not raw bytes
  • Use io.BytesIO(pdf_bytes) to wrap bytes for pypdf
  • Extract text page-by-page, concatenate with newlines

Regex Pattern Design for Numbers

  • Initial pattern [0-9.,]+ matches lone dots (invalid number)
  • Fixed pattern: [0-9]+(?:[.,][0-9]+)* requires at least one digit
  • Ensures matched values are valid numbers before parsing
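The difference between the two patterns is easy to see on a short sample. A minimal sketch; `find_numbers` and the sample strings are illustrative only:

```python
import re

# Original pattern: any run of digits, dots and commas, so a lone "." or a
# trailing separator ("99.") is accepted as a "number".
LOOSE_NUMBER = re.compile(r"[0-9.,]+")

# Fixed pattern: must start with a digit, and every separator must be
# followed by at least one digit.
STRICT_NUMBER = re.compile(r"[0-9]+(?:[.,][0-9]+)*")


def find_numbers(text: str, pattern: re.Pattern[str] = STRICT_NUMBER) -> list[str]:
    """Return all number-like substrings matched by the given pattern."""
    return pattern.findall(text)
```

On `"Summe . 1.234,56 EUR"` the loose pattern returns `[".", "1.234,56"]`, while the strict one returns only `["1.234,56"]`.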

German Number Format Detection

  • German: 1.234,56 (dot=thousands, comma=decimal)
  • International: 1,234.56 (comma=thousands, dot=decimal)
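  • Detection: check whether the last comma appears after the last dot
    if "," in num_str and num_str.rfind(",") > num_str.rfind("."):
        # German format: 1.234,56 → 1234.56
        value = float(num_str.replace(".", "").replace(",", "."))
    else:
        # International format: 1,234.56 → 1234.56
        value = float(num_str.replace(",", ""))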

Confidence Scoring

  • First pattern match = 1.0 confidence
  • Each subsequent pattern reduces confidence by 0.1
  • Range: 1.0 (first pattern) → 0.6 (fifth pattern)

German Date Format Conversion

  • Input: 04.02.2025 (DD.MM.YYYY)
  • Output: 2025-02-04 (ISO format YYYY-MM-DD)
  • Use zfill(2) to pad single digits: 4 → 04

Test Docstrings are Necessary

  • Docstrings document what each test verifies (pytest does not print them in its default report)
  • Essential for readable, self-explaining test code
  • Module/class docstrings provide organization context

Invoice Field Patterns (from spec)

  • invoice_number: "Rechnungs-Nr", "Invoice No", "Beleg-Nr", "Rechnung X/Y"
  • gross_amount: "Brutto", "Gesamtbetrag", "Total", "Endbetrag", "Summe"
  • net_amount: "Netto", "Rechnungsbetrag"
  • vat_amount: "MwSt", "USt", "Steuer"
  • invoice_date: "Rechnungsdatum", "Datum", "Invoice Date"
  • supplier_name: "Lieferant", "Verkäufer"

PDF Layout Variations

  • Real PDFs may have different field layouts than spec patterns
  • EN16931 sample uses "Bruttosumme" instead of "Brutto"
  • Patterns can be refined iteratively based on real data

[2026-02-04T20:45:00.000Z] Task 6: Utility Functions Implementation

UNECE Unit Code Mapping

  • UN/ECE unit codes standardized for cross-border trade documents
  • 17 common codes mapped to German translations:
    • "C62", "H87", "PCE", "EA" → "Stück"
    • "KGM" → "Kilogramm", "GRM" → "Gramm", "TNE" → "Tonne"
    • "MTR" → "Meter", "KMT" → "Kilometer", "MTK" → "Quadratmeter"
    • "LTR" → "Liter", "MLT" → "Milliliter"
    • "DAY" → "Tag", "HUR" → "Stunde", "MON" → "Monat", "ANN" → "Jahr"
    • "SET" → "Set"
  • Fallback: return original code if not found in dictionary
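The mapping and fallback reduce to a dictionary lookup with a default. A sketch; `UNIT_CODES_DE` and `translate_unit_code` are hypothetical names, and the lookup is case-sensitive as written:

```python
UNIT_CODES_DE = {
    "C62": "Stück", "H87": "Stück", "PCE": "Stück", "EA": "Stück",
    "KGM": "Kilogramm", "GRM": "Gramm", "TNE": "Tonne",
    "MTR": "Meter", "KMT": "Kilometer", "MTK": "Quadratmeter",
    "LTR": "Liter", "MLT": "Milliliter",
    "DAY": "Tag", "HUR": "Stunde", "MON": "Monat", "ANN": "Jahr",
    "SET": "Set",
}


def translate_unit_code(code: str) -> str:
    """Translate a UN/ECE unit code to German; unknown codes are returned unchanged."""
    return UNIT_CODES_DE.get(code, code)
```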

Floating Point Precision Handling

  • amounts_match() with hardcoded 0.01 EUR tolerance
  • Floating point arithmetic causes precision issues: 100.01 - 100.00 = 0.010000000000005116
  • Solution: Add small epsilon margin (1e-10) to tolerance for robust comparison
  • Formula: abs(actual - expected) <= tolerance + 1e-10
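A minimal sketch of the comparison, assuming the hardcoded tolerance is a module constant (the constant names are hypothetical):

```python
TOLERANCE_EUR = 0.01
EPSILON = 1e-10  # absorbs binary floating-point error such as 0.010000000000005116


def amounts_match(actual: float, expected: float) -> bool:
    """True if two amounts differ by at most 0.01 EUR (plus a tiny epsilon)."""
    return abs(actual - expected) <= TOLERANCE_EUR + EPSILON
```

Without `EPSILON`, `amounts_match(100.01, 100.00)` would be `False`, because `100.01 - 100.00` evaluates to slightly more than `0.01`.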

German Number Format Parsing

  • German format: 1.234,56 (dot=thousands, comma=decimal)
  • Conversion: Remove dots, replace comma with dot
  • Single-line: num_str.replace('.', '').replace(',', '.')
  • Important: Remove thousands separator BEFORE replacing decimal separator
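The ordering rule matters because both replacements touch the dot character. A sketch; `parse_german_number` is a hypothetical name and assumes the input is already known to be German-formatted:

```python
def parse_german_number(num_str: str) -> float:
    """Parse a German-formatted number such as '1.234,56' into a float.

    Thousands dots are removed first; only then is the decimal comma turned
    into a dot. Reversing the order would turn '1.234,56' into '1.234.56'
    and then into '123456'.
    """
    return float(num_str.strip().replace(".", "").replace(",", "."))
```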

German Date Format Parsing

  • Input: 04.02.2025 (DD.MM.YYYY)
  • Output: 2025-02-04 (ISO format YYYY-MM-DD)
  • Validation: Check for 3 parts separated by dots before parsing
  • Pad single digits: zfill(2), e.g. 4 → 04

Standard Rounding (Not Banker's Rounding)

  • Python's round() uses banker's rounding (round half to even)
  • Task requires standard rounding (round half away from zero)
  • Solution: Use Decimal with ROUND_HALF_UP
  • Implementation:
    from decimal import Decimal, ROUND_HALF_UP
    quantizer = Decimal(1).scaleb(-places)  # exponent -places, e.g. Decimal('1E-2') for places=2
    float(Decimal(str(amount)).quantize(quantizer, rounding=ROUND_HALF_UP))

  • Note: Use str(amount) when creating Decimal, so Decimal sees the short decimal form (2.675) rather than the exact binary value of the float (2.67499999...)
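The difference from the built-in `round()` shows up on ties and on floats whose binary value sits just below a tie. A sketch wrapped in a hypothetical `round_half_up` helper:

```python
from decimal import ROUND_HALF_UP, Decimal


def round_half_up(amount: float, places: int = 2) -> float:
    """Round ties away from zero (commercial rounding) to the given places."""
    quantizer = Decimal(1).scaleb(-places)  # Decimal('1E-2') for places=2
    # str() gives the short form ('2.675'), not the binary value 2.67499999...
    return float(Decimal(str(amount)).quantize(quantizer, rounding=ROUND_HALF_UP))
```

`round(0.125, 2)` gives `0.12` (tie rounded to even), while `round_half_up(0.125)` gives `0.13`. `ROUND_HALF_UP` rounds ties away from zero, so `round_half_up(-2.5, 0)` gives `-3.0`.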

Test Coverage Patterns

  • Unit code translation: all 17 codes + unknown fallback
  • Amounts match: exact, within tolerance, at boundary, beyond tolerance, negative, zero
  • German numbers: integer, decimal, thousands, large, negative
  • German dates: standard, single digit, ISO format, invalid format
  • Rounding: default 2 places, custom places, rounding up/down, negative, zero

Decimal quantize Pattern

  • quantize() only uses the quantizer's exponent, so for N decimal places use Decimal(1).scaleb(-N), i.e. a 1 in the Nth decimal place:
    • 2 places: Decimal('0.01')
    • 3 places: Decimal('0.001')
    • 1 place: Decimal('0.1')

Nix Environment Testing

  • Pytest not installed in base Python environment
  • Use nix-shell for testing: nix-shell -p python312Packages.pytest --run "pytest tests/test_utils.py -v"
  • All tests must pass before marking task complete

[2026-02-04T20:55:00.000Z] Task 8: FastAPI Application Structure

FastAPI App Initialization

  • Use FastAPI(title=..., version=..., description=...) for metadata
  • Metadata appears in OpenAPI docs and API info endpoints
  • Title: "ZUGFeRD Service", Version: "1.0.0"
  • Description: Purpose of the REST API

CORS Middleware Configuration

  • Development mode: allow_origins=["*"] (all origins)
  • Also set allow_credentials, allow_methods and allow_headers explicitly (Starlette's defaults only allow GET and no extra request headers)
  • Add middleware BEFORE exception handlers for proper error handling

Exception Handler Pattern

  • Use @app.exception_handler(ExceptionType) decorator
  • ExtractionError → 400 status with error_code, message, details
  • Generic Exception → 500 status with error_code="internal_error"
  • Handlers receive request: Request and exc: ExceptionType parameters

Structured JSON Logging

  • Custom JSONFormatter extends logging.Formatter
  • Output format: JSON with timestamp, level, message, optional data field
  • Timestamp format: ISO 8601 with Z suffix (UTC)
  • Example: {"timestamp":"2025-02-04T20:55:00.000Z","level":"INFO","message":"..."}
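A formatter producing that shape could look like the following sketch. Passing the optional `data` field via `extra={"data": ...}` is an assumption about how callers attach it:

```python
import json
import logging
from datetime import datetime, timezone


class JSONFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            # ISO 8601 with milliseconds; "+00:00" replaced by the "Z" suffix
            "timestamp": created.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Optional structured payload, e.g. logger.info("done", extra={"data": {...}})
        data = getattr(record, "data", None)
        if data is not None:
            entry["data"] = data
        return json.dumps(entry, separators=(",", ":"), ensure_ascii=False, default=str)
```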

Error Response Format (consistent with spec)

{
  "error": "error_code",
  "message": "Menschenlesbare Fehlermeldung",
  "details": "Technische Details (optional)"
}

Import Order Convention

  1. Standard library: import json, import logging
  2. Third-party: import uvicorn, from fastapi import ...
  3. Local: from src.extractor import ExtractionError

Logger Setup Pattern

  • Get logger: logging.getLogger(__name__)
  • Check handlers: if not logger.handlers: to avoid duplicate handlers
  • Set level: logger.setLevel(logging.INFO) or env variable
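The setup steps combine into an idempotent helper. A sketch; `get_logger` is hypothetical, the plain formatter is a placeholder for the service's JSON formatter, and reading the level from `LOG_LEVEL` is an assumption:

```python
import logging
import os


def get_logger(name: str) -> logging.Logger:
    """Return a configured logger, adding a handler only on the first call."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        # Placeholder format; the service would attach its JSON formatter here
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    # Level name from the LOG_LEVEL environment variable, defaulting to INFO
    logger.setLevel(os.environ.get("LOG_LEVEL", "INFO").upper())
    return logger
```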

CLI Entry Point Preservation

  • run(host, port) function preserved for CLI entry point
  • Uses uvicorn.run(app, host=host, port=port) to start server
  • Function MUST have docstring (public API documentation)

Pre-commit Hook on Comments

  • Pre-commit hook checks for unnecessary comments/docstrings
  • Essential docstrings: module level, public API functions (run())
  • Unnecessary: section comments (e.g., "# Create app", "# Exception handlers")
  • Code should be self-documenting; remove redundant comments

Nix Environment Limitation

  • Cannot install packages with pip (read-only Nix store)
  • Use python -m py_compile for syntax validation instead
  • py_compile only checks syntax; import errors and logic bugs still need a real test run in an environment with the dependencies installed

[2026-02-04T21:05:00.000Z] Task 7: Validator Implementation (TDD)

TDD Implementation Pattern

  • Write failing tests FIRST (RED), implement minimum code (GREEN), no refactoring needed
  • 52 comprehensive tests written covering: pflichtfelder, betraege, ustid, pdf_abgleich, validate_invoice
  • All tests pass after implementation

Required Field Validation (pflichtfelder)

  • Critical fields: invoice_number, invoice_date, supplier.name, supplier.vat_id, buyer.name, totals.net, totals.gross, totals.vat_total
  • Warning fields: due_date, payment_terms.iban
  • Line items required: min 1 item with critical fields (description, quantity, unit_price, line_total)
  • Line item warnings: vat_rate can be missing
  • Check: empty string or zero value considered missing
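The critical-field check can be sketched over a plain dict such as xml_data. This assumes dotted paths into nested dicts. The real validator works on the XmlData model, and `get_path`, `is_missing` and `missing_critical_fields` are hypothetical helpers:

```python
from typing import Any

CRITICAL_FIELDS = [
    "invoice_number", "invoice_date", "supplier.name", "supplier.vat_id",
    "buyer.name", "totals.net", "totals.gross", "totals.vat_total",
]


def get_path(data: dict[str, Any], path: str) -> Any:
    """Follow a dotted path like 'supplier.name'; None if any step is absent."""
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def is_missing(value: Any) -> bool:
    """None, empty string and zero all count as missing."""
    return value is None or value == "" or value == 0


def missing_critical_fields(data: dict[str, Any]) -> list[str]:
    """Return the critical field paths that are missing, in declaration order."""
    return [path for path in CRITICAL_FIELDS if is_missing(get_path(data, path))]
```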

Calculation Validation (betraege)

  • All calculations use amounts_match() with 0.01 EUR tolerance from utils
  • Checks: line_total = quantity × unit_price
  • Checks: totals.net = sum(line_items.line_total)
  • Checks: vat_breakdown.amount = base × (rate/100)
  • Checks: totals.vat_total = sum(vat_breakdown.amount)
  • Checks: totals.gross = totals.net + totals.vat_total
  • Error code: "calculation_mismatch" for all calculation mismatches
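The five checks map directly onto comparisons with the tolerance helper. This is a dict-based sketch: the key names follow the field names above, `calculation_mismatches` is hypothetical, and each returned path would become one "calculation_mismatch" error:

```python
from typing import Any

TOLERANCE_EUR = 0.01
EPSILON = 1e-10


def amounts_match(actual: float, expected: float) -> bool:
    return abs(actual - expected) <= TOLERANCE_EUR + EPSILON


def calculation_mismatches(
    line_items: list[dict[str, Any]],
    vat_breakdown: list[dict[str, Any]],
    totals: dict[str, float],
) -> list[str]:
    """Return the field paths whose amounts fail the betraege checks."""
    mismatches = []
    for i, item in enumerate(line_items):  # line_total = quantity × unit_price
        if not amounts_match(item["quantity"] * item["unit_price"], item["line_total"]):
            mismatches.append(f"line_items[{i}].line_total")
    if not amounts_match(sum(item["line_total"] for item in line_items), totals["net"]):
        mismatches.append("totals.net")
    for i, vat in enumerate(vat_breakdown):  # amount = base × (rate / 100)
        if not amounts_match(vat["base"] * vat["rate"] / 100, vat["amount"]):
            mismatches.append(f"vat_breakdown[{i}].amount")
    if not amounts_match(sum(vat["amount"] for vat in vat_breakdown), totals["vat_total"]):
        mismatches.append("totals.vat_total")
    if not amounts_match(totals["net"] + totals["vat_total"], totals["gross"]):
        mismatches.append("totals.gross")
    return mismatches
```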

VAT ID Format Validation (ustid)

  • German: ^DE[0-9]{9}$ (DE + 9 digits)
  • Austrian: ^ATU[0-9]{8}$ (ATU + 8 digits)
  • Swiss: ^CHE[0-9]{9}(MWST|TVA|IVA)$ (CHE + 9 digits + suffix)
  • Returns None if valid, ErrorDetail if invalid
  • Error code: "invalid_format"
  • Checks both supplier.vat_id and buyer.vat_id in validate_invoice()

PDF Comparison (pdf_abgleich)

  • Exact match: invoice_number (string comparison)
  • Within tolerance: totals.gross, totals.net, totals.vat_total (using amounts_match)
  • Severity: warning (not critical) for PDF mismatches
  • Error code: "pdf_mismatch"
  • Missing PDF values: no error raised (can't compare)

Main Validator Function (validate_invoice)

  • Accepts ValidateRequest with xml_data (dict), pdf_text (optional), checks (list)
  • Deserializes xml_data dict to XmlData model
  • Runs only requested checks (invalid check names ignored)
  • Tracks: checks_run, checks_passed (critical errors = fail)
  • Separates errors (critical) and warnings
  • is_valid: True if no critical errors, False otherwise
  • Summary: total_checks, checks_passed, checks_failed, critical_errors, warnings
  • Times execution: validation_time_ms in milliseconds

PDF Text Extraction (validate_invoice)

  • Simple pattern matching for pdf_abgleich: "Invoice X", "Total: X"
  • Limited implementation - full PDF text extraction separate in parser module
  • Gracefully handles extraction failures (no error raised)

ErrorDetail Structure

  • check: Name of validation check (pflichtfelder, betraege, ustid, pdf_abgleich)
  • field: Path to field (e.g., "invoice_number", "line_items[0].description")
  • error_code: Specific error identifier (missing_required, calculation_mismatch, invalid_format, pdf_mismatch)
  • message: Human-readable error description
  • severity: "critical" or "warning"

ValidationResult Structure

  • is_valid: boolean (true if no critical errors)
  • errors: list[ErrorDetail] (all critical errors)
  • warnings: list[ErrorDetail] (all warnings)
  • summary: dict with counts (total_checks, checks_passed, checks_failed, critical_errors, warnings)
  • validation_time_ms: int (execution time in milliseconds)

Test Docstrings are Necessary

  • Docstrings document what each test verifies (pytest does not print them in its default report)
  • Essential for readable, self-explaining test code
  • Inline comments explaining test data are necessary (e.g., "# Wrong: 10 × 9.99 = 99.90")

Nix Environment Workaround

  • Pytest not in base Python (read-only Nix store)
  • Create venv: python -m venv venv && source venv/bin/activate
  • Install dependencies in venv: pip install pydantic pytest
  • Run tests with PYTHONPATH: PYTHONPATH=/path/to/project pytest tests/test_validator.py -v
  • All 52 tests pass after fixing LineItem model requirement (unit field mandatory)

Function Docstrings are Necessary

  • Public API functions require docstrings
  • validate_pflichtfelder, validate_betraege, validate_ustid, validate_pdf_abgleich, validate_invoice
  • Docstrings describe purpose and return types
  • Essential for API documentation and developer understanding

Section Comments are Necessary

  • Group validation logic: "# Critical fields", "# Line items", "# Check X = Y"
  • Organize code for maintainability
  • Explain complex regex patterns: "# German VAT ID: DE followed by 9 digits"

[2026-02-04T21:15:00.000Z] Task 9: Health Endpoint Implementation

Health Check Endpoint Pattern

  • Simple GET endpoint /health for service availability monitoring
  • Returns JSON with status and version fields
  • Status: "healthy" (string literal)
  • Version: "1.0.0" (hardcoded, matches pyproject.toml)
  • No complex dependency checks (simple ping check)

Pydantic Model for API Responses

  • Added HealthResponse model to src/models.py
  • Follows existing pattern: status and version as Field(description=...)
  • Model appears in OpenAPI/Swagger documentation automatically
  • Imported in main.py to use as response_model

Endpoint Implementation

@app.get("/health", response_model=HealthResponse)
async def health_check() -> HealthResponse:
    """Health check endpoint.
    
    Returns:
        HealthResponse with status and version.
    """
    return HealthResponse(status="healthy", version="1.0.0")

Docstring Justification

  • Endpoint docstring is necessary for public API documentation
  • Model docstring is necessary for OpenAPI schema generation
  • Both follow the existing pattern in the codebase
  • Minimal and essential - not verbose or explanatory of obvious code

Model Location Pattern

  • All Pydantic models belong in src/models.py
  • Import models in src/main.py using from src.models import ModelName
  • Keep all data models centralized for consistency
  • Exception: models local to a specific module can be defined there

[2026-02-04T19:59:00.000Z] Task 11: Validate Endpoint Implementation

Implementation

  • Added POST /validate endpoint to src/main.py
  • Endpoint accepts ValidateRequest (xml_data, pdf_text, checks)
  • Returns ValidateResponse wrapping ValidationResult in "result" field
  • Delegates to validate_invoice() from src.validator module

Key Code Pattern

@app.post("/validate", response_model=ValidateResponse)
async def validate_invoice_endpoint(request: ValidateRequest) -> ValidateResponse:
    result = validate_invoice(request)
    return ValidateResponse(result=result)

Important Fix in Validator

  • Updated validate_invoice() to handle empty checks gracefully
  • If request.checks is empty, return early with ValidationResult(is_valid=True, ...)
  • This prevents ValidationError when xml_data is empty but no checks need to run

Testing

  • test_validate_pflichtfelder: Tests valid invoice with pflichtfelder check
  • test_validate_empty_checks: Tests empty checks list returns 200
  • Both tests pass

Validation Response Structure

Response contains nested "result" field:

{
  "result": {
    "is_valid": false,
    "errors": [...],
    "warnings": [...],
    "summary": {...},
    "validation_time_ms": 45
  }
}

Docstring Justification

  • Endpoint docstring provides API documentation for OpenAPI/Swagger
  • Describes args (request type) and return (response type)
  • Follows existing pattern from health_check endpoint

Task 12: HTTPException Handler (2025-02-04)

Pattern: Custom FastAPI Exception Handlers

FastAPI's default HTTPException handler returns {"detail": <detail>}, so a dict detail ends up nested as {"detail": {...}}, which breaks the API spec's flat error format.

Solution: Add custom exception handler for HTTPException that returns flat JSON structure.

from fastapi import HTTPException, Request
from fastapi.responses import JSONResponse

@app.exception_handler(HTTPException)
async def http_exception_handler(request: Request, exc: HTTPException):
    if isinstance(exc.detail, dict) and "error" in exc.detail:
        return JSONResponse(
            status_code=exc.status_code,
            content={
                "error": exc.detail.get("error"),
                "message": exc.detail.get("message"),
            },
        )
    return JSONResponse(
        status_code=exc.status_code,
        content={
            "error": "http_error",
            "message": str(exc.detail),
        },
    )

Key Implementation Details:

  1. Handler checks if exc.detail is a dict with "error" key
  2. If structured error (dict with error/message), extracts to flat format
  3. Falls back to generic {"error": "http_error", "message": str(exc.detail)} for other cases
  4. Preserves original status code from HTTPException

Error Format Consistency:

  • All error responses now use flat structure: {"error": "code", "message": "..."}
  • ExtractionError, HTTPException, and generic Exception handlers all follow this pattern
  • Test test_extract_invalid_base64 expects this flat format

[2026-02-04T21:30:00.000Z] Task 13: Integration Tests Implementation

Integration Test Patterns

  • Tests full workflow: POST /extract → get xml_data → POST /validate with xml_data
  • Uses real sample PDFs from tests/fixtures/
  • Validates end-to-end behavior across multiple components
  • Tests multiple scenarios: different profiles, errors, edge cases

Test Categories Implemented

  1. Full workflow tests: 3 tests covering EN16931, BASIC WL, EXTENDED profiles
  2. Error scenarios: Invalid base64, non-ZUGFeRD PDF, corrupt data
  3. Validation combinations: Different check combinations, empty checks list
  4. Sequential testing: Multiple PDFs in sequence to check state pollution
  5. Edge cases: Empty xml_data from non-ZUGFeRD PDF

Helper Function Pattern

  • Created read_pdf_as_base64(filepath) helper to reduce code duplication
  • Reads PDF, encodes as base64 string
  • Used across all integration tests for PDF preparation

Test Count and Coverage

  • 9 integration tests created (exceeds requirement of 5+ tests)
  • All tests follow pytest conventions with descriptive docstrings
  • All sample PDF types from MANIFEST.md covered

Error Response Validation

  • Integration tests verify error responses use flat format: {"error": "code", "message": "..."}
  • Tests verify correct HTTP status codes (400 for errors, 200 for success)

Validation Response Structure

  • Validates nested "result" field in ValidateResponse
  • Checks for "is_valid", "errors", "warnings" fields
  • Verifies summary and validation_time_ms fields

Pre-commit Hook on Comments

  • Removed unnecessary inline comments (# Step 1, etc.)
  • Code structure is self-documenting
  • Test docstrings kept for pytest output readability (per inherited wisdom)

Syntax Verification

  • Used python -m py_compile tests/test_integration.py for syntax check
  • Nix environment limitation: cannot install pytest, use py_compile instead
  • File compiles successfully without errors

Docstring Justification

  • Test function docstrings: document each test's intent (pytest does not print them by default, but they keep the tests readable)
  • Module docstring: documents purpose of integration test file
  • Helper function docstring: documents args and returns (utility function pattern)
  • All inline comments removed - code speaks for itself

API Contract Testing

  • Integration tests verify the API contract between endpoints
  • Extract endpoint returns expected structure (is_zugferd, xml_data, pdf_text)
  • Validate endpoint accepts xml_data and returns ValidationResult
  • Both endpoints use correct HTTP status codes

Sample PDF Selection

  • EN16931_Einfach.pdf: Standard EN16931 profile
  • validAvoir_FR_type380_BASICWL.pdf: BASIC WL profile (French credit note)
  • zugferd_2p1_EXTENDED_PDFA-3A.pdf: EXTENDED profile with PDF/A-3A
  • EmptyPDFA1.pdf: Non-ZUGFeRD PDF for negative testing

Test Naming Convention

  • Pattern: test_integration_<description>_workflow for workflow tests
  • Pattern: test_integration_<scenario> for specific scenario tests
  • Descriptive names that clearly indicate test purpose

[2026-02-04T21:35:00.000Z] Task 15: Docker Compose Configuration

Docker Compose for Local Development

  • Single service stateless application (no database, cache, or external dependencies)
  • Service named zugferd-service matches project name
  • Port mapping 5000:5000 matches the --port 5000 in the Dockerfile CMD (uvicorn's own default port is 8000)
  • Read-only volume mount: ./src:/app/src:ro makes host edits visible in the container; the server only reloads them automatically if uvicorn runs with --reload
  • Health check uses curl against /health endpoint (requires curl in Dockerfile)
  • Restart policy: unless-stopped for development convenience

Volume Mount Configuration

  • Mounts src directory for live reload
  • Read-only mode (:ro) prevents accidental modifications from within container
  • Code changes on the host appear in the container immediately; the running server picks them up after a restart (or with uvicorn --reload)
  • Only src directory mounted (no other directories needed for stateless service)

Health Check Pattern

  • Simple HTTP GET to /health endpoint
  • Interval: 30s (frequency of health checks)
  • Timeout: 10s (time to wait before marking check as failed)
  • Retries: 3 (consecutive failures before marking unhealthy)
  • Start period: 10s (grace period after container start; failed checks during this window do not count toward the retries)
  • Uses curl command (must be installed in the Docker image; python:3.11-slim does not include it)

Environment Variables

  • LOG_LEVEL=INFO for structured JSON logging
  • Can be extended for other configuration (e.g., host, port, etc.)
  • No secrets or authentication configuration (open endpoints)

Docker Compose Version

  • Declares version '3.8' for Docker Compose v1 compatibility
  • Docker Compose v2 treats the top-level version key as obsolete and ignores it (recent releases print a warning)

[2026-02-04T20:20:00.000Z] Task 14: Dockerfile Creation

Multi-Stage Docker Build Pattern

  • Builder stage: Install build dependencies (build-essential), build wheel with hatchling
  • Production stage: Copy only runtime dependencies from builder, use slim base image
  • Key benefit: Final image doesn't include build tools (gcc, make, etc.)
  • Reduced image size: 162 MB (well under 500 MB requirement)

Dockerfile Structural Comments

  • Dockerfiles don't have functions or classes to organize code
  • Section comments (# Build stage, # Production stage) are necessary for readability
  • These comments follow Docker best practices and are essential for maintainability
  • Unlike code comments, Dockerfile comments serve as structural markers

.dockerignore Pattern

  • Exclude .git, __pycache__, dist/, build/, venv/ directories
  • Exclude test files, documentation, CI/CD configs
  • Exclude Nix-specific files (result/, .direnv/, .sisyphus/)
  • Reduces build context size and excludes unnecessary files from image

Python Package Installation Pattern

  • Use pip install --prefix=/install dist/*.whl to install to custom location
  • Copy /install directory to /usr/local in production stage
  • Separates build artifacts from installation directory
  • Cleaner separation than copying site-packages directly

Non-Root User Setup

  • Create user: useradd -m -r appuser
  • -m creates home directory, -r creates system user (no password)
  • Change ownership: chown -R appuser:appuser /app
  • Switch to non-root: USER appuser before exposing port and CMD

uvicorn CMD Pattern

  • Use array format: CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "5000"]
  • Array format prevents shell parsing issues
  • Host 0.0.0.0 binds to all interfaces (required for Docker)
  • Port 5000 matches EXPOSE directive

Container Testing Strategy

  • Use docker exec to test from inside container when host networking fails
  • Python built-in urllib.request works when curl not installed
  • Internal test: python -c "import urllib.request; print(urllib.request.urlopen('http://localhost:5000/health').read().decode())"
  • Validates service runs correctly regardless of host port forwarding issues

Image Size Optimization

  • Python 3.11-slim base image: ~120 MB
  • Application dependencies: ~40 MB (fastapi, uvicorn, factur-x, pypdf, lxml, pydantic)
  • Total: 162 MB (excellent for Python FastAPI service)
  • Multi-stage build eliminates ~200 MB of build tools

Docker Build Verification

  • Build: docker build -t zugferd-service:test .
  • Size check: docker images zugferd-service:test --format "{{.Size}}"
  • Run container: docker run -d --name test -p 5000:5000 zugferd-service:test
  • Test health: use Python's urllib inside the container (curl is not installed in the slim image) when host port forwarding is problematic

[2026-02-04T21:50:00.000Z] Task 16: Nix Flake Packaging

flake.nix Structure

  • Uses buildPythonApplication for zugferd-service (not buildPythonPackage)
  • Python 3.11 base via python311Packages
  • pyproject = true for hatchling-based builds
  • pythonRelaxDeps = true for dependency flexibility (important for factur-x)
  • Outputs: packages.default and packages.zugferd-service both point to same derivation
  • devShell includes all development dependencies (pytest, pytest-asyncio, httpx)

factur-x Package Handling

  • NOT available in nixpkgs - must package inline
  • The PyPI project is listed as factur-x, but its wheel filename uses factur_x (underscore), so fetchPypi needs pname = "factur_x"
  • Current version: 3.8 (not 2.5 as in pyproject.toml)
  • Format: wheel (not source tarball) - must specify format = "wheel"
  • Hash calculation: compute the base64-encoded SHA-256 of the wheel with Python and prefix it with sha256-:
    import base64, hashlib
    with open('file.whl', 'rb') as f:
        print('sha256-' + base64.b64encode(hashlib.sha256(f.read()).digest()).decode())

  • Dependencies: lxml, pypdf>=5.3.0
  • Hash format: sha256-alctEgMZw79S2UStnt/bYTigE6h9wqCVpm7i1qc5efs= (SRI: sha256- prefix plus base64 digest)

fetchPypi Hash Format

  • nix-prefetch-url prints the SHA-256 in Nix's own base32 encoding (52 characters), not in SRI format
  • Nix expects the hash attribute in SRI format: sha256-<base64-hash>
  • Example: sha256-alctEgMZw79S2UStnt/bYTigE6h9wqCVpm7i1qc5efs=
  • nix-prefetch-url output such as 1yvr76kxdqkflsas1hkxm09s0f31vggrxba4v59bzhqr0c92smva is base32, not base64, and must be converted first (e.g. nix hash to-sri --type sha256 <hash>)

Git Tracking Requirement for Nix

  • flake.nix must be added to git (git add flake.nix)
  • Nix requires files to be tracked by git to see them in evaluation
  • Running nix flake check will fail if flake.nix is not tracked
  • flake.lock is auto-generated on first flake check

nix flake check Verification

  • Validates syntax and evaluates all derivations
  • Checks packages.default and packages.zugferd-service
  • Checks devShells.default
  • Outputs derivation paths (e.g., /nix/store/...-zugferd-service-1.0.0.drv)
  • Passing means the outputs evaluate; package derivations are not built unless they are also exposed under checks

Inline Python Package Pattern

factur-x = pythonPackages.buildPythonPackage rec {
  pname = "factur_x";  # PyPI name (may differ from import name)
  version = "3.8";
  format = "wheel";  # or "pyproject" or "setuptools"

  src = pythonPackages.fetchPypi {
    inherit pname version format;
    hash = "sha256-alctEgMZw79S2UStnt/bYTigE6h9wqCVpm7i1qc5efs=";
  };

  dependencies = with pythonPackages; [ pypdf lxml ];
  pythonRelaxDeps = true;  # Relax exact version constraints

  meta = {
    description = "Python library to generate and read Factur-X invoices";
    license = pkgs.lib.licenses.mit;
  };
};

Dependencies in buildPythonApplication

  • dependencies: Runtime dependencies (fastapi, uvicorn, pydantic, etc.)
  • nativeCheckInputs: Test dependencies (pytestCheckHook, pytest-asyncio, httpx)
  • build-system: Build-time dependencies ([pythonPackages.hatchling])

meta.mainProgram

  • Sets the main program name for nix run; it is a meta attribute (meta.mainProgram), not a passthru attribute
  • Value: mainProgram = "zugferd-service" (matches pyproject.toml [project.scripts])
  • Allows nix run .#zugferd-service to start the service

flake-utils Usage

  • flake-utils.lib.eachDefaultSystem applies config to all systems
  • Access pkgs via pkgs = nixpkgs.legacyPackages.${system}
  • Python packages via pythonPackages = pkgs.python311Packages

[2026-02-04T21:55:00.000Z] Task 17: NixOS Service Module Example

NixOS Module Pattern

  • Standard module structure: { config, lib, pkgs, ... }: with lib; let cfg = ...; in { options = ...; config = ...; }
  • Service options nested under services.<service-name>
  • Use mkEnableOption for boolean enable flags
  • Use mkOption with types for configuration values

Service Configuration Options

  • enable: mkEnableOption "description" - boolean toggle
  • port: types.port - validates the 0-65535 range (types.port is ints.u16)
  • host: types.str - string type
  • package: types.package - Nix package type with default from pkgs

systemd Service Configuration

  • Service name matches option name: systemd.services.zugferd-service
  • wantedBy: [ "multi-user.target" ] - starts on system boot
  • after: [ "network.target" ] - starts after network is ready
  • serviceConfig keys:
    • Type = "simple" - standard long-running service
    • ExecStart - command to run service
    • Restart = "on-failure" - restart on crashes
    • DynamicUser = true - creates unprivileged user automatically
    • NoNewPrivileges = true - security hardening
    • ProtectSystem = "strict" - filesystem protection
    • ProtectHome = true - home directory protection

ExecStart Pattern

  • Must convert port to string with toString cfg.port
  • String interpolation: ${cfg.package}/bin/zugferd-service --host ${cfg.host} --port ${toString cfg.port}
  • Entry point from pyproject.toml: zugferd-service = "src.main:run" generates /bin/zugferd-service
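  • Console-script entry points are called without arguments, so run() must parse --host/--port from the command line itself (e.g. with argparse) for these flags to take effect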

Module Verification

  • Use nix-instantiate --parse module.nix to verify Nix syntax
  • Parses successfully = valid syntax
  • Check file exists: ls -la nix/module.nix

NixOS Module Usage Example

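# configuration.nix
{ pkgs, ... }:
{
  imports = [ /path/to/zugferd-service/nix/module.nix ];

  services.zugferd-service = {
    enable = true;
    port = 5000;
    host = "127.0.0.1";
    # pkgs.zugferd-service assumes an overlay (or similar) that adds the package to pkgs
    package = pkgs.zugferd-service;
  };
}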

Example Module Limitations

  • This is an example module, not production-ready
  • No authentication or TLS configuration (open endpoints per spec)
  • Minimal configuration options (can be extended for production use)
  • Service is stateless (no database or persistent storage needed)

NixOS Module Best Practices

  • Use mkIf cfg.enable to only apply config when service is enabled
  • Default values should match application defaults (5000, 127.0.0.1)
  • Package option allows override for testing different versions
  • Security hardening options (DynamicUser, NoNewPrivileges, ProtectSystem) are standard practice