Files
zugferd-service/.sisyphus/drafts/zugferd-service.md

4.4 KiB

Draft: ZUGFeRD-Service Implementation

Requirements (confirmed)

Core Functionality

  • Purpose: Python REST API for ZUGFeRD/Factur-X invoice extraction and validation
  • Framework: FastAPI (preferred by user)
  • Runtime: Python 3.11+
  • Deployment: Docker container on NixOS server + native Nix package

API Endpoints

  1. GET /health - Health check endpoint
  2. POST /extract - PDF extraction (accepts base64-encoded PDF)
  3. POST /validate - Invoice validation (pflichtfelder, betraege, ustid, pdf_abgleich)

Key Dependencies

  • factur-x>=2.5 - ZUGFeRD/Factur-X extraction
  • pypdf>=4.0.0 - PDF text extraction
  • fastapi>=0.109.0 - API framework
  • uvicorn>=0.27.0 - ASGI server
  • pydantic>=2.5.0 - Data models
  • lxml>=5.0.0 - XML parsing
  • python-multipart>=0.0.6 - File uploads

Project Structure (user-specified)

zugferd-service/
├── Dockerfile
├── requirements.txt
├── README.md
├── src/
│   ├── __init__.py
│   ├── main.py              # FastAPI App + Endpoints
│   ├── extractor.py         # ZUGFeRD/PDF Extraktion
│   ├── validator.py         # Validierungslogik
│   ├── pdf_parser.py        # PDF-Text-Parsing für Abgleich
│   ├── models.py            # Pydantic Models
│   └── utils.py             # Hilfsfunktionen
├── tests/
│   ├── __init__.py
│   ├── test_extractor.py
│   ├── test_validator.py
│   └── fixtures/
│       ├── sample_zugferd.pdf
│       └── sample_no_zugferd.pdf
└── docker-compose.yml

Research Findings

Nix Packaging (from librarian research)

  • Use buildPythonApplication for standalone service
  • pyproject = true with hatchling/setuptools
  • pythonRelaxDeps = true for dependency flexibility
  • mem0 example pattern: custom server script via postInstall
  • Consider flake.nix for modern Nix workflow

factur-x Library (from librarian research)

  • get_xml_from_pdf() - Core extraction function
  • get_level() / get_flavor() - Profile detection
  • Namespaces: rsm, ram, udt for UN/CEFACT CII format
  • Profile levels: minimum, basicwl, basic, en16931, extended

UN/ECE Unit Codes

  • C62 = Piece, KGM = Kilogram, H87 = Piece (alt)
  • Need comprehensive mapping dictionary

Technical Decisions

Python Tooling

  • PENDING: Use pyproject.toml (modern) or requirements.txt (legacy)?
  • PENDING: Build system: setuptools, hatchling, or poetry-core?

Nix Approach

  • PENDING: Flake-based or traditional Nix expressions?
  • PENDING: Include NixOS service module?

Testing Strategy

  • PENDING: TDD or tests-after?
  • PENDING: Test framework: pytest (standard choice)

Scope Boundaries

INCLUDE

  • All 3 API endpoints as specified
  • All validation checks (pflichtfelder, betraege, ustid, pdf_abgleich)
  • Docker multi-stage build
  • Nix packaging
  • Basic test suite
  • README documentation

EXCLUDE

  • Online USt-ID validation (only format check)
  • Database/persistence (stateless service)
  • Authentication/authorization
  • Rate limiting
  • Metrics/tracing

Open Questions (RESOLVED)

  1. Python project structure: pyproject.toml with hatchling
  2. Build system: hatchling (modern, Nix-friendly)
  3. Nix approach: Flake-based
  4. Testing: TDD (test-first) with pytest
  5. Sample PDFs: Source from official ZUGFeRD repositories

Metis Gap Analysis (Reviewed)

Gaps Classified as MINOR (Auto-Resolved)

  • UN/ECE unit codes: Start with common codes (C62, KGM, H87, MTR, LTR, etc.), expand as needed
  • Tolerance: Hardcode 0.01 EUR as specified
  • Validation scope: Check required fields exist for declared profile
  • Error codes: Implement as specified in user's detailed spec

Gaps Classified as DEFAULTS APPLIED

  • Authentication: OPEN (no auth mentioned in spec → stateless public API)
  • ZUGFeRD profiles: ALL profiles supported (MINIMUM, BASIC, BASIC WL, EN16931, EXTENDED)
  • Deployment: Container-based on NixOS (as per NixOS config section in spec)
  • PDF text extraction: REQUIRED for pdf_abgleich check (explicitly in spec)
  • File size limit: Handle as error for >10MB (spec mentions this edge case)

Guardrails (Must NOT Have)

  • NO authentication middleware
  • NO database/persistence
  • NO caching layers
  • NO rate limiting
  • NO metrics endpoints (beyond /health)
  • NO CLI interface
  • NO web UI
  • NO abstraction layers for "future extensibility"