# Draft: ZUGFeRD-Service Implementation

## Requirements (confirmed)

### Core Functionality
- **Purpose**: Python REST API for ZUGFeRD/Factur-X invoice extraction and validation
- **Framework**: FastAPI (preferred by user)
- **Runtime**: Python 3.11+
- **Deployment**: Docker container on NixOS server + native Nix package

### API Endpoints
1. `GET /health` - Health check endpoint
2. `POST /extract` - PDF extraction (accepts base64-encoded PDF)
3. `POST /validate` - Invoice validation with four checks: `pflichtfelder` (required fields), `betraege` (amounts), `ustid` (VAT ID), `pdf_abgleich` (comparison against PDF text)

### Key Dependencies
- `factur-x>=2.5` - ZUGFeRD/Factur-X extraction
- `pypdf>=4.0.0` - PDF text extraction
- `fastapi>=0.109.0` - API framework
- `uvicorn>=0.27.0` - ASGI server
- `pydantic>=2.5.0` - Data models
- `lxml>=5.0.0` - XML parsing
- `python-multipart>=0.0.6` - File uploads

### Project Structure (user-specified)
```
zugferd-service/
├── Dockerfile
├── requirements.txt
├── README.md
├── src/
│   ├── __init__.py
│   ├── main.py              # FastAPI app + endpoints
│   ├── extractor.py         # ZUGFeRD/PDF extraction
│   ├── validator.py         # Validation logic
│   ├── pdf_parser.py        # PDF text parsing for the comparison check
│   ├── models.py            # Pydantic models
│   └── utils.py             # Helper functions
├── tests/
│   ├── __init__.py
│   ├── test_extractor.py
│   ├── test_validator.py
│   └── fixtures/
│       ├── sample_zugferd.pdf
│       └── sample_no_zugferd.pdf
└── docker-compose.yml
```

## Research Findings

### Nix Packaging (from librarian research)
- Use `buildPythonApplication` for standalone service
- `pyproject = true` with hatchling/setuptools
- `pythonRelaxDeps = true` for dependency flexibility
- mem0 example pattern: custom server script via `postInstall`
- Consider flake.nix for modern Nix workflow

### factur-x Library (from librarian research)
- `get_xml_from_pdf()` - Core extraction function
- `get_level()` / `get_flavor()` - Profile detection
- Namespaces: rsm, ram, udt for UN/CEFACT CII format
- Profile levels: minimum, basicwl, basic, en16931, extended

### UN/ECE Unit Codes
- C62 = Piece, KGM = Kilogram, H87 = Piece (alt)
- Need comprehensive mapping dictionary

## Technical Decisions

### Python Tooling
- **PENDING**: Use pyproject.toml (modern) or requirements.txt (legacy)?
- **PENDING**: Build system: setuptools, hatchling, or poetry-core?

### Nix Approach
- **PENDING**: Flake-based or traditional Nix expressions?
- **PENDING**: Include NixOS service module?

### Testing Strategy
- **PENDING**: TDD or tests-after?
- **PENDING**: Test framework: pytest (standard choice)

## Scope Boundaries

### INCLUDE
- All 3 API endpoints as specified
- All validation checks (`pflichtfelder`, `betraege`, `ustid`, `pdf_abgleich`)
- Docker multi-stage build
- Nix packaging
- Basic test suite
- README documentation

### EXCLUDE
- Online USt-ID validation (only format check)
- Database/persistence (stateless service)
- Authentication/authorization
- Rate limiting
- Metrics/tracing

## Open Questions (RESOLVED)
1. ✅ **Python project structure**: pyproject.toml with hatchling
2. ✅ **Build system**: hatchling (modern, Nix-friendly)
3. ✅ **Nix approach**: Flake-based
4. ✅ **Testing**: TDD (test-first) with pytest
5. ✅ **Sample PDFs**: Source from official ZUGFeRD repositories

## Metis Gap Analysis (Reviewed)

### Gaps Classified as MINOR (Auto-Resolved)
- **UN/ECE unit codes**: Start with common codes (C62, KGM, H87, MTR, LTR, etc.), expand as needed
- **Tolerance**: Hardcode 0.01 EUR as specified
- **Validation scope**: Check required fields exist for declared profile
- **Error codes**: Implement as specified in user's detailed spec

### Gaps Classified as DEFAULTS APPLIED
- **Authentication**: OPEN (no auth mentioned in spec → stateless public API)
- **ZUGFeRD profiles**: ALL profiles supported (MINIMUM, BASIC, BASIC WL, EN16931, EXTENDED)
- **Deployment**: Container-based on NixOS (as per NixOS config section in spec)
- **PDF text extraction**: REQUIRED for `pdf_abgleich` check (explicitly in spec)
- **File size limit**: Handle as error for >10MB (spec mentions this edge case)

### Guardrails (Must NOT Have)
- NO authentication middleware
- NO database/persistence
- NO caching layers
- NO rate limiting
- NO metrics endpoints (beyond /health)
- NO CLI interface
- NO web UI
- NO abstraction layers for "future extensibility"
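
The unit-code mapping called for in the research findings and gap analysis could start as a plain dictionary with a pass-through fallback. This is only a sketch: `UNIT_CODES` and `unit_label` are hypothetical names, the labels for C62, KGM and H87 follow this draft, and the labels for MTR and LTR as well as the fallback behaviour are assumptions.

```python
# Hypothetical starter mapping of UN/ECE unit codes to readable labels.
# C62, KGM and H87 use the labels from this draft; MTR and LTR are assumed.
UNIT_CODES: dict[str, str] = {
    "C62": "Piece",
    "H87": "Piece",
    "KGM": "Kilogram",
    "MTR": "Metre",
    "LTR": "Litre",
}


def unit_label(code: str) -> str:
    """Return a readable label, or the normalized code itself if unknown."""
    normalized = code.strip().upper()
    return UNIT_CODES.get(normalized, normalized)
```

Returning the raw code for unknown units keeps extraction working while the dictionary is expanded; whether unknown codes should instead raise a validation warning is left open here.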
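
For the `betraege` check, the hardcoded 0.01 EUR tolerance might be applied as follows. This is a minimal sketch, assuming amounts are parsed into `Decimal` values; the function names `amounts_match` and `check_line_total` are hypothetical and do not come from the spec.

```python
from decimal import Decimal

# Tolerance in EUR, hardcoded as stated in the gap analysis.
TOLERANCE = Decimal("0.01")


def amounts_match(expected: Decimal, actual: Decimal,
                  tolerance: Decimal = TOLERANCE) -> bool:
    """True if the two amounts differ by at most the tolerance."""
    return abs(expected - actual) <= tolerance


def check_line_total(quantity: Decimal, unit_price: Decimal,
                     line_total: Decimal) -> bool:
    """Compare quantity * unit_price against the stated line total."""
    return amounts_match(quantity * unit_price, line_total)
```

`Decimal` is used instead of `float` so that a difference of exactly 0.01 is compared without binary rounding error.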
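
Because online USt-ID validation is out of scope, the `ustid` check reduces to a format test. The sketch below covers only the German format (`DE` followed by nine digits); restricting the check to German IDs, ignoring whitespace and case, and the name `is_valid_de_ustid_format` are all assumptions, not requirements taken from the spec.

```python
import re

# German VAT ID format: "DE" followed by exactly nine digits.
_DE_VAT_ID = re.compile(r"DE[0-9]{9}")


def is_valid_de_ustid_format(value: str) -> bool:
    """Format-only check; does not verify that the ID is actually registered."""
    # Assumption: whitespace and letter case are not significant.
    compact = re.sub(r"\s+", "", value).upper()
    return _DE_VAT_ID.fullmatch(compact) is not None
```

IDs from other EU member states follow different patterns and would need their own rules if the service has to accept them.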