
ZUGFeRD-Service: Python REST API for Invoice Extraction and Validation

TL;DR

Quick Summary: Build a stateless FastAPI service that extracts structured data from ZUGFeRD/Factur-X invoices embedded in PDFs, validates invoice correctness, and is packaged for both Docker and NixOS deployment.

Deliverables:

  • Complete FastAPI application with 3 endpoints (/health, /extract, /validate)
  • Pydantic models for all request/response schemas
  • ZUGFeRD extraction pipeline using factur-x library
  • 4 validation checks (pflichtfelder, betraege, ustid, pdf_abgleich)
  • Docker multi-stage build (production-ready)
  • Nix flake packaging (buildPythonApplication)
  • Comprehensive test suite (TDD, pytest)
  • README documentation with installation guide

Estimated Effort: Medium-Large (20-30 tasks)
Parallel Execution: YES - 6 waves
Critical Path: Project Setup → Models → Extractor → Validator → API → Packaging


Context

Original Request

Build a Python-based REST API service for ZUGFeRD invoice extraction and validation. The service runs as a Docker container on NixOS and should also be packaged as a Nix flake. Key requirements include handling all ZUGFeRD profiles (MINIMUM through EXTENDED), validating invoice amounts and required fields, and comparing XML data against PDF text.

Interview Summary

Key Discussions:

  • Framework choice: FastAPI (user preference, excellent for REST APIs)
  • Project format: pyproject.toml with hatchling (modern, Nix-friendly)
  • Nix packaging: Flake-based approach with buildPythonApplication
  • Testing strategy: TDD with pytest, source official sample PDFs
  • Authentication: Open endpoints (no auth required)
  • File handling: 10MB limit, reject password-protected PDFs
  • Calculations: Standard rounding, 0.01 EUR tolerance

Research Findings:

  • factur-x library provides get_xml_from_pdf(), get_level(), get_flavor() for extraction
  • Nix packaging follows mem0 pattern: pyproject=true, pythonRelaxDeps=true
  • UN/ECE unit codes require mapping dictionary (C62→Piece, KGM→Kilogram, etc.)
  • ZUGFeRD profiles detected via GuidelineSpecifiedDocumentContextParameter in XML

Metis Review

Identified Gaps (addressed):

  • Password-protected PDFs → Reject with error
  • File size limits → 10MB hard limit
  • Decimal precision → Standard rounding
  • Authentication → Confirmed open/stateless

Work Objectives

Core Objective

Create a production-ready, stateless REST API that extracts ZUGFeRD/Factur-X invoice data from PDFs, validates invoice correctness against business rules, and provides clear error reporting for invoice processing workflows.

Concrete Deliverables

  • src/main.py - FastAPI application with all endpoints
  • src/models.py - Pydantic models for all data structures
  • src/extractor.py - ZUGFeRD extraction logic
  • src/validator.py - Validation checks implementation
  • src/pdf_parser.py - PDF text extraction for cross-validation
  • src/utils.py - Helper functions (unit codes, decimal handling)
  • pyproject.toml - Project configuration with hatchling
  • Dockerfile - Multi-stage production build
  • docker-compose.yml - Local development setup
  • flake.nix - Nix flake packaging
  • tests/ - Complete test suite with fixtures
  • README.md - Installation and usage documentation

Definition of Done

  • nix build .#zugferd-service completes without errors
  • docker build -t zugferd-service . produces image <500MB
  • pytest runs all tests with 100% pass rate
  • curl http://localhost:5000/health returns {"status": "healthy", "version": "1.0.0"}
  • All ZUGFeRD profiles correctly detected from sample PDFs
  • All validation checks produce expected errors/warnings

Must Have

  • All 3 API endpoints as specified
  • Support for all ZUGFeRD profiles (MINIMUM, BASIC, BASIC WL, EN16931, EXTENDED)
  • All 4 validation checks (pflichtfelder, betraege, ustid, pdf_abgleich)
  • Structured JSON error responses with error codes
  • UTF-8 encoding throughout
  • Structured JSON logging

Must NOT Have (Guardrails)

  • Authentication middleware (open endpoints)
  • Database or persistence layer (stateless only)
  • Caching layers
  • Rate limiting
  • Metrics endpoints beyond /health
  • CLI interface
  • Web UI or admin dashboard
  • Batch processing or queue system
  • Online USt-ID validation (format check only)
  • Support for ZUGFeRD 1.x (only 2.x)
  • Abstraction layers for "future extensibility"

Verification Strategy

UNIVERSAL RULE: ZERO HUMAN INTERVENTION

ALL tasks in this plan MUST be verifiable WITHOUT any human action. Every verification step is executed by the agent using tools (curl, pytest, nix, docker).

Test Decision

  • Infrastructure exists: NO (new project)
  • Automated tests: TDD (test-first)
  • Framework: pytest with pytest-asyncio

TDD Workflow

Each implementation task follows RED-GREEN-REFACTOR:

  1. RED: Write failing test first
  2. GREEN: Implement minimum code to pass
  3. REFACTOR: Clean up while keeping tests green

Test Infrastructure Setup (Task 0)

# Install pytest and test dependencies
pip install pytest pytest-asyncio httpx

# Verify pytest works
pytest --version

# Run initial test
pytest tests/ -v

Agent-Executed QA Scenarios (MANDATORY)

All verifications use:

  • API Testing: curl commands with JSON assertions
  • Docker Testing: docker build and docker run commands
  • Nix Testing: nix build and nix run commands
  • Unit Testing: pytest with specific test file targets

Execution Strategy

Parallel Execution Waves

Wave 1 (Start Immediately):
├── Task 1: Project scaffold (pyproject.toml, directories)
├── Task 2: Download ZUGFeRD sample PDFs
└── Task 3: Create Pydantic models

Wave 2 (After Wave 1):
├── Task 4: Extractor unit tests + implementation
├── Task 5: PDF parser unit tests + implementation
└── Task 6: Utils (unit codes, decimal handling)

Wave 3 (After Wave 2):
├── Task 7: Validator unit tests + implementation
├── Task 8: FastAPI app structure
└── Task 9: Health endpoint

Wave 4 (After Wave 3):
├── Task 10: Extract endpoint
├── Task 11: Validate endpoint
└── Task 12: Error handling middleware

Wave 5 (After Wave 4):
├── Task 13: Integration tests
├── Task 14: Dockerfile
└── Task 15: docker-compose.yml

Wave 6 (After Wave 5):
├── Task 16: Nix flake packaging
├── Task 17: NixOS module example
└── Task 18: README documentation

Critical Path: Task 1 → Task 4 → Task 7 → Task 10 → Task 13 → Task 16

Dependency Matrix

| Task | Depends On | Blocks | Can Parallelize With |
|------|------------|--------|----------------------|
| 1 | None | 2-18 | None (must be first) |
| 2 | 1 | 4, 5 | 3 |
| 3 | 1 | 4, 5, 7 | 2 |
| 4 | 2, 3 | 7, 10 | 5, 6 |
| 5 | 2 | 7 | 4, 6 |
| 6 | 1 | 4, 7 | 4, 5 |
| 7 | 4, 5, 6 | 11 | 8, 9 |
| 8 | 3 | 9, 10, 11 | 7 |
| 9 | 8 | 10, 11 | 7 |
| 10 | 4, 8, 9 | 13 | 11, 12 |
| 11 | 7, 8, 9 | 13 | 10, 12 |
| 12 | 8 | 13 | 10, 11 |
| 13 | 10, 11, 12 | 14, 16 | None |
| 14 | 13 | 16 | 15 |
| 15 | 13 | None | 14 |
| 16 | 14 | 17, 18 | None |
| 17 | 16 | 18 | None |
| 18 | 16, 17 | None | None |

TODOs

Wave 1: Project Foundation

  • 1. Project Scaffold and Configuration

    What to do:

    • Create project directory structure as specified
    • Create pyproject.toml with hatchling build system
    • Configure pytest in pyproject.toml
    • Create src/__init__.py and tests/__init__.py
    • Set up Python version requirement (3.11+)

    Must NOT do:

    • Do NOT add optional dependencies
    • Do NOT create README yet (separate task)
    • Do NOT add pre-commit hooks

    Recommended Agent Profile:

    • Category: quick
      • Reason: Straightforward file creation with established patterns
    • Skills: [git-master]
      • git-master: For proper initial commit after scaffold

    Parallelization:

    • Can Run In Parallel: NO
    • Parallel Group: Sequential (must be first)
    • Blocks: All subsequent tasks
    • Blocked By: None

    References:

    Files to Create:

    pyproject.toml
    src/__init__.py
    src/main.py (placeholder)
    src/models.py (placeholder)
    src/extractor.py (placeholder)
    src/validator.py (placeholder)
    src/pdf_parser.py (placeholder)
    src/utils.py (placeholder)
    tests/__init__.py
    tests/conftest.py (pytest fixtures)
    tests/fixtures/.gitkeep
    

    pyproject.toml Structure:

    [build-system]
    requires = ["hatchling"]
    build-backend = "hatchling.build"
    
    [project]
    name = "zugferd-service"
    version = "1.0.0"
    description = "REST API for ZUGFeRD invoice extraction and validation"
    requires-python = ">=3.11"
    dependencies = [
        "fastapi>=0.109.0",
        "uvicorn>=0.27.0",
        "python-multipart>=0.0.6",
        "factur-x>=2.5",
        "pypdf>=4.0.0",
        "pydantic>=2.5.0",
        "lxml>=5.0.0",
    ]
    
    [project.optional-dependencies]
    dev = [
        "pytest>=8.0.0",
        "pytest-asyncio>=0.23.0",
        "httpx>=0.27.0",
    ]
    
    [project.scripts]
    zugferd-service = "src.main:run"
    
    [tool.pytest.ini_options]
    asyncio_mode = "auto"
    testpaths = ["tests"]
    

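The `[project.scripts]` entry above points at `src.main:run`, so Task 8's `main.py` needs a matching callable. A minimal sketch; the host/port values are assumptions (port 5000 taken from the health-check example), and `uvicorn` is imported lazily only so the sketch imports cleanly in environments without it installed.

```python
# src/main.py (sketch) -- console-script entry point for `zugferd-service`

def run() -> None:
    # Lazy import: uvicorn is a runtime dependency, not needed at import time.
    import uvicorn

    # Serve the FastAPI app; port 5000 matches the /health example above.
    uvicorn.run("src.main:app", host="0.0.0.0", port=5000)
```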
    Acceptance Criteria:

    Agent-Executed QA Scenarios:

    Scenario: pyproject.toml is valid
      Tool: Bash
      Steps:
        1. python -c "import tomllib; tomllib.load(open('pyproject.toml', 'rb'))"
        2. Assert: Exit code 0
      Expected Result: TOML parses without error
      Evidence: Command output captured
    
    Scenario: Project structure exists
      Tool: Bash
      Steps:
        1. ls -la src/
        2. Assert: main.py, models.py, extractor.py, validator.py, pdf_parser.py, utils.py exist
        3. ls -la tests/
        4. Assert: conftest.py, fixtures/ exist
      Expected Result: All required files present
      Evidence: Directory listing captured
    
    Scenario: Dependencies install correctly
      Tool: Bash
      Steps:
        1. pip install -e ".[dev]"
        2. Assert: Exit code 0
        3. python -c "import fastapi; import facturx; import pypdf"
        4. Assert: Exit code 0
      Expected Result: All dependencies importable
      Evidence: pip output captured
    

    Commit: YES

    • Message: feat(project): initialize ZUGFeRD service with pyproject.toml and directory structure
    • Files: All created files
    • Pre-commit: python -c "import tomllib; tomllib.load(open('pyproject.toml', 'rb'))"

  • 2. Download ZUGFeRD Sample PDFs

    What to do:

    • Download official ZUGFeRD sample PDFs from FeRD/ZUGFeRD repositories
    • Include samples for: MINIMUM, BASIC, BASIC WL, EN16931, EXTENDED profiles
    • Include a non-ZUGFeRD PDF for negative testing
    • Store in tests/fixtures/
    • Create fixture manifest documenting each file

    Must NOT do:

    • Do NOT create synthetic PDFs
    • Do NOT download more than 10 samples (keep focused)

    Recommended Agent Profile:

    • Category: quick
      • Reason: Download and organize files
    • Skills: []
      • No special skills needed

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 1 (with Task 3)
    • Blocks: Tasks 4, 5
    • Blocked By: Task 1

    References:

    Acceptance Criteria:

    Agent-Executed QA Scenarios:

    Scenario: Sample PDFs exist for all profiles
      Tool: Bash
      Steps:
        1. ls tests/fixtures/*.pdf | wc -l
        2. Assert: At least 5 PDF files
        3. file tests/fixtures/*.pdf
        4. Assert: All files identified as "PDF document"
      Expected Result: Multiple valid PDF files in fixtures
      Evidence: File listing captured
    
    Scenario: Fixture manifest documents samples
      Tool: Bash
      Steps:
        1. cat tests/fixtures/MANIFEST.md
        2. Assert: Contains entries for MINIMUM, BASIC, EN16931, EXTENDED
      Expected Result: Manifest describes all test fixtures
      Evidence: Manifest content captured
    

    Commit: YES

    • Message: test(fixtures): add official ZUGFeRD sample PDFs for all profiles
    • Files: tests/fixtures/*.pdf, tests/fixtures/MANIFEST.md

  • 3. Create Pydantic Models

    What to do:

    • Define all Pydantic models as per API specification
    • Models: ExtractRequest, ExtractResponse, ValidateRequest, ValidateResponse
    • Nested models: Supplier, Buyer, LineItem, Totals, VatBreakdown, PaymentTerms
    • Error models: ErrorResponse, ValidationError
    • Add field validators where appropriate (e.g., VAT ID format)

    Must NOT do:

    • Do NOT add ORM mappings (no database)
    • Do NOT add serialization beyond Pydantic defaults

    Recommended Agent Profile:

    • Category: quick
      • Reason: Straightforward Pydantic model definitions from spec
    • Skills: []
      • Standard Python work

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 1 (with Task 2)
    • Blocks: Tasks 4, 5, 7
    • Blocked By: Task 1

    References:

    Key Models from Spec:

    class Supplier(BaseModel):
        name: str
        street: str | None = None
        postal_code: str | None = None
        city: str | None = None
        country: str | None = None
        vat_id: str | None = None
        email: str | None = None
    
    class LineItem(BaseModel):
        position: int
        article_number: str | None = None
        article_number_buyer: str | None = None
        description: str
        quantity: float
        unit: str  # Human-readable, translated from UN/ECE code
        unit_price: float
        line_total: float
        vat_rate: float | None = None
        vat_amount: float | None = None
    
    class ExtractResponse(BaseModel):
        is_zugferd: bool
        zugferd_profil: str | None = None
        xml_raw: str | None = None
        xml_data: XmlData | None = None
        pdf_text: str | None = None
        extraction_meta: ExtractionMeta
    

    Acceptance Criteria:

    TDD - Write Tests First:

    # tests/test_models.py
    def test_extract_response_zugferd():
        response = ExtractResponse(
            is_zugferd=True,
            zugferd_profil="EN16931",
            xml_raw="<?xml ...",
            # ... full data
        )
        assert response.is_zugferd is True
        assert response.zugferd_profil == "EN16931"
    
    def test_extract_response_non_zugferd():
        response = ExtractResponse(
            is_zugferd=False,
            pdf_text="Invoice text...",
            extraction_meta=ExtractionMeta(pages=1, extraction_time_ms=50)
        )
        assert response.xml_data is None
    
    def test_line_item_validation():
        item = LineItem(
            position=1,
            description="Widget",
            quantity=10.0,
            unit="Piece",
            unit_price=9.99,
            line_total=99.90
        )
        assert item.line_total == 99.90
    

    Agent-Executed QA Scenarios:

    Scenario: Models pass all unit tests
      Tool: Bash
      Steps:
        1. pytest tests/test_models.py -v
        2. Assert: All tests pass (exit code 0)
      Expected Result: 100% test pass rate
      Evidence: pytest output captured
    
    Scenario: Models serialize to JSON correctly
      Tool: Bash
      Steps:
        1. python -c "from src.models import ExtractResponse, ExtractionMeta; r = ExtractResponse(is_zugferd=False, extraction_meta=ExtractionMeta(pages=1, extraction_time_ms=50)); print(r.model_dump_json())"
        2. Assert: Valid JSON output containing "is_zugferd": false
      Expected Result: JSON serialization works
      Evidence: JSON output captured
    

    Commit: YES

    • Message: feat(models): add Pydantic models for all API request/response schemas
    • Files: src/models.py, tests/test_models.py
    • Pre-commit: pytest tests/test_models.py

Wave 2: Core Extraction Logic

  • 4. ZUGFeRD Extractor Implementation (TDD)

    What to do:

    • Write tests first using sample PDFs from fixtures
    • Implement extract_zugferd() function using factur-x library
    • Handle profile detection (get_level, get_flavor)
    • Parse XML to structured data matching models
    • Handle non-ZUGFeRD PDFs gracefully
    • Handle errors (corrupt PDF, password-protected)

    Must NOT do:

    • Do NOT implement custom XML parsing if factur-x provides it
    • Do NOT cache extracted data

    Recommended Agent Profile:

    • Category: unspecified-high
      • Reason: Core business logic requiring careful implementation
    • Skills: [systematic-debugging]
      • systematic-debugging: For handling edge cases in PDF extraction

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 2 (with Tasks 5, 6)
    • Blocks: Tasks 7, 10
    • Blocked By: Tasks 2, 3

    References:

    Key Functions:

    from facturx import get_xml_from_pdf, get_level, get_flavor
    
    def extract_zugferd(pdf_bytes: bytes) -> ExtractResponse:
        """Extract ZUGFeRD data from PDF bytes."""
        # 1. Check file size (<10MB)
        # 2. Try to extract XML using factur-x
        # 3. Detect profile and flavor
        # 4. Parse XML to structured data
        # 5. Extract PDF text for pdf_abgleich
        # 6. Return ExtractResponse
    

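Steps 1 and 2 of this skeleton can be made concrete with guard clauses before any factur-x call. A sketch: `ExtractionError` and its `error_code` values mirror the tests in this task, while the `%PDF-` magic-byte check is an extra assumption for cheap rejection of non-PDF input.

```python
MAX_SIZE = 10 * 1024 * 1024  # 10 MB hard limit from the spec

class ExtractionError(Exception):
    def __init__(self, error_code: str, message: str = ""):
        super().__init__(message or error_code)
        self.error_code = error_code

def check_pdf_bytes(pdf_bytes: bytes) -> None:
    """Guard clauses run before handing the bytes to factur-x."""
    if len(pdf_bytes) > MAX_SIZE:
        raise ExtractionError("file_too_large")
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ExtractionError("invalid_pdf")
```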
    Acceptance Criteria:

    TDD - Write Tests First:

    # tests/test_extractor.py
    def test_extract_en16931_profile(sample_en16931_pdf):
        result = extract_zugferd(sample_en16931_pdf)
        assert result.is_zugferd is True
        assert result.zugferd_profil == "EN16931"
        assert result.xml_data is not None
        assert result.xml_data.invoice_number is not None
    
    def test_extract_non_zugferd_pdf(sample_plain_pdf):
        result = extract_zugferd(sample_plain_pdf)
        assert result.is_zugferd is False
        assert result.xml_data is None
        assert result.pdf_text is not None
    
    def test_extract_corrupt_pdf():
        with pytest.raises(ExtractionError) as exc:
            extract_zugferd(b"not a pdf")
        assert exc.value.error_code == "invalid_pdf"
    
    def test_file_size_limit():
        large_pdf = b"x" * (11 * 1024 * 1024)  # 11MB
        with pytest.raises(ExtractionError) as exc:
            extract_zugferd(large_pdf)
        assert exc.value.error_code == "file_too_large"
    

    Agent-Executed QA Scenarios:

    Scenario: Extractor passes all unit tests
      Tool: Bash
      Steps:
        1. pytest tests/test_extractor.py -v
        2. Assert: All tests pass
      Expected Result: 100% test pass rate
      Evidence: pytest output captured
    
    Scenario: EN16931 profile correctly detected
      Tool: Bash
      Steps:
        1. python -c "
           from src.extractor import extract_zugferd
           with open('tests/fixtures/sample_en16931.pdf', 'rb') as f:
               result = extract_zugferd(f.read())
           print(f'Profile: {result.zugferd_profil}')
           print(f'Invoice: {result.xml_data.invoice_number}')"
        2. Assert: Output contains "Profile: EN16931" or "Profile: en16931"
      Expected Result: Profile correctly identified
      Evidence: Script output captured
    
    Scenario: Non-ZUGFeRD PDF handled gracefully
      Tool: Bash
      Steps:
        1. python -c "
           from src.extractor import extract_zugferd
           with open('tests/fixtures/sample_no_zugferd.pdf', 'rb') as f:
               result = extract_zugferd(f.read())
           assert result.is_zugferd == False
           assert result.pdf_text is not None
           print('OK: Non-ZUGFeRD handled correctly')"
        2. Assert: Output contains "OK:"
      Expected Result: Graceful handling
      Evidence: Script output captured
    

    Commit: YES

    • Message: feat(extractor): implement ZUGFeRD extraction with profile detection
    • Files: src/extractor.py, tests/test_extractor.py
    • Pre-commit: pytest tests/test_extractor.py

  • 5. PDF Text Parser Implementation (TDD)

    What to do:

    • Write tests first with expected extraction patterns
    • Implement PDF text extraction using pypdf
    • Create regex patterns for extracting key values from text
    • Extract: invoice_number, invoice_date, amounts (net, gross, vat)
    • Return confidence scores for each extraction

    Must NOT do:

    • Do NOT use OCR (text extraction only)
    • Do NOT parse images in PDFs

    Recommended Agent Profile:

    • Category: unspecified-high
      • Reason: Regex pattern development requires careful testing
    • Skills: [systematic-debugging]
      • systematic-debugging: For developing and testing regex patterns

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 2 (with Tasks 4, 6)
    • Blocks: Task 7
    • Blocked By: Task 2

    References:

    Key Patterns from Spec:

    EXTRACTION_PATTERNS = {
        "invoice_number": [
            r"Rechnungs?-?(?:Nr|Nummer)[.:\s]*([A-Z0-9\-]+)",
            r"Invoice\s*(?:No|Number)?[.:\s]*([A-Z0-9\-]+)",
            r"Beleg-?Nr[.:\s]*([A-Z0-9\-]+)"
        ],
        "gross_amount": [
            r"Brutto[:\s]*([0-9.,]+)\s*(?:EUR|€)?",
            r"Gesamtbetrag[:\s]*([0-9.,]+)",
            r"Total[:\s]*([0-9.,]+)\s*(?:EUR|€)?"
        ],
        # ... more patterns
    }
    

    Acceptance Criteria:

    TDD - Write Tests First:

    # tests/test_pdf_parser.py
    def test_extract_invoice_number():
        text = "Rechnung\nRechnungs-Nr.: RE-2025-001234\nDatum: 04.02.2025"
        result = extract_from_text(text)
        assert result["invoice_number"] == "RE-2025-001234"
        assert result["invoice_number_confidence"] >= 0.9
    
    def test_extract_amounts():
        text = "Netto: 100,00 EUR\nMwSt 19%: 19,00 EUR\nBrutto: 119,00 EUR"
        result = extract_from_text(text)
        assert result["gross_amount"] == 119.00
        assert result["net_amount"] == 100.00
    
    def test_german_number_format():
        text = "Gesamtbetrag: 1.234,56 €"
        result = extract_from_text(text)
        assert result["gross_amount"] == 1234.56
    

    Agent-Executed QA Scenarios:

    Scenario: PDF parser passes all unit tests
      Tool: Bash
      Steps:
        1. pytest tests/test_pdf_parser.py -v
        2. Assert: All tests pass
      Expected Result: 100% test pass rate
      Evidence: pytest output captured
    
    Scenario: Real PDF text extraction works
      Tool: Bash
      Steps:
        1. python -c "
           from src.pdf_parser import extract_text_from_pdf, extract_from_text
           with open('tests/fixtures/sample_en16931.pdf', 'rb') as f:
               text = extract_text_from_pdf(f.read())
           print(f'Extracted {len(text)} characters')
           result = extract_from_text(text)
           print(f'Invoice: {result.get(\"invoice_number\", \"NOT FOUND\")}')"
        2. Assert: Output shows extracted characters and invoice number
      Expected Result: Text extraction works on real PDFs
      Evidence: Script output captured
    

    Commit: YES

    • Message: feat(pdf-parser): implement PDF text extraction with regex patterns
    • Files: src/pdf_parser.py, tests/test_pdf_parser.py
    • Pre-commit: pytest tests/test_pdf_parser.py

  • 6. Utility Functions Implementation

    What to do:

    • Create UN/ECE unit code mapping dictionary
    • Implement decimal rounding helper (2 decimal places, standard rounding)
    • Implement tolerance comparison function (0.01 EUR)
    • Add German date format parser
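
The rounding and tolerance helpers might look like the sketch below. `ROUND_HALF_UP` is one reading of "standard rounding" (kaufmännisches Runden) and should be confirmed against the spec; the function names match the QA scenarios in this task.

```python
from decimal import Decimal, ROUND_HALF_UP

TOLERANCE = Decimal("0.01")  # hardcoded per the guardrail above

def round_amount(value: float) -> float:
    """Round to 2 decimal places, half away from zero (2.675 -> 2.68)."""
    return float(Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))

def amounts_match(a: float, b: float) -> bool:
    """True when the two amounts differ by at most 0.01 EUR."""
    return abs(Decimal(str(a)) - Decimal(str(b))) <= TOLERANCE
```

Going through `Decimal(str(...))` avoids binary-float artifacts like `round(2.675, 2)` yielding 2.67.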

    Must NOT do:

    • Do NOT create configurable tolerance (hardcode 0.01)
    • Do NOT add unit codes beyond common ones (expand later if needed)

    Recommended Agent Profile:

    • Category: quick
      • Reason: Simple utility functions
    • Skills: []
      • Standard Python work

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 2 (with Tasks 4, 5)
    • Blocks: Tasks 4, 7
    • Blocked By: Task 1

    References:

    Unit Code Dictionary:

    UNECE_UNIT_CODES = {
        "C62": "Stück",
        "H87": "Stück",
        "KGM": "Kilogramm",
        "GRM": "Gramm",
        "TNE": "Tonne",
        "MTR": "Meter",
        "KMT": "Kilometer",
        "MTK": "Quadratmeter",
        "LTR": "Liter",
        "MLT": "Milliliter",
        "DAY": "Tag",
        "HUR": "Stunde",
        "MON": "Monat",
        "ANN": "Jahr",
        "SET": "Set",
        "PCE": "Stück",
        "EA": "Stück",
    }
    

    Acceptance Criteria:

    Agent-Executed QA Scenarios:

    Scenario: Unit code translation works
      Tool: Bash
      Steps:
        1. python -c "
           from src.utils import translate_unit_code
           assert translate_unit_code('C62') == 'Stück'
           assert translate_unit_code('KGM') == 'Kilogramm'
           assert translate_unit_code('UNKNOWN') == 'UNKNOWN'
           print('OK: Unit codes translate correctly')"
        2. Assert: Output contains "OK:"
      Expected Result: All translations correct
      Evidence: Script output captured
    
    Scenario: Decimal comparison with tolerance
      Tool: Bash
      Steps:
        1. python -c "
           from src.utils import amounts_match
           assert amounts_match(100.00, 100.00) == True
           assert amounts_match(100.00, 100.005) == True  # Within 0.01
           assert amounts_match(100.00, 100.02) == False  # Outside tolerance
           print('OK: Tolerance comparison works')"
        2. Assert: Output contains "OK:"
      Expected Result: Tolerance logic correct
      Evidence: Script output captured
    

    Commit: YES

    • Message: feat(utils): add unit code mapping and decimal utilities
    • Files: src/utils.py, tests/test_utils.py
    • Pre-commit: pytest tests/test_utils.py

Wave 3: Validation Logic

  • 7. Validator Implementation (TDD)

    What to do:

    • Write tests first for each validation check
    • Implement all 4 validation checks:
      1. pflichtfelder - Required fields check
      2. betraege - Amount calculations check
      3. ustid - VAT ID format check
      4. pdf_abgleich - XML vs PDF comparison
    • Return structured ValidationResult with errors and warnings

    Must NOT do:

    • Do NOT implement online USt-ID validation
    • Do NOT add validation checks beyond spec

    Recommended Agent Profile:

    • Category: unspecified-high
      • Reason: Core business logic with multiple validation rules
    • Skills: [systematic-debugging]
      • systematic-debugging: For handling edge cases in validation logic

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 3 (with Tasks 8, 9)
    • Blocks: Task 11
    • Blocked By: Tasks 4, 5, 6

    References:

    • User Spec: Detailed validation logic tables
    • Pattern References: src/models.py ValidationError model

    Validation Rules from Spec:

    pflichtfelder (Required Fields):

| Field | Severity |
|-------|----------|
| invoice_number | critical |
| invoice_date | critical |
| supplier.name | critical |
| supplier.vat_id | critical |
| buyer.name | critical |
| totals.net | critical |
| totals.gross | critical |
| totals.vat_total | critical |
| due_date | warning |
| payment_terms.iban | warning |
| line_items (min 1) | critical |

    betraege (Calculations):

    • line_total ≈ quantity × unit_price (±0.01)
    • totals.net ≈ Σ(line_items.line_total) (±0.01)
    • vat_breakdown.amount ≈ base × (rate/100) (±0.01)
    • totals.vat_total ≈ Σ(vat_breakdown.amount) (±0.01)
    • totals.gross ≈ totals.net + totals.vat_total (±0.01)

    ustid (VAT ID Format):

| Country | Regex |
|---------|-------|
| DE | `^DE[0-9]{9}$` |
| AT | `^ATU[0-9]{8}$` |
| CH | `^CHE[0-9]{9}(MWST\|TVA\|IVA)$` |
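
The format check reduces to anchored regex matching keyed on the country prefix; a sketch using the regexes from this task's table (the helper name is illustrative):

```python
import re

VAT_ID_PATTERNS = {
    "DE": re.compile(r"^DE[0-9]{9}$"),
    "AT": re.compile(r"^ATU[0-9]{8}$"),
    "CH": re.compile(r"^CHE[0-9]{9}(MWST|TVA|IVA)$"),
}

def validate_ustid_format(vat_id: str) -> bool:
    """Offline format check only -- no online USt-ID lookup (guardrail)."""
    pattern = VAT_ID_PATTERNS.get(vat_id[:2])
    return bool(pattern and pattern.fullmatch(vat_id))
```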

    Acceptance Criteria:

    TDD - Write Tests First:

    # tests/test_validator.py
    def test_pflichtfelder_missing_invoice_number():
        data = XmlData(invoice_number=None, ...)
        result = validate_pflichtfelder(data)
        assert any(e.field == "invoice_number" for e in result.errors)
        assert any(e.severity == "critical" for e in result.errors)
    
    def test_betraege_calculation_mismatch():
        data = XmlData(
            line_items=[LineItem(quantity=10, unit_price=9.99, line_total=100.00)],  # Wrong!
            ...
        )
        result = validate_betraege(data)
        assert any(e.error_code == "calculation_mismatch" for e in result.errors)
        assert any("99.90" in e.message for e in result.errors)
    
    def test_ustid_valid_german():
        result = validate_ustid("DE123456789")
        assert result.is_valid is True
    
    def test_ustid_invalid_format():
        result = validate_ustid("DE12345")  # Too short
        assert result.is_valid is False
        assert result.error_code == "invalid_format"
    
    def test_pdf_abgleich_mismatch():
        xml_data = XmlData(invoice_number="RE-001", totals=Totals(gross=118.88))
        pdf_values = {"invoice_number": "RE-002", "gross_amount": 118.88}
        result = validate_pdf_abgleich(xml_data, pdf_values)
        assert any(e.field == "invoice_number" for e in result.errors)
    

    Agent-Executed QA Scenarios:

    Scenario: Validator passes all unit tests
      Tool: Bash
      Steps:
        1. pytest tests/test_validator.py -v
        2. Assert: All tests pass
      Expected Result: 100% test pass rate
      Evidence: pytest output captured
    
    Scenario: All validation checks work together
      Tool: Bash
      Steps:
        1. python -c "
           from src.validator import validate_invoice
           from src.models import XmlData, ValidateRequest
           # Create a request with known issues
           request = ValidateRequest(
               xml_data={'invoice_number': None, ...},
               checks=['pflichtfelder', 'betraege']
           )
           result = validate_invoice(request)
           print(f'Errors: {len(result.errors)}')"
        2. Assert: Script runs without error, shows error count
      Expected Result: Validator processes all checks
      Evidence: Script output captured
    

    Commit: YES

    • Message: feat(validator): implement all validation checks (pflichtfelder, betraege, ustid, pdf_abgleich)
    • Files: src/validator.py, tests/test_validator.py
    • Pre-commit: pytest tests/test_validator.py

Wave 3 (continued): API Foundation

  • 8. FastAPI Application Structure

    What to do:

    • Create FastAPI app instance in main.py
    • Configure exception handlers for custom errors
    • Set up structured JSON logging
    • Add CORS middleware (for local development)
    • Configure app metadata (title, version, description)
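
The structured JSON logging bullet needs no extra dependency; a stdlib sketch, with the payload field names chosen here for illustration:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)

def configure_logging() -> logging.Logger:
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("zugferd-service")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    return logger
```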

    Must NOT do:

    • Do NOT add authentication middleware
    • Do NOT add rate limiting

    Recommended Agent Profile:

    • Category: quick
      • Reason: Standard FastAPI setup
    • Skills: []
      • Standard FastAPI patterns

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 3 (with Tasks 7, 9)
    • Blocks: Tasks 9, 10, 11
    • Blocked By: Task 3

    References:

    Acceptance Criteria:

    Agent-Executed QA Scenarios:

    Scenario: FastAPI app starts
      Tool: Bash
      Steps:
        1. timeout 5 uvicorn src.main:app --port 5001 &
        2. sleep 2
        3. curl -s http://localhost:5001/openapi.json | head -c 100
        4. Assert: Output contains "openapi"
        5. kill %1
      Expected Result: OpenAPI schema accessible
      Evidence: curl output captured
    

    Commit: YES

    • Message: feat(api): create FastAPI application structure with error handling
    • Files: src/main.py
    • Pre-commit: python -c "from src.main import app; print(app.title)"

  • 9. Health Endpoint Implementation

    What to do:

    • Implement GET /health endpoint
    • Return version from pyproject.toml
    • Return simple health status
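
Reading the version from installed package metadata keeps pyproject.toml as the single source of truth; a sketch (the fallback literal mirrors the version declared above and is only a convenience for editable/dev runs where the package is not installed):

```python
from importlib import metadata

def get_version() -> str:
    """Version from package metadata, falling back in dev runs."""
    try:
        return metadata.version("zugferd-service")
    except metadata.PackageNotFoundError:
        return "1.0.0"

def health_payload() -> dict:
    return {"status": "healthy", "version": get_version()}
```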

    Must NOT do:

    • Do NOT add complex health checks (no dependencies to check)

    Recommended Agent Profile:

    • Category: quick
      • Reason: Simple endpoint
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 3 (with Tasks 7, 8)
    • Blocks: Tasks 10, 11
    • Blocked By: Task 8

    References:

    • User Spec: Health endpoint response format

    Acceptance Criteria:

    Agent-Executed QA Scenarios:

    Scenario: Health endpoint returns correct format
      Tool: Bash
      Steps:
        1. uvicorn src.main:app --port 5002 &
        2. sleep 2
        3. curl -s http://localhost:5002/health
        4. Assert: Response contains {"status": "healthy", "version": "1.0.0"}
        5. kill %1
      Expected Result: Health check passes
      Evidence: curl output captured
    

    Commit: YES (groups with Task 8)

    • Message: feat(api): add health endpoint
    • Files: src/main.py

Wave 4: API Endpoints

  • 10. Extract Endpoint Implementation (TDD)

    What to do:

    • Write integration tests for /extract endpoint
    • Implement POST /extract endpoint
    • Accept base64-encoded PDF in JSON body
    • Use extractor module for extraction
    • Return structured ExtractResponse

    Must NOT do:

    • Do NOT accept multipart file uploads (JSON only per spec)

    Recommended Agent Profile:

    • Category: unspecified-high
      • Reason: Integration of extractor into API endpoint
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 4 (with Tasks 11, 12)
    • Blocks: Task 13
    • Blocked By: Tasks 4, 8, 9

    References:

    • User Spec: /extract request/response format
    • Pattern References: src/extractor.py

    Acceptance Criteria:

    TDD - Write Tests First:

    # tests/test_api.py
    import pytest
    from fastapi.testclient import TestClient
    
    from src.main import app
    
    # sample_en16931_pdf_base64 fixture is provided via conftest.py (not shown)
    @pytest.fixture
    def client():
        return TestClient(app)
    
    def test_extract_zugferd_pdf(client, sample_en16931_pdf_base64):
        response = client.post("/extract", json={"pdf_base64": sample_en16931_pdf_base64})
        assert response.status_code == 200
        data = response.json()
        assert data["is_zugferd"] is True
        assert data["zugferd_profil"] is not None
    
    def test_extract_invalid_base64(client):
        response = client.post("/extract", json={"pdf_base64": "not-valid-base64!!!"})
        assert response.status_code == 400
        assert response.json()["error"] == "invalid_base64"
    
    def test_extract_non_pdf(client):
        # Base64 of "Hello World" (not a PDF)
        response = client.post("/extract", json={"pdf_base64": "SGVsbG8gV29ybGQ="})
        assert response.status_code == 400
        assert response.json()["error"] == "invalid_pdf"
    

    Agent-Executed QA Scenarios:

    Scenario: Extract endpoint integration test
      Tool: Bash
      Steps:
        1. pytest tests/test_api.py::test_extract -v
        2. Assert: All extract tests pass
      Expected Result: Endpoint works correctly
      Evidence: pytest output captured
    
    Scenario: Live extract endpoint test
      Tool: Bash
      Steps:
        1. uvicorn src.main:app --port 5003 &
        2. sleep 2
        3. PDF_BASE64=$(base64 -w 0 tests/fixtures/sample_en16931.pdf)
        4. curl -s -X POST http://localhost:5003/extract \
             -H "Content-Type: application/json" \
             -d "{\"pdf_base64\": \"$PDF_BASE64\"}" | jq '.is_zugferd'
        5. Assert: Output is "true"
        6. kill %1
      Expected Result: ZUGFeRD detected
      Evidence: curl output captured
    

    Commit: YES

    • Message: feat(api): implement /extract endpoint for PDF processing
    • Files: src/main.py, tests/test_api.py
    • Pre-commit: pytest tests/test_api.py::test_extract

  • 11. Validate Endpoint Implementation (TDD)

    What to do:

    • Write integration tests for /validate endpoint
    • Implement POST /validate endpoint
    • Accept xml_data, pdf_text, and checks array
    • Use validator module for validation
    • Return structured ValidateResponse

    Must NOT do:

    • Do NOT run checks not in the request's checks array

    Recommended Agent Profile:

    • Category: unspecified-high
      • Reason: Integration of validator into API endpoint
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 4 (with Tasks 10, 12)
    • Blocks: Task 13
    • Blocked By: Tasks 7, 8, 9

    References:

    • User Spec: /validate request/response format
    • Pattern References: src/validator.py

    Acceptance Criteria:

    TDD - Write Tests First:

    def test_validate_all_checks(client):
        response = client.post("/validate", json={
            "xml_data": {...},
            "pdf_text": "...",
            "checks": ["pflichtfelder", "betraege", "ustid", "pdf_abgleich"]
        })
        assert response.status_code == 200
        data = response.json()
        assert "is_valid" in data
        assert "errors" in data
        assert "summary" in data
    
    def test_validate_partial_checks(client):
        response = client.post("/validate", json={
            "xml_data": {...},
            "checks": ["pflichtfelder"]
        })
        assert response.status_code == 200
        # Only pflichtfelder check should run
    

    Agent-Executed QA Scenarios:

    Scenario: Validate endpoint integration test
      Tool: Bash
      Steps:
        1. pytest tests/test_api.py::test_validate -v
        2. Assert: All validate tests pass
      Expected Result: Endpoint works correctly
      Evidence: pytest output captured
    
    Scenario: Live validate endpoint with invalid data
      Tool: Bash
      Steps:
        1. uvicorn src.main:app --port 5004 &
        2. sleep 2
        3. curl -s -X POST http://localhost:5004/validate \
             -H "Content-Type: application/json" \
             -d '{"xml_data": {"invoice_number": null}, "checks": ["pflichtfelder"]}' | jq '.is_valid'
        4. Assert: Output is "false"
        5. kill %1
      Expected Result: Validation detects missing field
      Evidence: curl output captured
    

    Commit: YES

    • Message: feat(api): implement /validate endpoint for invoice validation
    • Files: src/main.py, tests/test_api.py
    • Pre-commit: pytest tests/test_api.py::test_validate

  • 12. Error Handling Middleware

    What to do:

    • Implement exception handlers for all error types
    • Map exceptions to HTTP status codes and error responses
    • Ensure all errors return JSON format
    • Add request logging

    Must NOT do:

    • Do NOT expose stack traces in production

    Recommended Agent Profile:

    • Category: quick
      • Reason: Standard FastAPI error handling
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 4 (with Tasks 10, 11)
    • Blocks: Task 13
    • Blocked By: Task 8

    References:

    • User Spec: Error handling table

    Error Mapping from Spec:

    Error               Status  error_code
    ------------------  ------  ----------------------
    Invalid JSON        400     invalid_json
    Not a PDF           400     invalid_pdf
    PDF corrupt         400     corrupt_pdf
    Base64 invalid      400     invalid_base64
    File too large      400     file_too_large
    Password protected  400     password_protected_pdf
    Internal error      500     internal_error

    Acceptance Criteria:

    Agent-Executed QA Scenarios:

    Scenario: Error responses are JSON
      Tool: Bash
      Steps:
        1. uvicorn src.main:app --port 5005 &
        2. sleep 2
        3. curl -s -X POST http://localhost:5005/extract \
             -H "Content-Type: application/json" \
             -d '{"pdf_base64": "invalid!!!"}' | jq '.error'
        4. Assert: Output is "invalid_base64"
        5. kill %1
      Expected Result: JSON error response
      Evidence: curl output captured
    

    Commit: YES (groups with Task 10, 11)

    • Message: feat(api): add comprehensive error handling middleware
    • Files: src/main.py

Wave 5: Integration & Docker Packaging

  • 13. Integration Tests

    What to do:

    • Create end-to-end integration tests
    • Test full workflow: extract → validate
    • Test with all sample PDFs
    • Test error scenarios

    Must NOT do:

    • Do NOT create performance tests

    Recommended Agent Profile:

    • Category: unspecified-high
      • Reason: Integration testing requires comprehensive coverage
    • Skills: [systematic-debugging]

    Parallelization:

    • Can Run In Parallel: NO
    • Parallel Group: Sequential (depends on all endpoints)
    • Blocks: Tasks 14, 16
    • Blocked By: Tasks 10, 11, 12

    Acceptance Criteria:

    Agent-Executed QA Scenarios:

    Scenario: Full integration test suite passes
      Tool: Bash
      Steps:
        1. pytest tests/ -v --tb=short
        2. Assert: All tests pass (exit code 0)
      Expected Result: 100% test pass rate
      Evidence: pytest output captured
    

    Commit: YES

    • Message: test(integration): add end-to-end integration tests
    • Files: tests/test_integration.py
    • Pre-commit: pytest tests/

  • 14. Dockerfile Creation

    What to do:

    • Create multi-stage Dockerfile as per spec
    • Build stage: install dependencies
    • Production stage: slim image with app
    • Non-root user for security
    • Expose port 5000

    Must NOT do:

    • Do NOT include dev dependencies in production image

    Recommended Agent Profile:

    • Category: quick
      • Reason: Standard Dockerfile from spec template
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 5 (with Task 15)
    • Blocks: Task 16
    • Blocked By: Task 13

    References:

    • User Spec: Complete Dockerfile template provided

    Acceptance Criteria:

    Agent-Executed QA Scenarios:

    Scenario: Docker build succeeds
      Tool: Bash
      Steps:
        1. docker build -t zugferd-service:test .
        2. Assert: Exit code 0
        3. docker images zugferd-service:test --format "{{.Size}}"
        4. Assert: Size < 500MB
      Expected Result: Image builds and is reasonably sized
      Evidence: Build output captured
    
    Scenario: Container runs and responds
      Tool: Bash
      Steps:
        1. docker run -d --name zugferd-test -p 5006:5000 zugferd-service:test
        2. sleep 3
        3. curl -s http://localhost:5006/health | jq '.status'
        4. Assert: Output is "healthy"
        5. docker stop zugferd-test && docker rm zugferd-test
      Expected Result: Container is functional
      Evidence: curl output captured
    

    Commit: YES

    • Message: build(docker): add multi-stage Dockerfile for production
    • Files: Dockerfile
    • Pre-commit: docker build -t zugferd-service:test .

  • 15. Docker Compose Configuration

    What to do:

    • Create docker-compose.yml for local development
    • Include volume mount for live reload
    • Configure environment variables
    • Add health check

    Must NOT do:

    • Do NOT add additional services (no DB, no cache)

    Recommended Agent Profile:

    • Category: quick
      • Reason: Simple docker-compose setup
    • Skills: []

    Parallelization:

    • Can Run In Parallel: YES
    • Parallel Group: Wave 5 (with Task 14)
    • Blocks: None
    • Blocked By: Task 13

    Acceptance Criteria:

    Agent-Executed QA Scenarios:

    Scenario: Docker Compose starts service
      Tool: Bash
      Steps:
        1. docker-compose up -d
        2. sleep 5
        3. curl -s http://localhost:5000/health | jq '.status'
        4. Assert: Output is "healthy"
        5. docker-compose down
      Expected Result: Compose setup works
      Evidence: curl output captured
    

    Commit: YES

    • Message: build(docker): add docker-compose.yml for local development
    • Files: docker-compose.yml

Wave 6: Nix Packaging

  • 16. Nix Flake Packaging

    What to do:

    • Create flake.nix with buildPythonApplication
    • Use pythonRelaxDeps for dependency flexibility
    • Include devShell for development
    • Test with nix build and nix run

    Must NOT do:

    • Do NOT create complex overlay structure

    Recommended Agent Profile:

    • Category: unspecified-high
      • Reason: Nix packaging requires careful dependency handling
    • Skills: []

    Parallelization:

    • Can Run In Parallel: NO
    • Parallel Group: Sequential
    • Blocks: Tasks 17, 18
    • Blocked By: Task 14

    References:

    flake.nix Structure:

    {
      description = "ZUGFeRD REST API Service";
    
      inputs = {
        nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
        flake-utils.url = "github:numtide/flake-utils";
      };
    
      outputs = { self, nixpkgs, flake-utils }:
        flake-utils.lib.eachDefaultSystem (system:
          let
            pkgs = nixpkgs.legacyPackages.${system};
            pythonPackages = pkgs.python311Packages;
    
            zugferd-service = pythonPackages.buildPythonApplication {
              pname = "zugferd-service";
              version = "1.0.0";
              pyproject = true;
              src = ./.;
    
              pythonRelaxDeps = true;
    
              build-system = [ pythonPackages.hatchling ];
    
              dependencies = with pythonPackages; [
                fastapi
                uvicorn
                pydantic
                python-multipart
                # factur-x - may need packaging
                pypdf
                lxml
              ];
    
              nativeCheckInputs = with pythonPackages; [
                pytestCheckHook
                pytest-asyncio
                httpx
              ];
    
              meta = {
                description = "REST API for ZUGFeRD invoice extraction";
                license = pkgs.lib.licenses.mit;
                # nix run resolves the executable via meta.mainProgram
                mainProgram = "zugferd-service";
              };
            };
          in
          {
            packages.default = zugferd-service;
            packages.zugferd-service = zugferd-service;
    
            devShells.default = pkgs.mkShell {
              packages = [
                (pkgs.python311.withPackages (ps: with ps; [
                  fastapi uvicorn pydantic pypdf lxml
                  pytest pytest-asyncio httpx
                ]))
              ];
            };
          }
        );
    }
    

    Acceptance Criteria:

    Agent-Executed QA Scenarios:

    Scenario: Nix flake builds successfully
      Tool: Bash
      Steps:
        1. nix build .#zugferd-service
        2. Assert: Exit code 0
        3. ls -la result/bin/
        4. Assert: zugferd-service binary exists
      Expected Result: Nix package builds
      Evidence: Build output captured
    
    Scenario: Nix package runs correctly
      Tool: Bash
      Steps:
        1. nix run .#zugferd-service &
        2. sleep 3
        3. curl -s http://localhost:5000/health | jq '.status'
        4. Assert: Output is "healthy"
        5. kill %1
      Expected Result: Nix-built service runs
      Evidence: curl output captured
    
    Scenario: Dev shell provides dependencies
      Tool: Bash
      Steps:
        1. nix develop -c python -c "import fastapi; import pypdf; print('OK')"
        2. Assert: Output is "OK"
      Expected Result: Dev shell has all deps
      Evidence: Command output captured
    

    Commit: YES

    • Message: build(nix): add flake.nix for Nix packaging
    • Files: flake.nix
    • Pre-commit: nix flake check

  • 17. NixOS Service Module Example

    What to do:

    • Create example NixOS module for deployment
    • Include service configuration options
    • Add systemd service definition
    • Document usage in README

    Must NOT do:

    • Do NOT create production-ready module (example only)

    Recommended Agent Profile:

    • Category: quick
      • Reason: Example module following standard patterns
    • Skills: []

    Parallelization:

    • Can Run In Parallel: NO
    • Parallel Group: Sequential
    • Blocks: Task 18
    • Blocked By: Task 16

    References:

    • User Spec: NixOS container configuration example
    • Librarian Research: NixOS service module pattern

    Acceptance Criteria:

    Agent-Executed QA Scenarios:

    Scenario: NixOS module syntax is valid
      Tool: Bash
      Steps:
        1. nix-instantiate --eval -E "import ./nix/module.nix"
        2. Assert: Exit code 0 or specific Nix evaluation output
      Expected Result: Module parses correctly
      Evidence: Nix output captured
    

    Commit: YES

    • Message: docs(nix): add example NixOS service module
    • Files: nix/module.nix

  • 18. README Documentation

    What to do:

    • Create comprehensive README.md
    • Include: Overview, Installation, Usage, API Reference
    • Add examples for Docker, Nix, and direct Python usage
    • Document all endpoints with curl examples
    • Include troubleshooting section

    Must NOT do:

    • Do NOT duplicate API spec (reference it)

    Recommended Agent Profile:

    • Category: writing
      • Reason: Documentation writing
    • Skills: []

    Parallelization:

    • Can Run In Parallel: NO
    • Parallel Group: Sequential (last task)
    • Blocks: None
    • Blocked By: Tasks 16, 17

    README Structure:

    # ZUGFeRD-Service
    
    ## Overview
    ## Quick Start
    ### Docker
    ### Nix
    ### Python (Development)
    ## API Reference
    ### GET /health
    ### POST /extract
    ### POST /validate
    ## Configuration
    ## NixOS Deployment
    ## Development
    ## Troubleshooting
    ## License
    

    Acceptance Criteria:

    Agent-Executed QA Scenarios:

    Scenario: README contains all required sections
      Tool: Bash
      Steps:
        1. grep -c "## " README.md
        2. Assert: At least 8 sections
        3. grep "curl" README.md | wc -l
        4. Assert: At least 3 curl examples
      Expected Result: Comprehensive documentation
      Evidence: grep output captured
    

    Commit: YES

    • Message: docs: add comprehensive README with installation and usage guide
    • Files: README.md

Commit Strategy

After Task  Message                                     Files                          Verification
----------  ------------------------------------------  -----------------------------  ------------
1           feat(project): initialize ZUGFeRD service   pyproject.toml, src/, tests/   toml parse
2           test(fixtures): add ZUGFeRD sample PDFs     tests/fixtures/                file exists
3           feat(models): add Pydantic models           src/models.py                  pytest
4           feat(extractor): implement extraction       src/extractor.py               pytest
5           feat(pdf-parser): implement PDF parsing     src/pdf_parser.py              pytest
6           feat(utils): add utilities                  src/utils.py                   pytest
7           feat(validator): implement validation       src/validator.py               pytest
8-9         feat(api): add FastAPI app + health         src/main.py                    curl
10          feat(api): add /extract endpoint            src/main.py                    pytest
11          feat(api): add /validate endpoint           src/main.py                    pytest
12          feat(api): add error handling               src/main.py                    curl
13          test(integration): add e2e tests            tests/                         pytest
14          build(docker): add Dockerfile               Dockerfile                     docker build
15          build(docker): add docker-compose           docker-compose.yml             compose up
16          build(nix): add flake.nix                   flake.nix                      nix build
17          docs(nix): add NixOS module                 nix/module.nix                 nix eval
18          docs: add README                            README.md                      grep check

Success Criteria

Verification Commands

# All tests pass
pytest tests/ -v
# Expected: All tests pass (exit code 0)

# Docker builds and runs
docker build -t zugferd-service .
docker run -p 5000:5000 zugferd-service &
curl http://localhost:5000/health
# Expected: {"status": "healthy", "version": "1.0.0"}

# Nix builds and runs
nix build .#zugferd-service
./result/bin/zugferd-service &
curl http://localhost:5000/health
# Expected: {"status": "healthy", "version": "1.0.0"}

# Extract endpoint works
PDF_BASE64=$(base64 -w 0 tests/fixtures/sample_en16931.pdf)
curl -X POST http://localhost:5000/extract \
  -H "Content-Type: application/json" \
  -d "{\"pdf_base64\": \"$PDF_BASE64\"}" | jq '.is_zugferd'
# Expected: true

Final Checklist

  • All 18 tasks completed
  • All tests pass (pytest)
  • Docker image builds (<500MB)
  • Docker container runs and responds
  • Nix flake builds without errors
  • Nix package runs and responds
  • All endpoints return expected responses
  • README documents all features
  • No "Must NOT Have" items present