ZUGFeRD-Service

A REST API service for extracting and validating ZUGFeRD/Factur-X invoice data from PDF files. Built with FastAPI and Python 3.11+.

Overview

ZUGFeRD-Service provides a simple HTTP API to:

  • Extract structured invoice data from ZUGFeRD-enabled PDFs
  • Detect and identify ZUGFeRD profiles (MINIMUM, BASIC, BASIC WL, EN16931, EXTENDED)
  • Validate invoice data against business rules and regulatory requirements
  • Compare XML data against PDF text content

ZUGFeRD (Zentraler User Guide des Forums elektronische Rechnung Deutschland) is a German standard for electronic invoices. It embeds Cross Industry Invoice (CII) XML in a PDF/A-3 file; the same format is published in France as Factur-X.

Quick Start

Docker

The quickest way to get started is using Docker:

# Build the image
docker build -t zugferd-service .

# Run the service
docker run -p 5000:5000 zugferd-service

# Or use Docker Compose
docker-compose up -d

Nix

If you're using Nix, build and run with:

# Build the package
nix build .#zugferd-service

# Run the service
nix run .#zugferd-service

# Enter development shell
nix develop

Python (Development)

For local development:

# Install dependencies
pip install -e .

# Run directly
python -m src.main

# Or use the installed script
zugferd-service

The service listens on 0.0.0.0:5000 by default and is reachable locally at http://localhost:5000.

API Reference

GET /health

Check if the service is running.

Response:

{
  "status": "healthy",
  "version": "1.0.0"
}

Example:

curl http://localhost:5000/health

POST /extract

Extract ZUGFeRD data from a base64-encoded PDF file.

Request:

{
  "pdf_base64": "JVBERi0xLjQKJeLjz9MK..."
}

Response (ZUGFeRD PDF):

{
  "is_zugferd": true,
  "zugferd_profil": "EN16931",
  "xml_raw": "<?xml version=\"1.0\"...?>",
  "xml_data": {
    "invoice_number": "RE-2024-001",
    "invoice_date": "2024-02-01",
    "due_date": "2024-02-28",
    "supplier": {
      "name": "Acme Corp",
      "street": "Main Street 123",
      "postal_code": "12345",
      "city": "Berlin",
      "country": "DE",
      "vat_id": "DE123456789",
      "email": "billing@acme-corp.de"
    },
    "buyer": {
      "name": "Customer GmbH",
      "street": "Market Square 5",
      "postal_code": "54321",
      "city": "Hamburg",
      "country": "DE",
      "vat_id": "DE987654321"
    },
    "line_items": [
      {
        "position": 1,
        "article_number": "ART-001",
        "article_number_buyer": null,
        "description": "Consulting Services",
        "quantity": 10.0,
        "unit": "HUR",
        "unit_price": 100.0,
        "line_total": 1000.0,
        "vat_rate": 19.0,
        "vat_amount": 190.0
      }
    ],
    "totals": {
      "line_total_sum": 1000.0,
      "net": 1000.0,
      "vat_total": 190.0,
      "gross": 1190.0,
      "vat_breakdown": [
        {
          "rate": 19.0,
          "base": 1000.0,
          "amount": 190.0
        }
      ]
    },
    "currency": "EUR",
    "payment_terms": {
      "iban": "DE89370400440532013000",
      "bic": "COBADEFFXXX",
      "account_holder": "Acme Corp"
    },
    "notes": "Payment due within 30 days"
  },
  "pdf_text": "Invoice RE-2024-001\nAcme Corp...",
  "extraction_meta": {
    "pages": 2,
    "xml_attachment_name": "factur-x.xml",
    "extraction_time_ms": 45
  }
}

Response (Non-ZUGFeRD PDF):

{
  "is_zugferd": false,
  "zugferd_profil": null,
  "xml_raw": null,
  "xml_data": null,
  "pdf_text": "Regular PDF content...",
  "extraction_meta": {
    "pages": 1,
    "xml_attachment_name": null,
    "extraction_time_ms": 20
  }
}

Example:

# Convert PDF to base64 without line wraps and extract
# (-w 0 is a GNU coreutils option; on macOS use: base64 -i invoice.pdf)
PDF_BASE64=$(base64 -w 0 invoice.pdf)

curl -X POST http://localhost:5000/extract \
  -H "Content-Type: application/json" \
  -d "{\"pdf_base64\": \"$PDF_BASE64\"}"
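
The same request can be sent from Python. The following is a minimal client sketch using only the standard library; the function names `build_extract_request` and `extract_invoice` and the 30-second timeout are illustrative assumptions, not part of the service.

```python
import base64
import json
import urllib.request


def build_extract_request(
    pdf_bytes: bytes, base_url: str = "http://localhost:5000"
) -> urllib.request.Request:
    """Build the POST /extract request for raw PDF bytes."""
    payload = {"pdf_base64": base64.b64encode(pdf_bytes).decode("ascii")}
    return urllib.request.Request(
        f"{base_url}/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_invoice(pdf_path: str, base_url: str = "http://localhost:5000") -> dict:
    """Send a PDF file to the service and return the decoded JSON response."""
    with open(pdf_path, "rb") as fh:
        request = build_extract_request(fh.read(), base_url)
    # Timeout value is an arbitrary choice for this sketch.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the service running, `extract_invoice("invoice.pdf")["is_zugferd"]` tells you whether embedded invoice XML was found.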

POST /validate

Validate invoice data against business rules and regulatory requirements.

Request:

{
  "xml_data": {
    "invoice_number": "RE-2024-001",
    "invoice_date": "2024-02-01",
    "due_date": "2024-02-28",
    "supplier": {
      "name": "Acme Corp",
      "vat_id": "DE123456789"
    },
    "buyer": {
      "name": "Customer GmbH",
      "vat_id": "DE987654321"
    },
    "line_items": [
      {
        "position": 1,
        "description": "Consulting Services",
        "quantity": 10.0,
        "unit": "HUR",
        "unit_price": 100.0,
        "line_total": 1000.0,
        "vat_rate": 19.0
      }
    ],
    "totals": {
      "line_total_sum": 1000.0,
      "net": 1000.0,
      "vat_total": 190.0,
      "gross": 1190.0,
      "vat_breakdown": [
        {
          "rate": 19.0,
          "base": 1000.0,
          "amount": 190.0
        }
      ]
    },
    "currency": "EUR"
  },
  "pdf_text": "Invoice RE-2024-001\nTotal: 1190.00 EUR",
  "checks": ["pflichtfelder", "betraege", "ustid", "pdf_abgleich"]
}

Response:

{
  "result": {
    "is_valid": true,
    "errors": [],
    "warnings": [],
    "summary": {
      "total_checks": 4,
      "checks_passed": 4,
      "checks_failed": 0,
      "critical_errors": 0,
      "warnings": 0
    },
    "validation_time_ms": 12
  }
}

Example:

curl -X POST http://localhost:5000/validate \
  -H "Content-Type: application/json" \
  -d '{
    "xml_data": {"invoice_number": "RE-001", ...},
    "checks": ["pflichtfelder", "betraege"]
  }'

Validation Checks

The service supports four validation checks:

1. pflichtfelder (Required Fields)

Validates that all critical invoice fields are present and non-empty:

  • Critical errors: invoice_number, invoice_date, supplier.name, supplier.vat_id, buyer.name, totals.net, totals.gross, totals.vat_total, at least one entry in line_items, and for each line item description, quantity, and unit_price
  • Warnings: due_date, payment_terms.iban
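
The check can be pictured as a lookup of dotted paths in the invoice dictionary. The following is a sketch under assumptions, not the service's implementation: the helper names are hypothetical, a value counts as missing when it is `None` or a blank string, and the per-line-item fields are taken from the Troubleshooting section.

```python
from typing import Any

CRITICAL_FIELDS = [
    "invoice_number", "invoice_date", "supplier.name", "supplier.vat_id",
    "buyer.name", "totals.net", "totals.gross", "totals.vat_total",
]
WARNING_FIELDS = ["due_date", "payment_terms.iban"]
LINE_ITEM_FIELDS = ["description", "quantity", "unit_price"]


def get_path(data: Any, path: str) -> Any:
    """Follow a dotted path through nested dicts; return None if any step is absent."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def is_missing(value: Any) -> bool:
    """Assumption: None and blank strings count as missing, zero does not."""
    return value is None or (isinstance(value, str) and not value.strip())


def check_required_fields(xml_data: dict) -> tuple[list[str], list[str]]:
    """Return (critical errors, warnings) as lists of missing field paths."""
    errors = [p for p in CRITICAL_FIELDS if is_missing(get_path(xml_data, p))]
    items = xml_data.get("line_items") or []
    if not items:
        errors.append("line_items")
    for index, item in enumerate(items):
        for key in LINE_ITEM_FIELDS:
            if is_missing(item.get(key)):
                errors.append(f"line_items[{index}].{key}")
    warnings = [p for p in WARNING_FIELDS if is_missing(get_path(xml_data, p))]
    return errors, warnings
```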

2. betraege (Amount Calculations)

Verifies all monetary calculations are correct:

  • Line total = quantity × unit_price (for each line item)
  • totals.net = sum of all line totals
  • VAT breakdown amount = base × (rate/100) (for each VAT entry)
  • totals.vat_total = sum of VAT breakdown amounts
  • totals.gross = totals.net + totals.vat_total

Uses a tolerance of 0.01 for floating-point comparison.
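
The five rules above can be sketched as follows. This is an illustration under assumptions: `check_amounts` and its message format are hypothetical, and the tolerance is treated as an absolute difference of 0.01.

```python
import math

TOLERANCE = 0.01  # assumed to be an absolute tolerance


def close(a: float, b: float, tol: float = TOLERANCE) -> bool:
    """True when a and b differ by at most tol."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=tol)


def check_amounts(xml_data: dict) -> list[str]:
    """Return one message per violated calculation rule (empty list = all good)."""
    problems = []
    items = xml_data.get("line_items", [])
    totals = xml_data["totals"]

    # Rule 1: each line total equals quantity times unit price.
    for item in items:
        expected = item["quantity"] * item["unit_price"]
        if not close(item["line_total"], expected):
            problems.append(f"line_items[{item['position']}].line_total: expected {expected}")

    # Rule 2: net equals the sum of all line totals.
    line_sum = sum(item["line_total"] for item in items)
    if not close(totals["net"], line_sum):
        problems.append(f"totals.net: expected {line_sum}")

    # Rule 3: each VAT breakdown amount equals base times rate / 100.
    vat_sum = 0.0
    for entry in totals.get("vat_breakdown", []):
        expected = entry["base"] * entry["rate"] / 100
        if not close(entry["amount"], expected):
            problems.append(f"totals.vat_breakdown[{entry['rate']}].amount: expected {expected}")
        vat_sum += entry["amount"]

    # Rule 4: vat_total equals the sum of the breakdown amounts.
    if not close(totals["vat_total"], vat_sum):
        problems.append(f"totals.vat_total: expected {vat_sum}")

    # Rule 5: gross equals net plus vat_total.
    expected_gross = totals["net"] + totals["vat_total"]
    if not close(totals["gross"], expected_gross):
        problems.append(f"totals.gross: expected {expected_gross}")
    return problems
```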

3. ustid (VAT ID Format)

Validates VAT ID format for supported countries:

  • Germany (DE): DE followed by 9 digits (e.g., DE123456789)
  • Austria (AT): ATU followed by 8 digits (e.g., ATU12345678)
  • Switzerland (CH): CHE followed by 9 digits and MWST/TVA/IVA suffix (e.g., CHE123456789MWST)

4. pdf_abgleich (PDF Comparison)

Compares XML data against extracted PDF text:

  • Invoice number exact match
  • Totals (net, gross, vat_total) within tolerance
  • Returns warnings (not errors) for mismatches

ZUGFeRD Profiles

The service detects and reports the following ZUGFeRD 2.x profiles:

Profile     Description
MINIMUM     Minimal profile with basic invoice data
BASIC       Basic profile for simple B2B invoicing
BASIC WL    Basic profile without line items (WL = "Without Lines")
EN16931     Full profile compliant with the EN 16931 standard
EXTENDED    Extended profile with additional optional fields

The profile is automatically detected from the embedded XML metadata.

Configuration

Environment Variables

The service supports the following environment variables:

Variable    Default    Description
HOST        0.0.0.0    Host address to bind to
PORT        5000       Port to listen on
LOG_LEVEL   INFO       Logging level (DEBUG, INFO, WARNING, ERROR)
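
As an illustration of how these variables and their defaults fit together, the sketch below reads them from a mapping and validates the values. `load_settings` is a hypothetical helper; the range and level checks are assumptions and may not match what the service does on invalid input.

```python
import os
from collections.abc import Mapping

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read HOST, PORT and LOG_LEVEL, falling back to the documented defaults."""
    host = env.get("HOST", "0.0.0.0")
    port = int(env.get("PORT", "5000"))
    log_level = env.get("LOG_LEVEL", "INFO").upper()
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"Unsupported LOG_LEVEL: {log_level}")
    return {"host": host, "port": port, "log_level": log_level}
```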

Docker Compose

The provided docker-compose.yml includes:

  • Port mapping: 5000:5000
  • Health check endpoint
  • Read-only source mount for development
  • Restart policy: unless-stopped

Nix

The flake provides:

  • packages.zugferd-service: Production build
  • devShells.default: Development shell with all dependencies

NixOS Deployment

Example NixOS module configuration:

{ config, pkgs, ... }:

let
  zugferd-service = (import ./zugferd-service {}).packages.zugferd-service;
in {
  systemd.services.zugferd-service = {
    enable = true;
    description = "ZUGFeRD Invoice Service";
    after = [ "network.target" ];
    wantedBy = [ "multi-user.target" ];

    serviceConfig = {
      ExecStart = "${zugferd-service}/bin/zugferd-service";
      Restart = "always";
      RestartSec = "10";
      DynamicUser = true;
      ProtectSystem = "strict";
      ProtectHome = true;
      PrivateTmp = true;
      NoNewPrivileges = true;
    };

    environment = {
      HOST = "127.0.0.1";
      PORT = "5000";
      LOG_LEVEL = "INFO";
    };
  };
}

For production, consider adding:

  • Reverse proxy (nginx/caddy) with HTTPS
  • Authentication middleware
  • Rate limiting
  • Logging aggregation

Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=src

# Run specific test file
pytest tests/test_extract.py

Project Structure

zugferd-service/
├── src/
│   ├── __init__.py
│   ├── main.py          # FastAPI application and endpoints
│   ├── models.py        # Pydantic models for requests/responses
│   ├── extractor.py     # ZUGFeRD XML extraction logic
│   ├── validator.py     # Invoice validation logic
│   ├── pdf_parser.py    # PDF text extraction
│   └── utils.py         # Utility functions
├── tests/
│   ├── test_extract.py
│   ├── test_validate.py
│   └── fixtures/        # Test PDF files
├── pyproject.toml       # Project metadata and dependencies
├── Dockerfile           # Multi-stage Docker build
├── docker-compose.yml   # Docker Compose configuration
└── flake.nix            # Nix flake for reproducible builds

Dependencies

Core:

  • fastapi>=0.109.0 - Web framework
  • uvicorn>=0.27.0 - ASGI server
  • pydantic>=2.5.0 - Data validation
  • factur-x>=2.5 - ZUGFeRD/Factur-X library
  • pypdf>=4.0.0 - PDF text extraction
  • lxml>=5.0.0 - XML processing

Development:

  • pytest>=8.0.0 - Testing framework
  • pytest-asyncio>=0.23.0 - Async test support
  • httpx>=0.27.0 - HTTP client for testing

Troubleshooting

Common Issues

Service fails to start with "Address already in use"

Change the port:

# Docker
docker run -p 8000:5000 zugferd-service

# Nix/Python
PORT=8000 python -m src.main

Extraction returns "is_zugferd: false"

  • Verify the PDF contains ZUGFeRD/Factur-X XML attachment
  • Check that the file is not password-protected
  • Ensure the file is a valid PDF (not corrupt)

Validation fails with "missing_required" errors

Check that all required fields are present:

  • invoice_number
  • invoice_date (YYYY-MM-DD format)
  • supplier.name and supplier.vat_id
  • buyer.name
  • Non-zero totals (net, gross, vat_total)
  • At least one line item with description, quantity, and unit_price

VAT ID validation fails

Verify the VAT ID format:

  • German: DE + 9 digits
  • Austrian: ATU + 8 digits
  • Swiss: CHE + 9 digits + MWST/TVA/IVA

Docker build is slow

Use BuildKit for faster builds:

DOCKER_BUILDKIT=1 docker build -t zugferd-service .

Error Codes

Error Code               Description
invalid_base64           Invalid base64 encoding in request
file_too_large           PDF exceeds 10 MB limit
password_protected_pdf   PDF is password-protected
invalid_pdf              File is not a valid PDF
corrupt_pdf              PDF file is corrupted or unreadable
invalid_xml              Embedded XML is malformed

Logs

The service outputs structured JSON logs:

{
  "timestamp": "2024-02-01T10:30:00Z",
  "level": "INFO",
  "message": "Extraction completed",
  "data": {
    "is_zugferd": true,
    "profile": "EN16931",
    "extraction_time_ms": 45
  }
}
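
Log lines of this shape can be produced with a custom formatter for Python's `logging` module. The sketch below is one possible way to do it and is not taken from the service's code; the `JsonFormatter` class and the convention of passing the payload via `extra={"data": ...}` are assumptions.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Values passed as extra={"data": {...}} become attributes of the record.
        data = getattr(record, "data", None)
        if data is not None:
            entry["data"] = data
        return json.dumps(entry)


def configure_logging(level: str = "INFO") -> logging.Logger:
    """Attach the JSON formatter to a logger named 'zugferd' (hypothetical name)."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("zugferd")
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger
```

A call such as `logger.info("Extraction completed", extra={"data": {"is_zugferd": True}})` then emits one JSON object per line, which log aggregators can parse without extra configuration.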

License

MIT

Support

For issues, questions, or contributions, please refer to the project repository.
