Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
ZUGFeRD-Service
A REST API service for extracting and validating ZUGFeRD/Factur-X invoice data from PDF files. Built with FastAPI and Python 3.11+.
Overview
ZUGFeRD-Service provides a simple HTTP API to:
- Extract structured invoice data from ZUGFeRD-enabled PDFs
- Detect and identify ZUGFeRD profiles (MINIMUM, BASIC, BASIC WL, EN16931, EXTENDED)
- Validate invoice data against business rules and regulatory requirements
- Compare XML data against PDF text content
ZUGFeRD (Zentraler User Guide der Forums Elektronische Rechnung Deutschland) is a German standard for electronic invoices, using the Cross Industry Invoice (CII) XML format embedded in PDF files (also known as Factur-X in France).
Quick Start
Docker
The quickest way to get started is using Docker:
# Build the image
docker build -t zugferd-service .
# Run the service
docker run -p 5000:5000 zugferd-service
# Or use Docker Compose
docker-compose up -d
Nix
If you're using Nix, build and run with:
# Build the package
nix build .#zugferd-service
# Run the service
nix run .#zugferd-service
# Enter development shell
nix develop
Python (Development)
For local development:
# Install dependencies
pip install -e .
# Run directly
python -m src.main
# Or use the installed script
zugferd-service
The service starts on http://0.0.0.0:5000 by default.
API Reference
GET /health
Check if the service is running.
Response:
{
"status": "healthy",
"version": "1.0.0"
}
Example:
curl http://localhost:5000/health
POST /extract
Extract ZUGFeRD data from a base64-encoded PDF file.
Request:
{
"pdf_base64": "JVBERi0xLjQKJeLjz9MK..."
}
Response (ZUGFeRD PDF):
{
"is_zugferd": true,
"zugferd_profil": "EN16931",
"xml_raw": "<?xml version=\"1.0\"...?>",
"xml_data": {
"invoice_number": "RE-2024-001",
"invoice_date": "2024-02-01",
"due_date": "2024-02-28",
"supplier": {
"name": "Acme Corp",
"street": "Main Street 123",
"postal_code": "12345",
"city": "Berlin",
"country": "DE",
"vat_id": "DE123456789",
"email": "billing@acme-corp.de"
},
"buyer": {
"name": "Customer GmbH",
"street": "Market Square 5",
"postal_code": "54321",
"city": "Hamburg",
"country": "DE",
"vat_id": "DE987654321"
},
"line_items": [
{
"position": 1,
"article_number": "ART-001",
"article_number_buyer": null,
"description": "Consulting Services",
"quantity": 10.0,
"unit": "HUR",
"unit_price": 100.0,
"line_total": 1000.0,
"vat_rate": 19.0,
"vat_amount": 190.0
}
],
"totals": {
"line_total_sum": 1000.0,
"net": 1000.0,
"vat_total": 190.0,
"gross": 1190.0,
"vat_breakdown": [
{
"rate": 19.0,
"base": 1000.0,
"amount": 190.0
}
]
},
"currency": "EUR",
"payment_terms": {
"iban": "DE89370400440532013000",
"bic": "COBADEFFXXX",
"account_holder": "Acme Corp"
},
"notes": "Payment due within 30 days"
},
"pdf_text": "Invoice RE-2024-001\nAcme Corp...",
"extraction_meta": {
"pages": 2,
"xml_attachment_name": "factur-x.xml",
"extraction_time_ms": 45
}
}
Response (Non-ZUGFeRD PDF):
{
"is_zugferd": false,
"zugferd_profil": null,
"xml_raw": null,
"xml_data": null,
"pdf_text": "Regular PDF content...",
"extraction_meta": {
"pages": 1,
"xml_attachment_name": null,
"extraction_time_ms": 20
}
}
Example:
# Convert PDF to base64 and extract
PDF_BASE64=$(base64 -w 0 invoice.pdf)
curl -X POST http://localhost:5000/extract \
-H "Content-Type: application/json" \
-d "{\"pdf_base64\": \"$PDF_BASE64\"}"
POST /validate
Validate invoice data against business rules and regulatory requirements.
Request:
{
"xml_data": {
"invoice_number": "RE-2024-001",
"invoice_date": "2024-02-01",
"due_date": "2024-02-28",
"supplier": {
"name": "Acme Corp",
"vat_id": "DE123456789"
},
"buyer": {
"name": "Customer GmbH",
"vat_id": "DE987654321"
},
"line_items": [
{
"position": 1,
"description": "Consulting Services",
"quantity": 10.0,
"unit": "HUR",
"unit_price": 100.0,
"line_total": 1000.0,
"vat_rate": 19.0
}
],
"totals": {
"line_total_sum": 1000.0,
"net": 1000.0,
"vat_total": 190.0,
"gross": 1190.0,
"vat_breakdown": [
{
"rate": 19.0,
"base": 1000.0,
"amount": 190.0
}
]
},
"currency": "EUR"
},
"pdf_text": "Invoice RE-2024-001\nTotal: 1190.00 EUR",
"checks": ["pflichtfelder", "betraege", "ustid", "pdf_abgleich"]
}
Response:
{
"result": {
"is_valid": true,
"errors": [],
"warnings": [],
"summary": {
"total_checks": 4,
"checks_passed": 4,
"checks_failed": 0,
"critical_errors": 0,
"warnings": 0
},
"validation_time_ms": 12
}
}
Example:
curl -X POST http://localhost:5000/validate \
-H "Content-Type: application/json" \
-d '{
"xml_data": {"invoice_number": "RE-001", ...},
"checks": ["pflichtfelder", "betraege"]
}'
Validation Checks
The service supports four validation checks:
1. pflichtfelder (Required Fields)
Validates that all critical invoice fields are present and non-empty:
- Critical errors: invoice_number, invoice_date, supplier.name, supplier.vat_id, buyer.name, totals.net, totals.gross, totals.vat_total, line_items array, line item fields
- Warnings: due_date, payment_terms.iban
2. betraege (Amount Calculations)
Verifies all monetary calculations are correct:
- Line total = quantity × unit_price (for each line item)
- totals.net = sum of all line totals
- VAT breakdown amount = base × (rate/100) (for each VAT entry)
- totals.vat_total = sum of VAT breakdown amounts
- totals.gross = totals.net + totals.vat_total
Uses a tolerance of 0.01 for floating-point comparison.
3. ustid (VAT ID Format)
Validates VAT ID format for supported countries:
- Germany (DE): DE followed by 9 digits (e.g.,
DE123456789) - Austria (AT): ATU followed by 8 digits (e.g.,
ATU12345678) - Switzerland (CH): CHE followed by 9 digits and MWST/TVA/IVA suffix (e.g.,
CHE123456789MWST)
4. pdf_abgleich (PDF Comparison)
Compares XML data against extracted PDF text:
- Invoice number exact match
- Totals (net, gross, vat_total) within tolerance
- Returns warnings (not errors) for mismatches
ZUGFeRD Profiles
The service detects and reports the following ZUGFeRD 2.x profiles:
| Profile | Description |
|---|---|
| MINIMUM | Minimal profile with basic invoice data |
| BASIC | Basic profile for simple B2B invoicing |
| BASIC WL | Basic profile with additional buyer data |
| EN16931 | Full profile compliant with EN 16931 standard |
| EXTENDED | Extended profile with additional optional fields |
The profile is automatically detected from the embedded XML metadata.
Configuration
Environment Variables
The service supports the following environment variables:
| Variable | Default | Description |
|---|---|---|
HOST |
0.0.0.0 |
Host address to bind to |
PORT |
5000 |
Port to listen on |
LOG_LEVEL |
INFO |
Logging level (DEBUG, INFO, WARNING, ERROR) |
Docker Compose
The provided docker-compose.yml includes:
- Port mapping:
5000:5000 - Health check endpoint
- Read-only source mount for development
- Restart policy:
unless-stopped
Nix
The flake provides:
packages.zugferd-service: Production builddevShells.default: Development shell with all dependencies
NixOS Deployment
Example NixOS module configuration:
{ config, pkgs, ... }:
let
zugferd-service = (import ./zugferd-service {}).packages.zugferd-service;
in {
systemd.services.zugferd-service = {
enable = true;
description = "ZUGFeRD Invoice Service";
after = [ "network.target" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
ExecStart = "${zugferd-service}/bin/zugferd-service";
Restart = "always";
RestartSec = "10";
DynamicUser = true;
ProtectSystem = "strict";
ProtectHome = true;
PrivateTmp = true;
NoNewPrivileges = true;
};
environment = {
HOST = "127.0.0.1";
PORT = "5000";
LOG_LEVEL = "INFO";
};
};
}
For production, consider adding:
- Reverse proxy (nginx/caddy) with HTTPS
- Authentication middleware
- Rate limiting
- Logging aggregation
Development
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=src
# Run specific test file
pytest tests/test_extract.py
Project Structure
zugferd-service/
├── src/
│ ├── __init__.py
│ ├── main.py # FastAPI application and endpoints
│ ├── models.py # Pydantic models for requests/responses
│ ├── extractor.py # ZUGFeRD XML extraction logic
│ ├── validator.py # Invoice validation logic
│ ├── pdf_parser.py # PDF text extraction
│ └── utils.py # Utility functions
├── tests/
│ ├── test_extract.py
│ ├── test_validate.py
│ └── fixtures/ # Test PDF files
├── pyproject.toml # Project metadata and dependencies
├── Dockerfile # Multi-stage Docker build
├── docker-compose.yml # Docker Compose configuration
└── flake.nix # Nix flake for reproducible builds
Dependencies
Core:
- fastapi>=0.109.0 - Web framework
- uvicorn>=0.27.0 - ASGI server
- pydantic>=2.5.0 - Data validation
- factur-x>=2.5 - ZUGFeRD/Factur-X library
- pypdf>=4.0.0 - PDF text extraction
- lxml>=5.0.0 - XML processing
Development:
- pytest>=8.0.0 - Testing framework
- pytest-asyncio>=0.23.0 - Async test support
- httpx>=0.27.0 - HTTP client for testing
Troubleshooting
Common Issues
Service fails to start with "Address already in use"
Change the port:
# Docker
docker run -p 8000:5000 zugferd-service
# Nix/Python
PORT=8000 python -m src.main
Extraction returns "is_zugferd: false"
- Verify the PDF contains ZUGFeRD/Factur-X XML attachment
- Check that the file is not password-protected
- Ensure the file is a valid PDF (not corrupt)
Validation fails with "missing_required" errors
Check that all required fields are present:
- invoice_number
- invoice_date (YYYY-MM-DD format)
- supplier.name and supplier.vat_id
- buyer.name
- Non-zero totals (net, gross, vat_total)
- At least one line item with description, quantity, and unit_price
VAT ID validation fails
Verify the VAT ID format:
- German:
DE+ 9 digits - Austrian:
ATU+ 8 digits - Swiss:
CHE+ 9 digits +MWST/TVA/IVA
Docker build is slow
Use BuildKit for faster builds:
DOCKER_BUILDKIT=1 docker build -t zugferd-service .
Error Codes
| Error Code | Description |
|---|---|
invalid_base64 |
Invalid base64 encoding in request |
file_too_large |
PDF exceeds 10MB limit |
password_protected_pdf |
PDF is password-protected |
invalid_pdf |
File is not a valid PDF |
corrupt_pdf |
PDF file is corrupted or unreadable |
invalid_xml |
Embedded XML is malformed |
Logs
The service outputs structured JSON logs:
{
"timestamp": "2024-02-01T10:30:00Z",
"level": "INFO",
"message": "Extraction completed",
"data": {
"is_zugferd": true,
"profile": "EN16931",
"extraction_time_ms": 45
}
}
License
MIT
Support
For issues, questions, or contributions, please refer to the project repository.