# ZUGFeRD-Service

A REST API service for extracting and validating ZUGFeRD/Factur-X invoice data from PDF files. Built with FastAPI and Python 3.11+.

## Overview

ZUGFeRD-Service provides a simple HTTP API to:

- Extract structured invoice data from ZUGFeRD-enabled PDFs
- Detect and identify ZUGFeRD profiles (MINIMUM, BASIC, BASIC WL, EN16931, EXTENDED)
- Validate invoice data against business rules and regulatory requirements
- Compare XML data against PDF text content

ZUGFeRD (Zentraler User Guide des Forums elektronische Rechnung Deutschland) is a German standard for electronic invoices that embeds Cross Industry Invoice (CII) XML in PDF files. In France the same format is known as Factur-X.

## Quick Start

### Docker

The quickest way to get started is using Docker:

```bash
# Build the image
docker build -t zugferd-service .

# Run the service
docker run -p 5000:5000 zugferd-service

# Or use Docker Compose
docker-compose up -d
```

### Nix

If you're using Nix, build and run with:

```bash
# Build the package
nix build .#zugferd-service

# Run the service
nix run .#zugferd-service

# Enter development shell
nix develop
```

### Python (Development)

For local development:

```bash
# Install dependencies
pip install -e .

# Run directly
python -m src.main

# Or use the installed script
zugferd-service
```

The service starts on `http://0.0.0.0:5000` by default.

## API Reference

### GET /health

Check if the service is running.

**Response:**
```json
{
  "status": "healthy",
  "version": "1.0.0"
}
```

**Example:**
```bash
curl http://localhost:5000/health
```

### POST /extract

Extract ZUGFeRD data from a base64-encoded PDF file.

**Request:**
```json
{
  "pdf_base64": "JVBERi0xLjQKJeLjz9MK..."
}
```

**Response (ZUGFeRD PDF):**
```json
{
  "is_zugferd": true,
  "zugferd_profil": "EN16931",
  "xml_raw": "<?xml version=\"1.0\"...?>",
  "xml_data": {
    "invoice_number": "RE-2024-001",
    "invoice_date": "2024-02-01",
    "due_date": "2024-02-28",
    "supplier": {
      "name": "Acme Corp",
      "street": "Main Street 123",
      "postal_code": "12345",
      "city": "Berlin",
      "country": "DE",
      "vat_id": "DE123456789",
      "email": "billing@acme-corp.de"
    },
    "buyer": {
      "name": "Customer GmbH",
      "street": "Market Square 5",
      "postal_code": "54321",
      "city": "Hamburg",
      "country": "DE",
      "vat_id": "DE987654321"
    },
    "line_items": [
      {
        "position": 1,
        "article_number": "ART-001",
        "article_number_buyer": null,
        "description": "Consulting Services",
        "quantity": 10.0,
        "unit": "HUR",
        "unit_price": 100.0,
        "line_total": 1000.0,
        "vat_rate": 19.0,
        "vat_amount": 190.0
      }
    ],
    "totals": {
      "line_total_sum": 1000.0,
      "net": 1000.0,
      "vat_total": 190.0,
      "gross": 1190.0,
      "vat_breakdown": [
        {
          "rate": 19.0,
          "base": 1000.0,
          "amount": 190.0
        }
      ]
    },
    "currency": "EUR",
    "payment_terms": {
      "iban": "DE89370400440532013000",
      "bic": "COBADEFFXXX",
      "account_holder": "Acme Corp"
    },
    "notes": "Payment due within 30 days"
  },
  "pdf_text": "Invoice RE-2024-001\nAcme Corp...",
  "extraction_meta": {
    "pages": 2,
    "xml_attachment_name": "factur-x.xml",
    "extraction_time_ms": 45
  }
}
```

**Response (Non-ZUGFeRD PDF):**
```json
{
  "is_zugferd": false,
  "zugferd_profil": null,
  "xml_raw": null,
  "xml_data": null,
  "pdf_text": "Regular PDF content...",
  "extraction_meta": {
    "pages": 1,
    "xml_attachment_name": null,
    "extraction_time_ms": 20
  }
}
```

**Example:**
```bash
# Convert PDF to base64 and extract
# (-w 0 disables line wrapping; this is the GNU coreutils option)
PDF_BASE64=$(base64 -w 0 invoice.pdf)

curl -X POST http://localhost:5000/extract \
  -H "Content-Type: application/json" \
  -d "{\"pdf_base64\": \"$PDF_BASE64\"}"
```
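
The same request can be built from Python. The following is a minimal sketch using only the standard library; the helper names `build_extract_payload` and `extract_invoice` are hypothetical and not part of the service, and the base URL assumes the default port shown above.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_extract_payload(pdf_bytes: bytes) -> str:
    """Return the JSON body for POST /extract with the PDF base64-encoded."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return json.dumps({"pdf_base64": encoded})


def extract_invoice(pdf_path: str, base_url: str = "http://localhost:5000") -> dict:
    """Send a PDF to /extract and return the decoded JSON response.

    Hypothetical client helper; requires a running service.
    """
    body = build_extract_payload(Path(pdf_path).read_bytes())
    request = urllib.request.Request(
        f"{base_url}/extract",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `extract_invoice("invoice.pdf")` against a running instance should return a dictionary shaped like the responses above, so `result["is_zugferd"]` tells you whether embedded XML was found.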

### POST /validate

Validate invoice data against business rules and regulatory requirements.

**Request:**
```json
{
  "xml_data": {
    "invoice_number": "RE-2024-001",
    "invoice_date": "2024-02-01",
    "due_date": "2024-02-28",
    "supplier": {
      "name": "Acme Corp",
      "vat_id": "DE123456789"
    },
    "buyer": {
      "name": "Customer GmbH",
      "vat_id": "DE987654321"
    },
    "line_items": [
      {
        "position": 1,
        "description": "Consulting Services",
        "quantity": 10.0,
        "unit": "HUR",
        "unit_price": 100.0,
        "line_total": 1000.0,
        "vat_rate": 19.0
      }
    ],
    "totals": {
      "line_total_sum": 1000.0,
      "net": 1000.0,
      "vat_total": 190.0,
      "gross": 1190.0,
      "vat_breakdown": [
        {
          "rate": 19.0,
          "base": 1000.0,
          "amount": 190.0
        }
      ]
    },
    "currency": "EUR"
  },
  "pdf_text": "Invoice RE-2024-001\nTotal: 1190.00 EUR",
  "checks": ["pflichtfelder", "betraege", "ustid", "pdf_abgleich"]
}
```

**Response:**
```json
{
  "result": {
    "is_valid": true,
    "errors": [],
    "warnings": [],
    "summary": {
      "total_checks": 4,
      "checks_passed": 4,
      "checks_failed": 0,
      "critical_errors": 0,
      "warnings": 0
    },
    "validation_time_ms": 12
  }
}
```

**Example:**
```bash
curl -X POST http://localhost:5000/validate \
  -H "Content-Type: application/json" \
  -d '{
    "xml_data": {"invoice_number": "RE-001", ...},
    "checks": ["pflichtfelder", "betraege"]
  }'
```

## Validation Checks

The service supports four validation checks:

### 1. pflichtfelder (Required Fields)

Validates that all critical invoice fields are present and non-empty:

- **Critical errors:** invoice_number, invoice_date, supplier.name, supplier.vat_id, buyer.name, totals.net, totals.gross, totals.vat_total, line_items array, line item fields
- **Warnings:** due_date, payment_terms.iban
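
A presence check of this kind could look roughly like the sketch below. It is an illustration, not the service's code: the function names, the dotted-path lookup, and the definition of "missing" (`None`, empty string, or empty list) are assumptions, and per-line-item field checks are left out for brevity.

```python
from typing import Any

# Field lists taken from the bullets above.
CRITICAL_FIELDS = [
    "invoice_number", "invoice_date", "supplier.name", "supplier.vat_id",
    "buyer.name", "totals.net", "totals.gross", "totals.vat_total", "line_items",
]
WARNING_FIELDS = ["due_date", "payment_terms.iban"]


def get_path(data: dict, dotted: str) -> Any:
    """Follow a dotted path such as 'supplier.name'; return None if absent."""
    current: Any = data
    for key in dotted.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def is_missing(value: Any) -> bool:
    """Assumed definition of 'missing': None, empty string, or empty list."""
    return value is None or value == "" or value == []


def check_required_fields(xml_data: dict) -> tuple[list[str], list[str]]:
    """Return (critical_errors, warnings) as lists of missing field paths."""
    errors = [f for f in CRITICAL_FIELDS if is_missing(get_path(xml_data, f))]
    warnings = [f for f in WARNING_FIELDS if is_missing(get_path(xml_data, f))]
    return errors, warnings
```

Keeping the two lists separate mirrors the split between critical errors and warnings described above.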

### 2. betraege (Amount Calculations)

Verifies all monetary calculations are correct:

- Line total = quantity × unit_price (for each line item)
- totals.net = sum of all line totals
- VAT breakdown amount = base × (rate/100) (for each VAT entry)
- totals.vat_total = sum of VAT breakdown amounts
- totals.gross = totals.net + totals.vat_total

Uses a tolerance of 0.01 for floating-point comparison.
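
The five rules can be sketched as follows. This is a simplified illustration under assumptions: `check_amounts` and the returned labels are hypothetical, and the input is assumed to already contain every field used in the request example above.

```python
TOLERANCE = 0.01  # tolerance stated above


def close(a: float, b: float) -> bool:
    """True when two amounts differ by no more than the tolerance."""
    return abs(a - b) <= TOLERANCE


def check_amounts(xml_data: dict) -> list[str]:
    """Return labels of the calculations that do not add up."""
    problems = []
    items = xml_data["line_items"]
    totals = xml_data["totals"]

    # Rule 1: each line total equals quantity times unit price.
    for item in items:
        if not close(item["line_total"], item["quantity"] * item["unit_price"]):
            problems.append(f"line_items[{item['position']}].line_total")

    # Rule 2: net equals the sum of all line totals.
    if not close(totals["net"], sum(item["line_total"] for item in items)):
        problems.append("totals.net")

    # Rule 3: each VAT breakdown amount equals base times rate / 100.
    for entry in totals["vat_breakdown"]:
        if not close(entry["amount"], entry["base"] * entry["rate"] / 100):
            problems.append(f"vat_breakdown[{entry['rate']}].amount")

    # Rule 4: vat_total equals the sum of the breakdown amounts.
    if not close(totals["vat_total"], sum(e["amount"] for e in totals["vat_breakdown"])):
        problems.append("totals.vat_total")

    # Rule 5: gross equals net plus vat_total.
    if not close(totals["gross"], totals["net"] + totals["vat_total"]):
        problems.append("totals.gross")

    return problems
```

For the sample invoice (10 × 100.0 = 1000.0 net, 19% VAT = 190.0, gross 1190.0) every rule holds, so the list is empty.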

### 3. ustid (VAT ID Format)

Validates VAT ID format for supported countries:

- **Germany (DE):** DE followed by 9 digits (e.g., `DE123456789`)
- **Austria (AT):** ATU followed by 8 digits (e.g., `ATU12345678`)
- **Switzerland (CH):** CHE followed by 9 digits and MWST/TVA/IVA suffix (e.g., `CHE123456789MWST`)
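
These three formats map directly onto regular expressions. The sketch below is one way to express them; `is_valid_vat_id` is a hypothetical name, and treating IDs from other countries as invalid is an assumption, since this README does not say how the service handles them.

```python
import re

# One pattern per supported country, keyed by the two-letter prefix.
VAT_ID_PATTERNS = {
    "DE": re.compile(r"DE\d{9}"),
    "AT": re.compile(r"ATU\d{8}"),
    "CH": re.compile(r"CHE\d{9}(MWST|TVA|IVA)"),
}


def is_valid_vat_id(vat_id: str) -> bool:
    """Check a VAT ID against the format of its country prefix."""
    pattern = VAT_ID_PATTERNS.get(vat_id[:2])
    # Assumption: unknown prefixes are reported as invalid.
    return pattern is not None and pattern.fullmatch(vat_id) is not None
```

`fullmatch` is used so that trailing characters, as in `DE1234567890`, cause a rejection rather than a partial match.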

### 4. pdf_abgleich (PDF Comparison)

Compares XML data against extracted PDF text:

- Invoice number exact match
- Totals (net, gross, vat_total) within tolerance
- Returns warnings (not errors) for mismatches
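
How the amounts are located in the PDF text is not described here, so the following sketch makes its own assumptions: it collects every decimal number in the text, accepts both `.` and `,` as decimal separators, and does not handle thousands separators. The name `compare_with_pdf` is hypothetical.

```python
import re

TOLERANCE = 0.01
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def amounts_in_text(pdf_text: str) -> list[float]:
    """Collect every number in the text, treating ',' as a decimal point."""
    return [float(match.replace(",", ".")) for match in NUMBER.findall(pdf_text)]


def compare_with_pdf(xml_data: dict, pdf_text: str) -> list[str]:
    """Return warnings for XML values that cannot be found in the PDF text."""
    warnings = []
    # The invoice number must appear verbatim.
    if xml_data["invoice_number"] not in pdf_text:
        warnings.append("invoice_number")
    # Each total must match some number in the text within the tolerance.
    found = amounts_in_text(pdf_text)
    for field in ("net", "gross", "vat_total"):
        expected = xml_data["totals"][field]
        if not any(abs(expected - value) <= TOLERANCE for value in found):
            warnings.append(f"totals.{field}")
    return warnings
```

With the `pdf_text` from the request example above, which prints only the gross total, this sketch would warn about `totals.net` and `totals.vat_total`.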

## ZUGFeRD Profiles

The service detects and reports the following ZUGFeRD 2.x profiles:

| Profile | Description |
|---------|-------------|
| MINIMUM | Minimal profile with basic invoice data |
| BASIC | Basic profile for simple B2B invoicing |
| BASIC WL | BASIC profile without line items (WL = "Without Lines") |
| EN16931 | Full profile compliant with the EN 16931 standard |
| EXTENDED | Extended profile with additional optional fields |

The profile is automatically detected from the embedded XML metadata.
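
In CII XML the profile is declared by the guideline ID inside `ExchangedDocumentContext`. The sketch below reads that ID with the standard library and maps it to a profile name by substring. Both the substring matching and the name `detect_profile` are assumptions for illustration; the service may rely on the `factur-x` library instead.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# CII namespaces used by ZUGFeRD 2.x / Factur-X.
NS = {
    "rsm": "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100",
    "ram": "urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100",
}

# Order matters: BASIC and EXTENDED guideline IDs also contain "en16931",
# and "basicwl" contains "basic", so the more specific markers come first.
MARKERS = (
    ("basicwl", "BASIC WL"),
    ("minimum", "MINIMUM"),
    ("extended", "EXTENDED"),
    ("basic", "BASIC"),
    ("en16931", "EN16931"),
)


def detect_profile(xml_raw: str) -> Optional[str]:
    """Return the profile name for a CII document, or None if unknown."""
    root = ET.fromstring(xml_raw)
    node = root.find(
        "rsm:ExchangedDocumentContext"
        "/ram:GuidelineSpecifiedDocumentContextParameter/ram:ID",
        NS,
    )
    if node is None or not node.text:
        return None
    guideline = node.text.strip().lower()
    for marker, profile in MARKERS:
        if marker in guideline:
            return profile
    return None
```

A document without a guideline ID, or with one that matches none of the markers, yields `None` in this sketch.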

## Configuration

### Environment Variables

The service supports the following environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `HOST` | `0.0.0.0` | Host address to bind to |
| `PORT` | `5000` | Port to listen on |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
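
Reading these variables with their defaults might look like the sketch below. The `Settings` class and `load_settings` function are hypothetical names, and upper-casing the log level is an added convenience rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from the environment, using the defaults in the table."""
    env = os.environ if env is None else env
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "5000")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests.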

### Docker Compose

The provided `docker-compose.yml` includes:

- Port mapping: `5000:5000`
- Health check endpoint
- Read-only source mount for development
- Restart policy: `unless-stopped`

### Nix

The flake provides:

- `packages.zugferd-service`: Production build
- `devShells.default`: Development shell with all dependencies

## NixOS Deployment

Example NixOS module configuration:

```nix
{ config, pkgs, ... }:

let
  zugferd-service = (import ./zugferd-service {}).packages.zugferd-service;
in {
  systemd.services.zugferd-service = {
    enable = true;
    description = "ZUGFeRD Invoice Service";
    after = [ "network.target" ];
    wantedBy = [ "multi-user.target" ];

    serviceConfig = {
      ExecStart = "${zugferd-service}/bin/zugferd-service";
      Restart = "always";
      RestartSec = "10";
      DynamicUser = true;
      ProtectSystem = "strict";
      ProtectHome = true;
      PrivateTmp = true;
      NoNewPrivileges = true;
    };

    environment = {
      HOST = "127.0.0.1";
      PORT = "5000";
      LOG_LEVEL = "INFO";
    };
  };
}
```

For production, consider adding:

- Reverse proxy (nginx/caddy) with HTTPS
- Authentication middleware
- Rate limiting
- Logging aggregation

## Development

### Running Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src

# Run specific test file
pytest tests/test_extract.py
```

### Project Structure

```
zugferd-service/
├── src/
│   ├── __init__.py
│   ├── main.py           # FastAPI application and endpoints
│   ├── models.py         # Pydantic models for requests/responses
│   ├── extractor.py      # ZUGFeRD XML extraction logic
│   ├── validator.py      # Invoice validation logic
│   ├── pdf_parser.py     # PDF text extraction
│   └── utils.py          # Utility functions
├── tests/
│   ├── test_extract.py
│   ├── test_validate.py
│   └── fixtures/         # Test PDF files
├── pyproject.toml        # Project metadata and dependencies
├── Dockerfile            # Multi-stage Docker build
├── docker-compose.yml    # Docker Compose configuration
└── flake.nix             # Nix flake for reproducible builds
```

### Dependencies

**Core:**
- fastapi>=0.109.0 - Web framework
- uvicorn>=0.27.0 - ASGI server
- pydantic>=2.5.0 - Data validation
- factur-x>=2.5 - ZUGFeRD/Factur-X library
- pypdf>=4.0.0 - PDF text extraction
- lxml>=5.0.0 - XML processing

**Development:**
- pytest>=8.0.0 - Testing framework
- pytest-asyncio>=0.23.0 - Async test support
- httpx>=0.27.0 - HTTP client for testing

## Troubleshooting

### Common Issues

**Service fails to start with "Address already in use"**

Change the port:
```bash
# Docker
docker run -p 8000:5000 zugferd-service

# Nix/Python
PORT=8000 python -m src.main
```

**Extraction returns "is_zugferd: false"**

- Verify the PDF contains ZUGFeRD/Factur-X XML attachment
- Check that the file is not password-protected
- Ensure the file is a valid PDF (not corrupt)

**Validation fails with "missing_required" errors**

Check that all required fields are present:
- invoice_number
- invoice_date (YYYY-MM-DD format)
- supplier.name and supplier.vat_id
- buyer.name
- Non-zero totals (net, gross, vat_total)
- At least one line item with description, quantity, and unit_price

**VAT ID validation fails**

Verify the VAT ID format:
- German: `DE` + 9 digits
- Austrian: `ATU` + 8 digits
- Swiss: `CHE` + 9 digits + `MWST`/`TVA`/`IVA`

**Docker build is slow**

Use BuildKit for faster builds:
```bash
DOCKER_BUILDKIT=1 docker build -t zugferd-service .
```

### Error Codes

| Error Code | Description |
|------------|-------------|
| `invalid_base64` | Invalid base64 encoding in request |
| `file_too_large` | PDF exceeds 10MB limit |
| `password_protected_pdf` | PDF is password-protected |
| `invalid_pdf` | File is not a valid PDF |
| `corrupt_pdf` | PDF file is corrupted or unreadable |
| `invalid_xml` | Embedded XML is malformed |
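
Three of these conditions can be caught on the client before a request is sent. The sketch below is a hypothetical pre-flight helper, not part of the service: it assumes "10MB" means 10 MiB and treats the `%PDF-` header as the only validity test, which is far weaker than real PDF parsing.

```python
import base64
import binascii
from typing import Optional

MAX_PDF_BYTES = 10 * 1024 * 1024  # assumption: the 10MB limit is 10 MiB


def preflight(pdf_base64: str, max_bytes: int = MAX_PDF_BYTES) -> Optional[str]:
    """Return the error code the payload would likely trigger, or None."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(pdf_base64, validate=True)
    except (binascii.Error, ValueError):
        return "invalid_base64"
    if len(raw) > max_bytes:
        return "file_too_large"
    # Every PDF starts with the "%PDF-" header.
    if not raw.startswith(b"%PDF-"):
        return "invalid_pdf"
    return None
```

The remaining codes (`password_protected_pdf`, `corrupt_pdf`, `invalid_xml`) need actual PDF and XML parsing and are left to the service.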

### Logs

The service outputs structured JSON logs:

```json
{
  "timestamp": "2024-02-01T10:30:00Z",
  "level": "INFO",
  "message": "Extraction completed",
  "data": {
    "is_zugferd": true,
    "profile": "EN16931",
    "extraction_time_ms": 45
  }
}
```
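
Structured logs are easy to filter with a few lines of code. The sketch below assumes the service writes one JSON object per line, which this README does not state explicitly; the function name `slow_extractions` is hypothetical.

```python
import json
from typing import Iterable


def slow_extractions(log_lines: Iterable[str], threshold_ms: float) -> list[dict]:
    """Return log records whose extraction time exceeds the threshold."""
    slow = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(record, dict):
            continue
        data = record.get("data")
        if not isinstance(data, dict):
            continue
        duration = data.get("extraction_time_ms")
        if isinstance(duration, (int, float)) and duration > threshold_ms:
            slow.append(record)
    return slow
```

For example, `slow_extractions(open("service.log"), 100)` would list every extraction that took longer than 100 ms, where `service.log` stands for wherever the output is captured.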

## License

MIT

## Support

For issues, questions, or contributions, please refer to the project repository.