2026-02-04 20:39:43 +01:00

# ZUGFeRD-Service
A REST API service for extracting and validating ZUGFeRD/Factur-X invoice data from PDF files. Built with FastAPI and Python 3.11+.
## Overview
ZUGFeRD-Service provides a simple HTTP API to:
- Extract structured invoice data from ZUGFeRD-enabled PDFs
- Detect and identify ZUGFeRD profiles (MINIMUM, BASIC, BASIC WL, EN16931, EXTENDED)
- Validate invoice data against business rules and regulatory requirements
- Compare XML data against PDF text content

ZUGFeRD (Zentraler User Guide des Forums elektronische Rechnung Deutschland) is a German standard for electronic invoices. It embeds an XML file in the Cross Industry Invoice (CII) format inside a PDF/A-3 file. The same format is known as Factur-X in France.
## Quick Start
### Docker
The quickest way to get started is using Docker:
```bash
# Build the image
docker build -t zugferd-service .
# Run the service
docker run -p 5000:5000 zugferd-service
# Or use Docker Compose
docker-compose up -d
```
### Nix
If you're using Nix, build and run with:
```bash
# Build the package
nix build .#zugferd-service
# Run the service
nix run .#zugferd-service
# Enter development shell
nix develop
```
### Python (Development)
For local development:
```bash
# Install dependencies
pip install -e .
# Run directly
python -m src.main
# Or use the installed script
zugferd-service
```
By default, the service listens on port 5000 on all interfaces (`0.0.0.0`), so it is reachable at `http://localhost:5000`.
## API Reference
### GET /health
Check if the service is running.
**Response:**
```json
{
  "status": "healthy",
  "version": "1.0.0"
}
```
**Example:**
```bash
curl http://localhost:5000/health
```
### POST /extract
Extract ZUGFeRD data from a base64-encoded PDF file.
**Request:**
```json
{
  "pdf_base64": "JVBERi0xLjQKJeLjz9MK..."
}
```
**Response (ZUGFeRD PDF):**
```json
{
  "is_zugferd": true,
  "zugferd_profil": "EN16931",
  "xml_raw": "<?xml version=\"1.0\"...?>",
  "xml_data": {
    "invoice_number": "RE-2024-001",
    "invoice_date": "2024-02-01",
    "due_date": "2024-02-28",
    "supplier": {
      "name": "Acme Corp",
      "street": "Main Street 123",
      "postal_code": "12345",
      "city": "Berlin",
      "country": "DE",
      "vat_id": "DE123456789",
      "email": "billing@acme-corp.de"
    },
    "buyer": {
      "name": "Customer GmbH",
      "street": "Market Square 5",
      "postal_code": "54321",
      "city": "Hamburg",
      "country": "DE",
      "vat_id": "DE987654321"
    },
    "line_items": [
      {
        "position": 1,
        "article_number": "ART-001",
        "article_number_buyer": null,
        "description": "Consulting Services",
        "quantity": 10.0,
        "unit": "HUR",
        "unit_price": 100.0,
        "line_total": 1000.0,
        "vat_rate": 19.0,
        "vat_amount": 190.0
      }
    ],
    "totals": {
      "line_total_sum": 1000.0,
      "net": 1000.0,
      "vat_total": 190.0,
      "gross": 1190.0,
      "vat_breakdown": [
        {
          "rate": 19.0,
          "base": 1000.0,
          "amount": 190.0
        }
      ]
    },
    "currency": "EUR",
    "payment_terms": {
      "iban": "DE89370400440532013000",
      "bic": "COBADEFFXXX",
      "account_holder": "Acme Corp"
    },
    "notes": "Payment due within 30 days"
  },
  "pdf_text": "Invoice RE-2024-001\nAcme Corp...",
  "extraction_meta": {
    "pages": 2,
    "xml_attachment_name": "factur-x.xml",
    "extraction_time_ms": 45
  }
}
```
**Response (Non-ZUGFeRD PDF):**
```json
{
  "is_zugferd": false,
  "zugferd_profil": null,
  "xml_raw": null,
  "xml_data": null,
  "pdf_text": "Regular PDF content...",
  "extraction_meta": {
    "pages": 1,
    "xml_attachment_name": null,
    "extraction_time_ms": 20
  }
}
```
**Example:**
```bash
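# Encode the PDF as single-line base64 (GNU coreutils; on macOS use `base64 -i invoice.pdf`)
# and stream the JSON body to curl via stdin. Passing it with -d "..." fails for
# larger PDFs because of the operating system's per-argument size limit.
{ printf '{"pdf_base64": "'; base64 -w 0 invoice.pdf; printf '"}'; } |
  curl -X POST http://localhost:5000/extract \
    -H "Content-Type: application/json" \
    --data-binary @-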
```
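
For clients written in Python, the same request can be sent with the standard library alone. This is a sketch rather than an official client: the function names `build_extract_payload` and `extract_pdf` are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_extract_payload(pdf_bytes: bytes) -> dict:
    """Build the JSON body for POST /extract from raw PDF bytes."""
    # Standard base64 on a single line, equivalent to `base64 -w 0`.
    return {"pdf_base64": base64.b64encode(pdf_bytes).decode("ascii")}


def extract_pdf(pdf_path: str, base_url: str = "http://localhost:5000") -> dict:
    """POST a PDF file to /extract and return the decoded JSON response."""
    payload = build_extract_payload(Path(pdf_path).read_bytes())
    request = urllib.request.Request(
        f"{base_url}/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Usage, against a running service:
#   result = extract_pdf("invoice.pdf")
#   if result["is_zugferd"]:
#       print(result["zugferd_profil"], result["xml_data"]["invoice_number"])
```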
### POST /validate
Validate invoice data against business rules and regulatory requirements.
**Request:**
```json
{
  "xml_data": {
    "invoice_number": "RE-2024-001",
    "invoice_date": "2024-02-01",
    "due_date": "2024-02-28",
    "supplier": {
      "name": "Acme Corp",
      "vat_id": "DE123456789"
    },
    "buyer": {
      "name": "Customer GmbH",
      "vat_id": "DE987654321"
    },
    "line_items": [
      {
        "position": 1,
        "description": "Consulting Services",
        "quantity": 10.0,
        "unit": "HUR",
        "unit_price": 100.0,
        "line_total": 1000.0,
        "vat_rate": 19.0
      }
    ],
    "totals": {
      "line_total_sum": 1000.0,
      "net": 1000.0,
      "vat_total": 190.0,
      "gross": 1190.0,
      "vat_breakdown": [
        {
          "rate": 19.0,
          "base": 1000.0,
          "amount": 190.0
        }
      ]
    },
    "currency": "EUR"
  },
  "pdf_text": "Invoice RE-2024-001\nTotal: 1190.00 EUR",
  "checks": ["pflichtfelder", "betraege", "ustid", "pdf_abgleich"]
}
```
**Response:**
```json
{
  "result": {
    "is_valid": true,
    "errors": [],
    "warnings": [],
    "summary": {
      "total_checks": 4,
      "checks_passed": 4,
      "checks_failed": 0,
      "critical_errors": 0,
      "warnings": 0
    },
    "validation_time_ms": 12
  }
}
```
**Example:**
```bash
# Replace "..." with the remaining invoice fields from the request body above
curl -X POST http://localhost:5000/validate \
  -H "Content-Type: application/json" \
  -d '{
    "xml_data": {"invoice_number": "RE-001", ...},
    "checks": ["pflichtfelder", "betraege"]
  }'
```
## Validation Checks
The service supports four validation checks:
### 1. pflichtfelder (Required Fields)
Validates that all critical invoice fields are present and non-empty:
- **Critical errors:** invoice_number, invoice_date, supplier.name, supplier.vat_id, buyer.name, totals.net, totals.gross, totals.vat_total, and the line_items array (at least one item, each with description, quantity, and unit_price)
- **Warnings:** due_date, payment_terms.iban
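
A minimal sketch of this check, assuming `xml_data` is the dictionary shown in the `/extract` response. The field lists are taken from this section and from the troubleshooting notes below; the function names and the message strings (including `missing_optional`) are hypothetical.

```python
from typing import Any

CRITICAL_FIELDS = [
    "invoice_number", "invoice_date", "supplier.name", "supplier.vat_id",
    "buyer.name", "totals.net", "totals.gross", "totals.vat_total",
]
WARNING_FIELDS = ["due_date", "payment_terms.iban"]
LINE_ITEM_FIELDS = ["description", "quantity", "unit_price"]


def get_path(data: dict, dotted: str) -> Any:
    """Follow a dotted path such as 'supplier.vat_id'; None if any step is missing."""
    current: Any = data
    for key in dotted.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def is_empty(value: Any) -> bool:
    """Treat None, blank strings and empty lists/dicts as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, dict)):
        return len(value) == 0
    return False


def check_required_fields(xml_data: dict) -> tuple[list[str], list[str]]:
    """Return (critical errors, warnings) for missing fields."""
    errors = [f"missing_required: {f}" for f in CRITICAL_FIELDS if is_empty(get_path(xml_data, f))]
    warnings = [f"missing_optional: {f}" for f in WARNING_FIELDS if is_empty(get_path(xml_data, f))]
    items = xml_data.get("line_items") or []
    if not items:
        errors.append("missing_required: line_items")
    for index, item in enumerate(items, start=1):
        for field in LINE_ITEM_FIELDS:
            if is_empty(item.get(field)):
                errors.append(f"missing_required: line_items[{index}].{field}")
    return errors, warnings
```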
### 2. betraege (Amount Calculations)
Verifies all monetary calculations are correct:
- Line total = quantity × unit_price (for each line item)
- totals.net = sum of all line totals
- VAT breakdown amount = base × (rate/100) (for each VAT entry)
- totals.vat_total = sum of VAT breakdown amounts
- totals.gross = totals.net + totals.vat_total

All comparisons use a tolerance of 0.01 to absorb floating-point rounding differences.
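
The same rules, expressed as a Python sketch. It is not the service's `validator.py`; the function names and message texts are illustrative, and it assumes every line item carries `quantity`, `unit_price`, and `line_total`.

```python
TOLERANCE = 0.01


def within_tolerance(actual: float, expected: float, tol: float = TOLERANCE) -> bool:
    # The tiny extra epsilon keeps a difference of exactly 0.01 from failing
    # because of binary floating-point representation.
    return abs(actual - expected) <= tol + 1e-9


def check_amounts(xml_data: dict) -> list[str]:
    """Return one message per failed calculation rule (empty list = consistent)."""
    errors: list[str] = []
    items = xml_data.get("line_items", [])
    totals = xml_data["totals"]

    for item in items:
        expected = item["quantity"] * item["unit_price"]
        if not within_tolerance(item["line_total"], expected):
            errors.append(f"line {item.get('position')}: line_total != quantity * unit_price ({expected:.2f})")

    line_sum = sum(item["line_total"] for item in items)
    if not within_tolerance(totals["net"], line_sum):
        errors.append(f"totals.net != sum of line totals ({line_sum:.2f})")

    breakdown = totals.get("vat_breakdown", [])
    for entry in breakdown:
        expected = entry["base"] * entry["rate"] / 100
        if not within_tolerance(entry["amount"], expected):
            errors.append(f"VAT {entry['rate']}%: amount != base * rate / 100 ({expected:.2f})")

    vat_sum = sum(entry["amount"] for entry in breakdown)
    if not within_tolerance(totals["vat_total"], vat_sum):
        errors.append(f"totals.vat_total != sum of VAT breakdown ({vat_sum:.2f})")

    if not within_tolerance(totals["gross"], totals["net"] + totals["vat_total"]):
        errors.append("totals.gross != totals.net + totals.vat_total")
    return errors
```

Using the request example above (10 × 100.0 = 1000.0 net, 19 % of 1000.0 = 190.0 VAT, 1190.0 gross), every rule passes and the list is empty.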
### 3. ustid (VAT ID Format)
Validates VAT ID format for supported countries:
- **Germany (DE):** DE followed by 9 digits (e.g., `DE123456789`)
- **Austria (AT):** ATU followed by 8 digits (e.g., `ATU12345678`)
- **Switzerland (CH):** CHE followed by 9 digits and MWST/TVA/IVA suffix (e.g., `CHE123456789MWST`)
### 4. pdf_abgleich (PDF Comparison)
Compares XML data against extracted PDF text:
- Invoice number exact match
- Totals (net, gross, vat_total) within tolerance
- Returns warnings (not errors) for mismatches
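
The README does not describe how amounts are located in the PDF text. One plausible approach, sketched here under that assumption, collects every number with exactly two decimal places (accepting both `1.190,00` and `1,190.00`) and looks for a match within the same 0.01 tolerance used by `betraege`. "Exact match" for the invoice number is read as the exact string appearing somewhere in the text. All names are hypothetical.

```python
import re

# Digit runs that may contain '.' or ',' separators, e.g. 1190.00, 1.190,00, 1,190.00.
NUMBER_TOKEN = re.compile(r"[0-9][0-9.,]*[0-9]")


def parse_amounts(text: str) -> list[float]:
    """Extract amounts with exactly two decimal places, in German or English notation."""
    amounts = []
    for token in NUMBER_TOKEN.findall(text):
        last_sep = max(token.rfind("."), token.rfind(","))
        # Treat the last separator as the decimal point only if two digits follow;
        # this skips dates like 01.02.2024, plain integers and invoice numbers.
        if last_sep == -1 or len(token) - last_sep - 1 != 2:
            continue
        integer_part = re.sub(r"[.,]", "", token[:last_sep])
        amounts.append(float(f"{integer_part}.{token[last_sep + 1:]}"))
    return amounts


def compare_with_pdf(xml_data: dict, pdf_text: str, tol: float = 0.01) -> list[str]:
    """Return warnings (never errors) for XML values not found in the PDF text."""
    warnings = []
    invoice_number = xml_data.get("invoice_number")
    if invoice_number and invoice_number not in pdf_text:
        warnings.append(f"invoice_number {invoice_number} not found in PDF text")
    amounts = parse_amounts(pdf_text)
    totals = xml_data.get("totals") or {}
    for field in ("net", "gross", "vat_total"):
        value = totals.get(field)
        if value is not None and not any(abs(a - value) <= tol + 1e-9 for a in amounts):
            warnings.append(f"totals.{field} {value:.2f} not found in PDF text")
    return warnings
```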
## ZUGFeRD Profiles
The service detects and reports the following ZUGFeRD 2.x profiles:
| Profile | Description |
|---------|-------------|
| MINIMUM | Minimal profile with basic invoice data |
| BASIC | Basic profile for simple B2B invoicing |
| BASIC WL | Basic profile without invoice lines (document-level data only) |
| EN16931 | Full profile compliant with the EN 16931 standard |
| EXTENDED | Extended profile with additional optional fields |

The profile is automatically detected from the embedded XML metadata.
## Configuration
### Environment Variables
The service supports the following environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `HOST` | `0.0.0.0` | Host address to bind to |
| `PORT` | `5000` | Port to listen on |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
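
A sketch of how these variables could be read and validated in Python, using the defaults from the table. `Settings` and `load_settings` are hypothetical names, and rejecting unknown log levels or out-of-range ports is an assumption, not documented behavior.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


@dataclass(frozen=True)
class Settings:
    host: str = "0.0.0.0"
    port: int = 5000
    log_level: str = "INFO"


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read HOST, PORT and LOG_LEVEL, falling back to the documented defaults."""
    raw_port = environ.get("PORT", "5000")
    try:
        port = int(raw_port)
    except ValueError:
        raise ValueError(f"PORT must be an integer, got {raw_port!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")

    log_level = environ.get("LOG_LEVEL", "INFO").upper()
    if log_level not in LOG_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {', '.join(LOG_LEVELS)}, got {log_level!r}")

    return Settings(host=environ.get("HOST", "0.0.0.0"), port=port, log_level=log_level)


# Usage sketch (assumes the FastAPI app object is importable as src.main:app):
#   import uvicorn
#   s = load_settings()
#   uvicorn.run("src.main:app", host=s.host, port=s.port, log_level=s.log_level.lower())
```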
### Docker Compose
The provided `docker-compose.yml` includes:
- Port mapping: `5000:5000`
- Health check endpoint
- Read-only source mount for development
- Restart policy: `unless-stopped`
### Nix
The flake provides:
- `packages.<system>.zugferd-service`: Production build
- `devShells.<system>.default`: Development shell with all dependencies
## NixOS Deployment
Example NixOS module configuration:
```nix
# Assumes this repository is an input of your system flake (named
# `zugferd-service`) and that `inputs` is passed to modules via `specialArgs`.
{ config, pkgs, inputs, ... }:
let
  zugferd-service =
    inputs.zugferd-service.packages.${pkgs.stdenv.hostPlatform.system}.zugferd-service;
in {
  systemd.services.zugferd-service = {
    enable = true;
    description = "ZUGFeRD Invoice Service";
    after = [ "network.target" ];
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      ExecStart = "${zugferd-service}/bin/zugferd-service";
      Restart = "always";
      RestartSec = "10";
      DynamicUser = true;
      ProtectSystem = "strict";
      ProtectHome = true;
      PrivateTmp = true;
      NoNewPrivileges = true;
    };
    environment = {
      HOST = "127.0.0.1";
      PORT = "5000";
      LOG_LEVEL = "INFO";
    };
  };
}
```
For production, consider adding:
- Reverse proxy (nginx/caddy) with HTTPS
- Authentication middleware
- Rate limiting
- Log aggregation
## Development
### Running Tests
```bash
# Run all tests
pytest
# Run with coverage (requires the pytest-cov plugin)
pytest --cov=src
# Run specific test file
pytest tests/test_extract.py
```
### Project Structure
```
zugferd-service/
├── src/
│   ├── __init__.py
│   ├── main.py          # FastAPI application and endpoints
│   ├── models.py        # Pydantic models for requests/responses
│   ├── extractor.py     # ZUGFeRD XML extraction logic
│   ├── validator.py     # Invoice validation logic
│   ├── pdf_parser.py    # PDF text extraction
│   └── utils.py         # Utility functions
├── tests/
│   ├── test_extract.py
│   ├── test_validate.py
│   └── fixtures/        # Test PDF files
├── pyproject.toml       # Project metadata and dependencies
├── Dockerfile           # Multi-stage Docker build
├── docker-compose.yml   # Docker Compose configuration
└── flake.nix            # Nix flake for reproducible builds
```
### Dependencies
**Core:**
- fastapi>=0.109.0 - Web framework
- uvicorn>=0.27.0 - ASGI server
- pydantic>=2.5.0 - Data validation
- factur-x>=2.5 - ZUGFeRD/Factur-X library
- pypdf>=4.0.0 - PDF text extraction
- lxml>=5.0.0 - XML processing

**Development:**
- pytest>=8.0.0 - Testing framework
- pytest-asyncio>=0.23.0 - Async test support
- httpx>=0.27.0 - HTTP client for testing
## Troubleshooting
### Common Issues
**Service fails to start with "Address already in use"**

Another process is already using port 5000. Run the service on a different port:
```bash
# Docker: map host port 8000 to the container's port 5000
docker run -p 8000:5000 zugferd-service
# Python
PORT=8000 python -m src.main
# Nix
PORT=8000 nix run .#zugferd-service
```
**Extraction returns "is_zugferd: false"**
- Verify the PDF contains ZUGFeRD/Factur-X XML attachment
- Check that the file is not password-protected
- Ensure the file is a valid PDF (not corrupt)

**Validation fails with "missing_required" errors**

Check that all required fields are present:
- invoice_number
- invoice_date (YYYY-MM-DD format)
- supplier.name and supplier.vat_id
- buyer.name
- Non-zero totals (net, gross, vat_total)
- At least one line item with description, quantity, and unit_price

**VAT ID validation fails**

Verify the VAT ID format:
- German: `DE` + 9 digits
- Austrian: `ATU` + 8 digits
- Swiss: `CHE` + 9 digits + `MWST`/`TVA`/`IVA`

**Docker build is slow**

BuildKit speeds up builds and is the default builder since Docker Engine 23.0. On older versions, enable it explicitly:
```bash
DOCKER_BUILDKIT=1 docker build -t zugferd-service .
```
### Error Codes
| Error Code | Description |
|------------|-------------|
| `invalid_base64` | Invalid base64 encoding in request |
| `file_too_large` | PDF exceeds 10MB limit |
| `password_protected_pdf` | PDF is password-protected |
| `invalid_pdf` | File is not a valid PDF |
| `corrupt_pdf` | PDF file is corrupted or unreadable |
| `invalid_xml` | Embedded XML is malformed |
### Logs
The service outputs structured JSON logs:
```json
{
  "timestamp": "2024-02-01T10:30:00Z",
  "level": "INFO",
  "message": "Extraction completed",
  "data": {
    "is_zugferd": true,
    "profile": "EN16931",
    "extraction_time_ms": 45
  }
}
```
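
The README does not say how these logs are produced. If your own Python tooling or tests should emit records in the same shape, one option is a custom `logging.Formatter`. This is a sketch: the class name `JsonLogFormatter` and passing structured fields through an `extra={"data": ...}` key are assumptions, not necessarily how the service does it.

```python
import json
import logging
from datetime import datetime, timezone


class JsonLogFormatter(logging.Formatter):
    """Render each record as one JSON object: timestamp, level, message, optional data."""

    def format(self, record: logging.LogRecord) -> str:
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": timestamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Structured fields are passed via logger.info(..., extra={"data": {...}}).
        data = getattr(record, "data", None)
        if data is not None:
            entry["data"] = data
        return json.dumps(entry)


def configure_json_logging(level: str = "INFO") -> None:
    """Send JSON-formatted records to stderr via the root logger."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonLogFormatter())
    logging.basicConfig(level=level, handlers=[handler], force=True)


# Usage:
#   configure_json_logging()
#   logging.getLogger(__name__).info(
#       "Extraction completed",
#       extra={"data": {"is_zugferd": True, "profile": "EN16931", "extraction_time_ms": 45}},
#   )
```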
## License
MIT
## Support
For issues, questions, or contributions, please refer to the project repository.