# ZUGFeRD-Service A REST API service for extracting and validating ZUGFeRD/Factur-X invoice data from PDF files. Built with FastAPI and Python 3.11+. ## Overview ZUGFeRD-Service provides a simple HTTP API to: - Extract structured invoice data from ZUGFeRD-enabled PDFs - Detect and identify ZUGFeRD profiles (MINIMUM, BASIC, BASIC WL, EN16931, EXTENDED) - Validate invoice data against business rules and regulatory requirements - Compare XML data against PDF text content ZUGFeRD (Zentraler User Guide der Forums Elektronische Rechnung Deutschland) is a German standard for electronic invoices, using the Cross Industry Invoice (CII) XML format embedded in PDF files (also known as Factur-X in France). ## Quick Start ### Docker The quickest way to get started is using Docker: ```bash # Build the image docker build -t zugferd-service . # Run the service docker run -p 5000:5000 zugferd-service # Or use Docker Compose docker-compose up -d ``` ### Nix If you're using Nix, build and run with: ```bash # Build the package nix build .#zugferd-service # Run the service nix run .#zugferd-service # Enter development shell nix develop ``` ### Python (Development) For local development: ```bash # Install dependencies pip install -e . # Run directly python -m src.main # Or use the installed script zugferd-service ``` The service starts on `http://0.0.0.0:5000` by default. ## API Reference ### GET /health Check if the service is running. **Response:** ```json { "status": "healthy", "version": "1.0.0" } ``` **Example:** ```bash curl http://localhost:5000/health ``` ### POST /extract Extract ZUGFeRD data from a base64-encoded PDF file. **Request:** ```json { "pdf_base64": "JVBERi0xLjQKJeLjz9MK..." } ``` **Response (ZUGFeRD PDF):** ```json { "is_zugferd": true, "zugferd_profil": "EN16931", "xml_raw": "", "xml_data": { "invoice_number": "RE-2024-001", "invoice_date": "2024-02-01", "due_date": "2024-02-28", "supplier": { "name": "Acme Corp", "street": "Main Street 123", "postal_code": "12345", "city": "Berlin", "country": "DE", "vat_id": "DE123456789", "email": "billing@acme-corp.de" }, "buyer": { "name": "Customer GmbH", "street": "Market Square 5", "postal_code": "54321", "city": "Hamburg", "country": "DE", "vat_id": "DE987654321" }, "line_items": [ { "position": 1, "article_number": "ART-001", "article_number_buyer": null, "description": "Consulting Services", "quantity": 10.0, "unit": "HUR", "unit_price": 100.0, "line_total": 1000.0, "vat_rate": 19.0, "vat_amount": 190.0 } ], "totals": { "line_total_sum": 1000.0, "net": 1000.0, "vat_total": 190.0, "gross": 1190.0, "vat_breakdown": [ { "rate": 19.0, "base": 1000.0, "amount": 190.0 } ] }, "currency": "EUR", "payment_terms": { "iban": "DE89370400440532013000", "bic": "COBADEFFXXX", "account_holder": "Acme Corp" }, "notes": "Payment due within 30 days" }, "pdf_text": "Invoice RE-2024-001\nAcme Corp...", "extraction_meta": { "pages": 2, "xml_attachment_name": "factur-x.xml", "extraction_time_ms": 45 } } ``` **Response (Non-ZUGFeRD PDF):** ```json { "is_zugferd": false, "zugferd_profil": null, "xml_raw": null, "xml_data": null, "pdf_text": "Regular PDF content...", "extraction_meta": { "pages": 1, "xml_attachment_name": null, "extraction_time_ms": 20 } } ``` **Example:** ```bash # Convert PDF to base64 and extract PDF_BASE64=$(base64 -w 0 invoice.pdf) curl -X POST http://localhost:5000/extract \ -H "Content-Type: application/json" \ -d "{\"pdf_base64\": \"$PDF_BASE64\"}" ``` ### POST /validate Validate invoice data against business rules and regulatory requirements. **Request:** ```json { "xml_data": { "invoice_number": "RE-2024-001", "invoice_date": "2024-02-01", "due_date": "2024-02-28", "supplier": { "name": "Acme Corp", "vat_id": "DE123456789" }, "buyer": { "name": "Customer GmbH", "vat_id": "DE987654321" }, "line_items": [ { "position": 1, "description": "Consulting Services", "quantity": 10.0, "unit": "HUR", "unit_price": 100.0, "line_total": 1000.0, "vat_rate": 19.0 } ], "totals": { "line_total_sum": 1000.0, "net": 1000.0, "vat_total": 190.0, "gross": 1190.0, "vat_breakdown": [ { "rate": 19.0, "base": 1000.0, "amount": 190.0 } ] }, "currency": "EUR" }, "pdf_text": "Invoice RE-2024-001\nTotal: 1190.00 EUR", "checks": ["pflichtfelder", "betraege", "ustid", "pdf_abgleich"] } ``` **Response:** ```json { "result": { "is_valid": true, "errors": [], "warnings": [], "summary": { "total_checks": 4, "checks_passed": 4, "checks_failed": 0, "critical_errors": 0, "warnings": 0 }, "validation_time_ms": 12 } } ``` **Example:** ```bash curl -X POST http://localhost:5000/validate \ -H "Content-Type: application/json" \ -d '{ "xml_data": {"invoice_number": "RE-001", ...}, "checks": ["pflichtfelder", "betraege"] }' ``` ## Validation Checks The service supports four validation checks: ### 1. pflichtfelder (Required Fields) Validates that all critical invoice fields are present and non-empty: - **Critical errors:** invoice_number, invoice_date, supplier.name, supplier.vat_id, buyer.name, totals.net, totals.gross, totals.vat_total, line_items array, line item fields - **Warnings:** due_date, payment_terms.iban ### 2. betraege (Amount Calculations) Verifies all monetary calculations are correct: - Line total = quantity × unit_price (for each line item) - totals.net = sum of all line totals - VAT breakdown amount = base × (rate/100) (for each VAT entry) - totals.vat_total = sum of VAT breakdown amounts - totals.gross = totals.net + totals.vat_total Uses a tolerance of 0.01 for floating-point comparison. ### 3. ustid (VAT ID Format) Validates VAT ID format for supported countries: - **Germany (DE):** DE followed by 9 digits (e.g., `DE123456789`) - **Austria (AT):** ATU followed by 8 digits (e.g., `ATU12345678`) - **Switzerland (CH):** CHE followed by 9 digits and MWST/TVA/IVA suffix (e.g., `CHE123456789MWST`) ### 4. pdf_abgleich (PDF Comparison) Compares XML data against extracted PDF text: - Invoice number exact match - Totals (net, gross, vat_total) within tolerance - Returns warnings (not errors) for mismatches ## ZUGFeRD Profiles The service detects and reports the following ZUGFeRD 2.x profiles: | Profile | Description | |---------|-------------| | MINIMUM | Minimal profile with basic invoice data | | BASIC | Basic profile for simple B2B invoicing | | BASIC WL | Basic profile with additional buyer data | | EN16931 | Full profile compliant with EN 16931 standard | | EXTENDED | Extended profile with additional optional fields | The profile is automatically detected from the embedded XML metadata. ## Configuration ### Environment Variables The service supports the following environment variables: | Variable | Default | Description | |----------|---------|-------------| | `HOST` | `0.0.0.0` | Host address to bind to | | `PORT` | `5000` | Port to listen on | | `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) | ### Docker Compose The provided `docker-compose.yml` includes: - Port mapping: `5000:5000` - Health check endpoint - Read-only source mount for development - Restart policy: `unless-stopped` ### Nix The flake provides: - `packages.zugferd-service`: Production build - `devShells.default`: Development shell with all dependencies ## NixOS Deployment Example NixOS module configuration: ```nix { config, pkgs, ... }: let zugferd-service = (import ./zugferd-service {}).packages.zugferd-service; in { systemd.services.zugferd-service = { enable = true; description = "ZUGFeRD Invoice Service"; after = [ "network.target" ]; wantedBy = [ "multi-user.target" ]; serviceConfig = { ExecStart = "${zugferd-service}/bin/zugferd-service"; Restart = "always"; RestartSec = "10"; DynamicUser = true; ProtectSystem = "strict"; ProtectHome = true; PrivateTmp = true; NoNewPrivileges = true; }; environment = { HOST = "127.0.0.1"; PORT = "5000"; LOG_LEVEL = "INFO"; }; }; } ``` For production, consider adding: - Reverse proxy (nginx/caddy) with HTTPS - Authentication middleware - Rate limiting - Logging aggregation ## Development ### Running Tests ```bash # Run all tests pytest # Run with coverage pytest --cov=src # Run specific test file pytest tests/test_extract.py ``` ### Project Structure ``` zugferd-service/ ├── src/ │ ├── __init__.py │ ├── main.py # FastAPI application and endpoints │ ├── models.py # Pydantic models for requests/responses │ ├── extractor.py # ZUGFeRD XML extraction logic │ ├── validator.py # Invoice validation logic │ ├── pdf_parser.py # PDF text extraction │ └── utils.py # Utility functions ├── tests/ │ ├── test_extract.py │ ├── test_validate.py │ └── fixtures/ # Test PDF files ├── pyproject.toml # Project metadata and dependencies ├── Dockerfile # Multi-stage Docker build ├── docker-compose.yml # Docker Compose configuration └── flake.nix # Nix flake for reproducible builds ``` ### Dependencies **Core:** - fastapi>=0.109.0 - Web framework - uvicorn>=0.27.0 - ASGI server - pydantic>=2.5.0 - Data validation - factur-x>=2.5 - ZUGFeRD/Factur-X library - pypdf>=4.0.0 - PDF text extraction - lxml>=5.0.0 - XML processing **Development:** - pytest>=8.0.0 - Testing framework - pytest-asyncio>=0.23.0 - Async test support - httpx>=0.27.0 - HTTP client for testing ## Troubleshooting ### Common Issues **Service fails to start with "Address already in use"** Change the port: ```bash # Docker docker run -p 8000:5000 zugferd-service # Nix/Python PORT=8000 python -m src.main ``` **Extraction returns "is_zugferd: false"** - Verify the PDF contains ZUGFeRD/Factur-X XML attachment - Check that the file is not password-protected - Ensure the file is a valid PDF (not corrupt) **Validation fails with "missing_required" errors** Check that all required fields are present: - invoice_number - invoice_date (YYYY-MM-DD format) - supplier.name and supplier.vat_id - buyer.name - Non-zero totals (net, gross, vat_total) - At least one line item with description, quantity, and unit_price **VAT ID validation fails** Verify the VAT ID format: - German: `DE` + 9 digits - Austrian: `ATU` + 8 digits - Swiss: `CHE` + 9 digits + `MWST`/`TVA`/`IVA` **Docker build is slow** Use BuildKit for faster builds: ```bash DOCKER_BUILDKIT=1 docker build -t zugferd-service . ``` ### Error Codes | Error Code | Description | |------------|-------------| | `invalid_base64` | Invalid base64 encoding in request | | `file_too_large` | PDF exceeds 10MB limit | | `password_protected_pdf` | PDF is password-protected | | `invalid_pdf` | File is not a valid PDF | | `corrupt_pdf` | PDF file is corrupted or unreadable | | `invalid_xml` | Embedded XML is malformed | ### Logs The service outputs structured JSON logs: ```json { "timestamp": "2024-02-01T10:30:00Z", "level": "INFO", "message": "Extraction completed", "data": { "is_zugferd": true, "profile": "EN16931", "extraction_time_ms": 45 } } ``` ## License MIT ## Support For issues, questions, or contributions, please refer to the project repository.